Open source GPU builds on RISC-V

A group of enthusiasts is proposing a new set of graphics instructions designed for 3D graphics and media processing. These new instructions are built on the RISC-V base vector instruction set. They will add support for new graphics-specific data types as layered extensions in the spirit of the core RISC-V instruction set architecture (ISA). Vector, transcendental math, pixel, texture and Z/frame buffer operations are supported. The result can be a fused CPU-GPU ISA. The group is calling it RV64X because instructions will be 64 bits long (32 bits will not be enough to support a robust ISA).

Why now?

The world has plenty of GPUs to choose from, so why this one? Because, says the group, commercial GPUs are less effective at meeting unusual needs such as dual-phase 3D frustum clipping, adaptable HPC (arbitrary bit depth FFTs) and hardware SLAM. They believe collaboration provides flexible standards, reduces the 10 to 20 man-year effort otherwise needed, and will help with cross-verification to avoid mistakes.

The team says their motivation and goals are driven by the desire to create a small, area-efficient design with custom programmability and extensibility. It should offer low-cost IP ownership and development, and not compete with commercial offerings. It can be implemented in FPGA and ASIC targets and will be free and open source. The initial design will be targeted to low-power microcontrollers. It will be Khronos Vulkan-compliant, and over time support other APIs (OpenGL, DirectX and others).

The final hardware will be a RISC-V core with a GPU functional unit. To the programmer it will look like a single piece of hardware with 64-bit-long instructions coded as scalar instructions. The programming model is apparent SIMD; that is, the compiler generates SIMD from prefixed scalar opcodes. It will include a variable-issue, predicated SIMD back end, a vector front end, precise exceptions, branch shadowing and much more. There won’t be any need for an RPC/IPC calling mechanism to send 3D API calls to/from unused CPU memory space to GPU memory space and vice versa, says the team. And it will be available in 16-bit fixed-point (ideal for FPGAs) as well as 32-bit floating-point (ASICs or FPGAs) versions.

The design will employ the Vblock format (from the Libre GPU effort):

  • It is a bit like VLIW (only not really)
  • A block of instructions is prefixed with register tags, which give extra context to scalar instructions within the block
  • Sub-blocks include: vector length, swizzling, vector/width overrides and predication.
  • All this is added to scalar opcodes
  • There are no vector opcodes (and no need for any)
  • In the vector context, it goes like this: if a register is used by a scalar opcode, and the register is listed in the vector context, vector mode is activated
  • Activation results in a hardware-level for-loop issuing multiple contiguous scalar operations (instead of just one).
  • Implementers are free to implement the loop in any fashion they desire: SIMD, multi-issue, single-execution.
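The vector-context rule in the list above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the Vblock encoding itself: `VectorContext`, `execute_add`, the flat register list, and the choice that an untagged operand stays fixed while tagged ones step through contiguous registers are all hypothetical names and behaviors chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class VectorContext:
    """Hypothetical stand-in for a Vblock prefix: tagged registers plus a length."""
    vector_regs: set = field(default_factory=set)
    vector_length: int = 1


def execute_add(regs, ctx, rd, rs1, rs2):
    """Run one scalar ADD opcode; return how many scalar operations were issued.

    If any register named by the opcode is listed in the vector context,
    the opcode is repeated vector_length times (the "hardware for-loop").
    """
    tagged = {rd, rs1, rs2} & ctx.vector_regs
    count = ctx.vector_length if tagged else 1
    for i in range(count):
        # Assumption: a tagged register advances to the next contiguous
        # register on each iteration; an untagged register stays put.
        d = rd + i if rd in ctx.vector_regs else rd
        a = rs1 + i if rs1 in ctx.vector_regs else rs1
        b = rs2 + i if rs2 in ctx.vector_regs else rs2
        regs[d] = regs[a] + regs[b]
    return count


regs = [0] * 32
regs[8:12] = [1, 2, 3, 4]
regs[16:20] = [10, 20, 30, 40]

# Same opcode, two contexts: first plain scalar, then vector mode.
scalar_issued = execute_add(regs, VectorContext(), 4, 8, 16)
scalar_result = regs[4]
vector_issued = execute_add(regs, VectorContext({4, 8, 16}, 4), 4, 8, 16)
vector_result = regs[4:8]
```

The sequential loop here corresponds to the "single-execution" option; a SIMD or multi-issue implementation would produce the same register contents while issuing the iterations in parallel.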

The design will employ scalars (8-, 16-, 24- and 32-bit fixed and floats), as well as transcendentals (sincos, atan, pow, exp, log, rcp, rsq, sqrt, etc.). The vectors (RV32-V) will support 2- to 4-element (8, 16 or 32 bits per element) vector operations, along with specialized instructions for a general 3D graphics rendering pipeline for points, pixels and texels (essentially special vectors):

  • XYZW points (64- and 128-bit fixed and floats)
  • RGBA pixels (8-, 16-, 24- and 32-bit pixels)
  • UVW texels (8-, 16-bits per component)
  • Lights and materials (Ia, ka, Id, kd, Is, ks…)
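To make the bit widths in the list above concrete, here is a minimal sketch of how a 32-bit RGBA pixel and a 128-bit XYZW point could be laid out in memory. The byte order, the channel order and the helper names (`pack_rgba8`, `unpack_rgba8`, `pack_xyzw_f32`) are assumptions made for the example; the proposal as described here does not specify a layout.

```python
import struct


def pack_rgba8(r, g, b, a):
    """Pack four 8-bit channels into one 32-bit word.

    Assumed layout: R in the lowest byte, A in the highest (little-endian).
    """
    return struct.unpack("<I", struct.pack("<4B", r, g, b, a))[0]


def unpack_rgba8(word):
    """Split a 32-bit word back into its (r, g, b, a) channels."""
    return struct.unpack("<4B", struct.pack("<I", word))


def pack_xyzw_f32(x, y, z, w):
    """Pack an XYZW point as four 32-bit floats, 128 bits in total."""
    return struct.pack("<4f", x, y, z, w)


pixel = pack_rgba8(0x11, 0x22, 0x33, 0x44)
point = pack_xyzw_f32(1.0, 2.0, 3.0, 1.0)
```

Both types are just short vectors with fixed element widths, which is why the article describes them as "essentially special vectors".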

Matrices of 2 × 2, 3 × 3 and 4 × 4 will be supported as a native data type, along with memory structures to support them for attribute vectors, and will essentially be represented in a 4 × 4 matrix.
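One way to picture smaller matrices being "represented in a 4 × 4 matrix" is to place them in the top-left corner of a 4 × 4 identity. The sketch below assumes that identity padding, which is a common graphics convention but is not confirmed by the proposal; `embed_in_4x4` is a hypothetical helper name.

```python
def embed_in_4x4(m):
    """Place a 2x2, 3x3 or 4x4 matrix in the top-left of a 4x4 identity.

    Assumption: unused rows and columns are filled from the identity matrix,
    so the embedded matrix acts the same way on the leading components.
    """
    n = len(m)
    if n not in (2, 3, 4) or any(len(row) != n for row in m):
        raise ValueError("expected a square 2x2, 3x3 or 4x4 matrix")
    out = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
    for r in range(n):
        for c in range(n):
            out[r][c] = float(m[r][c])
    return out


rotation_2d = [[0, -1], [1, 0]]  # 90-degree rotation in the XY plane
as_4x4 = embed_in_4x4(rotation_2d)
```

With this padding, a single 4 × 4 storage format and multiply path can serve all three matrix sizes.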

Among the advantages of a fused CPU-GPU ISA are the ability to implement a standard graphics pipeline in microcode, provide support for custom shaders and implement ray-tracing extensions. It also supports vectors for numerical simulations with 8-bit integer data types for AI and machine learning.

Custom rasterizers, such as those for splines, SubDiv surfaces and patches, can be implemented.

The design will be flexible enough that it can implement custom pipeline stages, custom geometry/pixel/frame buffer stages, custom tessellators and custom instancing operations.

RV64X block diagram

The RV64X reference implementation will include:

  • Instruction/Data SRAM Cache (32KB)
  • Microcode SRAM (8KB)
  • Dual Function Instruction Decoder
    • Hardwired decoder implementing RV32V and X
    • Micro-coded Instruction Decoder for custom ISA
  • Quad Vector ALU (32 bits/ALU—fixed/float)
  • 136-bit Register Files (1K elements)
  • Special Function Unit
  • Texture Unit
  • Configurable local Frame Buffer

The design is meant to be scalable as indicated below.

RV64X’s scalable design

The RV64X design has several novel ideas, including a fused, unified CPU-GPU ISA, configurable registers for custom data types, and user-defined SRAM-based microcode for application-defined custom hardware extensions for:

  • Custom rasterizer stages
  • Ray tracing
  • Machine learning
  • Computer vision

The same design serves as either a stand-alone graphics microcontroller or a scalable shader unit, and data formats support FPGA-native or ASIC implementations.

Why is there a need for open graphics?

The developers think most graphics processors cover the high end such as gaming, high-frequency trading, computer vision and machine learning. They believe the ecosystem lacks a scalable graphics core for more mainstream applications for things like kiosks, billboards, casino gaming, toys, robotics, appliances, wearables, industrial human-machine interfaces, infotainment and automotive gauge clusters. Meanwhile, specialty programming languages must be used to program GPU cores for OpenGL, OpenCL, CUDA, DirectCompute and DirectX.

A graphics extension for RISC-V would resolve the scalability and multi-language burdens, enabling a higher level of use-case innovation.

Next steps

This is a very early spec, still in development and subject to change based on stakeholder and industry input. The team will establish a discussion forum. An immediate goal is building a sample implementation with an instruction set simulator, and an FPGA implementation using open-source IP and custom IP designed as an open-source project. Demos and benchmarks are being designed. Developers interested in participating should contact Atif Zafar.

As for the Libre-RISC-V 3D GPU, the organization’s goal is to design a hybrid CPU, VPU, and GPU. It is not, as widely reported, a “dedicated exclusive GPU.” The option exists to create a stand-alone GPU product. Their primary goal is to design a complete all-in-one processor SoC that happens to include a Libre-licensed VPU and GPU.

What do we think?

The population of GPU suppliers is increasing. We now have over a dozen.

  • Apple
  • AMD
  • Arm
  • DMP
  • Imagination Technologies
  • Libre-RISC-V 3D GPU
  • Nvidia
  • Intel
  • Jingjia Micro
  • Qualcomm
  • RISC-V Graphics
  • Think-Silicon
  • VeriSilicon

One application not listed as a potential use of a free, flexible, small GPU is cryptocurrency mining.

If it is the goal of the RISC-V community to emulate the IP suppliers such as Arm and Imagination, then we can expect to see DSP, ISP and DP designs. There is at least one Open DSP proposal; perhaps it can be brought into the RISC-V community.

It will take at least two years before any hardware implementations emerge. One of the most logical candidates for adopting this design is Xilinx, which is now using Arm’s Mali in its Zynq design. We would also expect to see several implementations come out of China.

>> This article was originally published on our sister site, EE Times.

Jon Peddie, a pioneer in the graphics industry, is president of Jon Peddie Research.
