In this Product How-To article, Fujitsu’s Waqar Saleem compares 2D and 3D graphics techniques in embedded designs and then describes an approach that combines the benefits of both techniques, using the company’s IRIS fast pixel engine-based MB86R1x and MB9EF126 CPUs to illustrate a typical 2D/3D configuration.
Sophisticated 3D graphics capabilities are starting to appear in embedded devices such as cell phones and tablets. However, 2D graphics are still needed in applications that either use 2D graphics only or use both 2D and 3D capabilities.
For example, POS terminals, industrial devices, and mid- to low-end automotive graphics applications only need 2D graphics, while some high-end automotive navigation systems may need both 2D and 3D for GUI implementation. In these hybrid applications, effective use of 2D can allow the 3D engine to focus on the most graphics-intensive operations.
Given the importance of two-dimensional graphics in embedded apps, it’s important to understand two common approaches to 2D graphics: raster and vector graphics. This article compares the two techniques, then describes an approach to embedded graphics that combines the benefits of both techniques. The embedded application used as an example is for automotive designs.
To begin, let’s assume familiarity with embedded systems and terms such as Flash ROM, CPU RAM, and VRAM. The discussion is based on typical system configurations such as the one shown in Figure 1a above and Figure 1b below.
Figure 1: (b) Single-chip Embedded Architecture
Raster Graphics Engines
Raster graphics rely on pre-rendered images, or bitmaps. In a multi-chip application, the bitmaps are typically stored in the off-chip CPU Flash ROM. In a true single-chip solution, they reside in a dedicated external Flash chip, which can be connected to the graphics chip via a serial or parallel bus. Raster engines use a Block Transfer (BLT) operation to transfer the bitmaps to the graphics engine. A BLT operation is similar to a memcpy operation.
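The memcpy analogy can be made concrete with a minimal sketch. It assumes one byte per pixel and row-major buffers; the function name, buffer sizes, and sprite contents are hypothetical and are not taken from any Fujitsu driver.

```python
def blit(dst, dst_width, src, src_width, src_height, x, y):
    """Copy a src_width x src_height bitmap into dst at position (x, y)."""
    for row in range(src_height):
        src_start = row * src_width
        dst_start = (y + row) * dst_width + x
        # Each row is one contiguous copy, the Python analogue of a memcpy call.
        dst[dst_start:dst_start + src_width] = src[src_start:src_start + src_width]

# Hypothetical 8x4 frame buffer and 3x2 sprite, one byte per pixel.
frame = bytearray(8 * 4)
sprite = bytes([1, 2, 3, 4, 5, 6])
blit(frame, 8, sprite, 3, 2, x=2, y=1)
```

The sketch does no clipping, so the caller has to keep the sprite inside the destination buffer.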
There are two common architectures for raster graphics engines, as shown in Figure 2 below. One uses the line-buffer mechanism. This technique does not need a frame buffer for pre-composing scenes. It fetches bitmaps (sometimes called sprites) one line at a time. The advantage is a smaller or no VRAM requirement, but the engine can quickly run out of steam if the application demands fairly dynamic graphics content.
The other architecture uses a frame buffer in VRAM. All bitmaps that are to appear on the display are BLTed from Flash memory into VRAM ahead of time. The graphics engine then combines the bitmaps in the desired order to create the final scene, which is sent to the display. This architecture is useful if the application needs highly dynamic visuals. The choice between the two architectures depends on the application.
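The difference between the two architectures can be sketched in a few lines. The code below is illustrative only: sprites are hypothetical (x, y, rows-of-pixels) tuples, later sprites overwrite earlier ones, and no real engine's data layout is implied.

```python
def compose_frame(sprites, width, height):
    """Frame-buffer style: blit every sprite, in order, into a full-size buffer."""
    frame = [[0] * width for _ in range(height)]
    for x0, y0, pixels in sprites:
        for dy, row in enumerate(pixels):
            for dx, value in enumerate(row):
                frame[y0 + dy][x0 + dx] = value
    return frame

def compose_scanline(sprites, width, y):
    """Line-buffer style: build only display line y from the sprites that cross it."""
    line = [0] * width
    for x0, y0, pixels in sprites:
        if y0 <= y < y0 + len(pixels):
            for dx, value in enumerate(pixels[y - y0]):
                line[x0 + dx] = value
    return line

# Two overlapping 2x2 sprites on a 4x3 display (hypothetical values).
sprites = [
    (0, 0, [[1, 1], [1, 1]]),
    (1, 1, [[2, 2], [2, 2]]),
]
frame = compose_frame(sprites, 4, 3)
lines = [compose_scanline(sprites, 4, y) for y in range(3)]
```

Both paths produce the same image. The line-buffer path needs storage for only one display line at a time, but it must revisit every sprite on every line, which is where it runs short of time as the scene gets busier.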
Vector Graphics Engines
Vector graphics engines are much more powerful than raster engines. In addition to handling pre-rendered bitmaps, vector engines can generate dynamic graphics elements on the fly, and can draw a 2D object using paths/primitives based on an input of mathematical equations.
The engine can then give the object a real-life look using color fills and texture maps. Such engines can also translate, rotate, and scale the 2D objects using matrix multiplication.
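As a rough illustration of the matrix approach, the sketch below builds translation, rotation, and scaling as 3×3 homogeneous matrices and combines them by multiplication. The helper names are hypothetical; this is generic 2D math and not the OpenVG API.

```python
import math

def mat_mul(a, b):
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def translate(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]

def scale(sx, sy):
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]

def rotate(degrees):
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def apply(m, point):
    """Transform a 2D point by a 3x3 homogeneous matrix."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

# Scale by 2, then rotate by 90 degrees, then move 10 units in x.
# The rightmost matrix is applied to the point first.
transform = mat_mul(translate(10, 0), mat_mul(rotate(90), scale(2, 2)))
corner = apply(transform, (1, 0))  # approximately (10, 2)
```

Because the three operations collapse into a single matrix, every vertex of a path costs the same handful of multiplications no matter how many transforms were chained.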
The most popular vector graphics standard in embedded systems is OpenVG 1.1, an open standard defined by Khronos. Each semiconductor company has its own implementation of this standard, and the power, performance, and efficiency depend on that design. Figure 3 below shows the OpenVG pipeline concept from Khronos.
Figure 3: OpenVG Pipeline Concept [Image based on graphic from http://www.khronos.org/openvg/]
Comparison of Raster and Vector Graphics Engines
Raster and vector engines have their pros and cons. Raster engines are simpler and less expensive to design and implement. They rely on pre-rendered graphics and don’t have to do dynamic computations during operation. They can be designed without much difficulty using BLT-ers or memory-fetch units that transport bitmaps from permanent storage (Flash) to either a line buffer or VRAM.
However, memory storage and bandwidth requirements for these engines can become excessive if the application requires long animated sequences or sophisticated effects, such as image scaling or rotation. Another drawback of raster engines is that image quality can be adversely affected if they need to scale up the image.
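A rough calculation gives a feel for the numbers. The figures below (a 200×200-pixel needle image at 4 bytes per pixel, with one pre-rendered frame per degree over a 240-degree sweep) are illustrative assumptions, not measurements from any particular product.

```python
def bitmap_bytes(width, height, bytes_per_pixel):
    """Storage needed for one uncompressed bitmap."""
    return width * height * bytes_per_pixel

# Assumed, illustrative values.
NEEDLE_WIDTH = 200
NEEDLE_HEIGHT = 200
BYTES_PER_PIXEL = 4   # 32-bit color with alpha
SWEEP_FRAMES = 240    # one pre-rendered frame per degree of needle travel

frame_bytes = bitmap_bytes(NEEDLE_WIDTH, NEEDLE_HEIGHT, BYTES_PER_PIXEL)
sequence_bytes = frame_bytes * SWEEP_FRAMES
```

Under these assumptions a single frame takes 160,000 bytes, while the complete pre-rendered sequence takes 38,400,000 bytes, roughly 38.4 MB of Flash for one gauge needle.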
On the other hand, vector engines are excellent at handling dynamic graphics. They can create 2D objects from scratch using paths/primitives, color fills, and texture maps, based on a set of mathematical equations. Another benefit is the engine’s low memory size requirement.
These engines do not need bitmap sequences to show animation. They can translate, scale, and rotate 2D objects without consuming much memory. They can also handle fonts well. However, this comes at the price of a complicated design. The vector engine or IP can be expensive as well, and these engines still need significant memory to show sophisticated graphics at a high frame rate.
An alternative to these extreme approaches is an architecture based on the rendering process, but with the ability to process the pixels being rendered to scale, rotate, and perform other operations.
This eliminates both the extensive bandwidth requirement of a pure raster (BLT) engine and the intensive mathematical calculations required by a pure vector engine. This balancing act reduces design complexity and cost and, at the same time, keeps the graphics memory requirement within reasonable bounds.
To be effective, this architecture must be very flexible. It must quickly fetch the bitmap to be processed from the source, and then rapidly process the incoming pixels to perform operations such as rotations, scaling, and alpha blending.
The architecture also needs to be versatile enough to handle different color formats, and must be able to convert data of one type into another as needed. It must either store the processed pixels in another location in memory or send them to the display pipeline. The architecture must also drive the display panel with the required timing.
Iris Provides a Balanced Approach
We have balanced the two approaches in the Iris graphics engine. This IP combines the best of the two worlds: raster and vector graphics. Since the engine is based inherently on raster graphics, it is simpler than vector engines and has significantly less IP design and development cost.
At the same time, through its bitmap-rotation and scaling capability, Iris requires much less memory and bandwidth than conventional 2D raster engines. The rotation and scaling capability allows it to show sophisticated and dynamic effects, such as needle draw or album cover turnstile, at a high frame rate.
Iris is a fast pixel engine that can rotate and scale pre-rendered graphics data, and handle compressed bitmaps. A block diagram is shown in Figure 4 below.
All units in the pipeline work in synchronization with each other, courtesy of a signal that flows from the display to the start of the pipeline. Consequently, there is no need to flush the pipeline.
Here is a brief description of the individual blocks in the Iris pixel engine and how they combine the best of the raster and vector-graphics techniques.
1. Pixel Engine. As Figure 4 below shows, this block is the heart of the Iris engine. This flexible pixel-processing unit can have two instantiations of pixel pipelines, one for blit and the other for display.
The engine, which can be configured through an AHB interface, offers interrupt signals. Its main sub-units are its fetch, store, and core pixel-processing units. Respectively, these fetch pixel data from memory, store it, and process it as needed. The pixel-processing units will be discussed in more detail next.
2. Fetch Unit. This is a key part of the pixel engine. As the name indicates, the fetch unit gets pixel data from ROM or RAM storage units at high speed. It allows an image size as large as 1024×1024.
Figure 4: Iris Block Diagram
There are three different types of this unit, depending on the action it performs on the graphics data while fetching:
* ROT: Rotates the incoming bitmap
* RLD: Decompresses the run-length-encoded bitmap
* Light: Alpha blends the incoming pixel data
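Run-length decoding of the kind the RLD variant performs can be sketched as follows. The (run_length, value) pair format used here is an assumption chosen for clarity; the article does not describe the actual encoding that Iris reads.

```python
def rle_encode(pixels):
    """Collapse runs of identical pixels into (run_length, value) pairs."""
    encoded = []
    for value in pixels:
        if encoded and encoded[-1][1] == value:
            encoded[-1] = (encoded[-1][0] + 1, value)
        else:
            encoded.append((1, value))
    return encoded

def rle_decode(encoded):
    """Expand (run_length, value) pairs back into a flat pixel list."""
    pixels = []
    for run_length, value in encoded:
        pixels.extend([value] * run_length)
    return pixels

row = [0, 0, 0, 255, 255, 7]
packed = rle_encode(row)       # [(3, 0), (2, 255), (1, 7)]
restored = rle_decode(packed)
```

Bitmaps with large flat areas, such as HMI icons and gauge backgrounds, compress well this way, which reduces both Flash storage and fetch bandwidth.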
The fetch unit will start working after it receives a kick signal from the display side. It also performs color conversion on the graphics data.
Optionally, it can also perform bilinear filtering, which is useful when the image is being rotated. Right before the graphics data is sent out of this block, the unit performs multiplications with constant color/alpha or an alpha pre-multiplication.
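The two operations mentioned here, bilinear filtering and alpha pre-multiplication, can be sketched generically. The code assumes a grayscale image stored as a list of rows, non-negative sample coordinates inside the image, and 8-bit channels with rounding; none of these details are confirmed for the Iris hardware.

```python
def bilinear_sample(image, x, y):
    """Sample a grayscale image at fractional coordinates (x, y)."""
    height, width = len(image), len(image[0])
    x0, y0 = int(x), int(y)
    # Clamp the right and bottom neighbours at the image border.
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def premultiply(r, g, b, a):
    """Scale each 8-bit color channel by an 8-bit alpha, with rounding."""
    return ((r * a + 127) // 255, (g * a + 127) // 255, (b * a + 127) // 255, a)

image = [[0, 100], [100, 200]]
center = bilinear_sample(image, 0.5, 0.5)   # 100.0
pixel = premultiply(255, 128, 0, 128)       # (128, 64, 0, 128)
```

Bilinear filtering matters for rotation because a rotated destination pixel rarely maps onto an exact source pixel; blending the four nearest neighbours avoids the jagged edges that nearest-pixel sampling produces.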
3. CLUT. The Color LookUp Table can be used either to look up or to index colors. In the first role, the table can be used to neutralize non-linearity of color transmission or to adapt to the individual characteristics of the display panel.
In the second role, the bitmap stores reduced-length color codes that index the raw color values held in the table. This can help reduce memory consumption.
The block consists of a RAM with 256 entries. Each entry is 10 bits wide for each color component. The CLUT is fully software programmable.
4. Matrix. This unit performs linear color correction or conversion (e.g., YUV to RGB conversion). The unit allows for full programming of the matrix and offset coefficients and supports 3×3 matrix size. The unit can operate in one of the following three modes:
* Neutral mode: The unit does not perform any operation on the incoming data.
* Matrix mode: The incoming pixel data goes through a matrix conversion, as discussed previously.
* Pre-multiplication mode: The incoming pixel value (RGB) is pre-multiplied with the incoming alpha value.
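The three modes can be sketched as a single function. The coefficients below are the full-range BT.601 YUV-to-RGB values used by JFIF and are an assumption made for illustration; in the real unit the matrix and offsets are whatever the software programs. The function name and mode strings are hypothetical.

```python
def clamp8(value):
    """Round to the nearest integer and clamp to the 8-bit range."""
    return max(0, min(255, int(round(value))))

# Assumed coefficients: full-range BT.601 YUV to RGB.
YUV_TO_RGB = [
    [1.0, 0.0, 1.402],
    [1.0, -0.344136, -0.714136],
    [1.0, 1.772, 0.0],
]
# Offsets fold the "subtract 128 from U and V" step into one addition.
YUV_OFFSET = [-1.402 * 128, (0.344136 + 0.714136) * 128, -1.772 * 128]

def matrix_unit(pixel, mode, matrix=YUV_TO_RGB, offset=YUV_OFFSET):
    """Process one (c0, c1, c2, alpha) pixel in the selected mode."""
    c0, c1, c2, alpha = pixel
    if mode == "neutral":
        return pixel
    if mode == "matrix":
        channels = (c0, c1, c2)
        converted = tuple(
            clamp8(sum(m * c for m, c in zip(row, channels)) + off)
            for row, off in zip(matrix, offset)
        )
        return converted + (alpha,)
    if mode == "premultiply":
        return tuple((c * alpha + 127) // 255 for c in (c0, c1, c2)) + (alpha,)
    raise ValueError(f"unknown mode: {mode}")

gray = matrix_unit((128, 128, 128, 255), "matrix")   # mid-gray YUV to RGB
faded = matrix_unit((200, 100, 50, 128), "premultiply")
```

A mid-gray YUV pixel (128, 128, 128) has no chroma, so it maps to the same gray in RGB, which makes a convenient sanity check for any set of coefficients.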
5. Scalar. As mentioned earlier, a key functionality of Iris is the ability to upscale or downscale bitmaps through the scalar unit. This brings it closer to vector graphics engines. The unit comprises two components: one for horizontal scaling and one for vertical scaling.
The scalar can handle bitmap sizes from 1×1 up to 1024×1024. It also supports point and bilinear filtering to allow fine image quality.
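Splitting the work into a horizontal and a vertical pass can be sketched as follows. This minimal version uses point (nearest-pixel) filtering only, and the function names are hypothetical.

```python
def scale_row(row, new_width):
    """Horizontal pass: resample one row with point filtering."""
    old_width = len(row)
    return [row[i * old_width // new_width] for i in range(new_width)]

def scale_image(image, new_width, new_height):
    """Scale a list-of-rows image: horizontal pass first, then vertical."""
    stretched = [scale_row(row, new_width) for row in image]
    old_height = len(stretched)
    # Vertical pass: pick the nearest source row for each output row.
    return [list(stretched[j * old_height // new_height]) for j in range(new_height)]

small = [[1, 2], [3, 4]]
big = scale_image(small, 4, 4)
back = scale_image(big, 2, 2)
```

Handling the two directions separately keeps each pass one-dimensional, so the same resampling logic serves both upscaling and downscaling.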
6. ROP. The ROP unit performs the standard raster operations (e.g., AND, OR, NOR) on pixel values. This unit can take up to three input channels of pixel data.
7. BitBlend. This unit performs various blending operations on two input pixel sources. It supports both OpenVG 1.0 and OpenGL 2.0 blending modes.
8. LayerBlend. This unit is used to blend or merge two video layers. It can either perform an alpha-blending operation between the two layers or can just operate in a transparency mode with one layer getting higher priority over the other.
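The two behaviors can be sketched per pixel. The alpha formula is the standard "source over" blend with 8-bit rounding. The rule used for the transparency mode (the top layer wins unless its alpha is zero) is an assumption, since the article does not say how the transparent pixels are identified.

```python
def blend_channel(top, bottom, alpha):
    """Blend one 8-bit channel: alpha 255 shows only top, 0 only bottom."""
    return (top * alpha + bottom * (255 - alpha) + 127) // 255

def layer_blend(top_pixel, bottom_pixel, mode="alpha"):
    """Merge an (r, g, b, a) top-layer pixel with an (r, g, b) bottom-layer pixel."""
    tr, tg, tb, ta = top_pixel
    br, bg, bb = bottom_pixel
    if mode == "alpha":
        return (blend_channel(tr, br, ta),
                blend_channel(tg, bg, ta),
                blend_channel(tb, bb, ta))
    if mode == "priority":
        # Assumed rule: the top layer has priority unless fully transparent.
        return bottom_pixel if ta == 0 else (tr, tg, tb)
    raise ValueError(f"unknown mode: {mode}")

mixed = layer_blend((255, 0, 0, 128), (0, 0, 255))              # (128, 0, 127)
shown = layer_blend((10, 20, 30, 0), (0, 0, 255), "priority")   # bottom shows
```

The priority mode needs no multiplications at all, which is why it is attractive when one layer simply overlays another, such as warning icons over a map.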
9. Store. This unit stores pixel data received from the pixel engine into memory. Image sizes up to 1024×1024 are supported.
10. ExtDst. This is an output interface of the pixel engine for the pixel stream. It is used to pipe the stream to the display controller so that the content can be shown on the LCD screen.
11. Display Controller. This unit generates the required control signals for operating a display in RGB mode using HSYNC and VSYNC signals as timing references. The pixel data from the pixel engine is transmitted to the display controller through a FIFO interface. The unit supports display clocks up to 40 MHz.
The display controller has a dithering unit that can help smooth the appearance of image data on a low-resolution display panel. Both spatial and temporal dithering modes are supported.
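Spatial dithering can be illustrated with a classic ordered-dither sketch. The 2×2 Bayer threshold matrix and the reduction to 1 bit per pixel are assumptions chosen to keep the example small; the article does not state which dithering algorithm or bit depths the display controller uses.

```python
# 2x2 Bayer matrix: the order in which pixels in each tile switch on.
BAYER_2X2 = [[0, 2], [3, 1]]

def ordered_dither_1bit(image):
    """Reduce 8-bit gray pixels to 0 or 1 using position-dependent thresholds."""
    out = []
    for y, row in enumerate(image):
        out_row = []
        for x, value in enumerate(row):
            threshold = (BAYER_2X2[y % 2][x % 2] + 0.5) * 255 / 4
            out_row.append(1 if value > threshold else 0)
        out.append(out_row)
    return out

gray = [[128] * 4 for _ in range(4)]
dithered = ordered_dither_1bit(gray)
lit = sum(sum(row) for row in dithered)   # 8 of 16 pixels are on
```

A uniform mid-gray turns into a checkerboard with half the pixels lit, so the eye still perceives roughly 50 percent brightness even though each pixel carries only one bit.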
12. Timing Controller. This unit, usually referred to as TCON, generates control and data signals to directly communicate with the column and row drivers of a display panel. The freely programmable waveform of the generated timing control signals allows emulating almost every timing controller IC. The unit supports both single-ended TTL and differential RSDS signaling modes.
Iris has a reconfigurable architecture. All the blocks can be flexibly organized to create a pipeline according to the application needs. For example, the number and type of fetch units can be changed.
The pixel data can either be stored in memory through the store unit or sent to the display using extdst. In between the two operations, the pipeline can have many different combinations of pixel-processing units. Two possible configurations are shown in Figure 5 above.
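The idea of assembling a pipeline from interchangeable units can be modeled in software as a chain of functions that ends in a sink. Everything in this sketch is hypothetical: the stage functions are simple stand-ins, and the real hardware is configured through registers rather than Python callables.

```python
def build_pipeline(stages, sink):
    """Chain pixel-processing stages and finish with a sink (store or display)."""
    def run(pixels):
        for stage in stages:
            pixels = stage(pixels)
        sink(pixels)
    return run

# Hypothetical stages operating on a flat list of 8-bit gray pixels.
def fetch_passthrough(pixels):
    return list(pixels)

def halve_brightness(pixels):
    return [p // 2 for p in pixels]

def invert(pixels):
    return [255 - p for p in pixels]

memory = []    # stands in for the buffer written by the store unit
display = []   # stands in for the stream sent out through extdst

blit_pipeline = build_pipeline([fetch_passthrough, halve_brightness], memory.extend)
display_pipeline = build_pipeline([fetch_passthrough, invert], display.extend)

blit_pipeline([200, 100])
display_pipeline([200, 100])
```

The two pipelines share the same building blocks but end in different sinks, mirroring the blit and display instantiations described for the pixel engine.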
Implementing 2D graphics has traditionally required design engineers to weigh the pros and cons of raster versus vector-based engines. The ideal solution would be an approach that balances the best of both.
The Iris graphics engine does just that. This powerful 2D raster engine combines the best qualities of raster and vector architectures in a flexible manner.
Because of its raster architecture, the IP design and development cost is low. At the same time, its memory requirement is low because of the capability to support scaling and rotation functions, which typically have been a hallmark of vector engines.
The scalable engine can be used in either a 2D-only or a 2D/3D device that has an OpenGL ES 2.0 type high-end graphics engine. Fujitsu's MB9EF126 “Calypso” graphics device is an example of an Iris 2D-only implementation. This single-chip SoC has an ARM Cortex-R4 CPU. Iris allows Calypso to sport powerful 2D graphics while keeping the graphics memory footprint low.
The MB86R1x “Emerald” family is an example of a 2D/3D configuration. This SoC has an ARM Cortex-A9 CPU and an OpenGL ES 2.0 engine with a fully programmable shader. The presence of Iris in the system allows the 3D engine to focus fully on the 3D aspects of the graphics while Iris takes care of the 2D side, such as the HMI components.
The result is that organizations can more quickly and cost-effectively meet the ever-increasing need for 2D and 3D graphics in embedded applications.
Waqar Saleem is a senior applications engineer with Fujitsu Semiconductor America, based in Detroit, Michigan. He has more than a dozen years of design and applications experience with Fujitsu Semiconductor, and holds engineering degrees from San Jose State Univ. and the University of Engineering and Technology in Lahore, Pakistan.