eFPGA custom blocks supercharge data acceleration systems

October 19, 2017

Max The Magnificent

I've long been interested in the technology coming out of Achronix. The company first caught my attention circa 2004/2005 with its asynchronous FPGA fabric. Eventually, this fabric reached speeds equivalent to those of a synchronous FPGA clocked at 2 GHz (if there were such a beast), but it was applicable only to a limited number of algorithmic and dataflow applications.

In 2013, Achronix launched a family of high-performance, high-density standalone FPGAs called Speedster, which were focused on targeted applications. The Achronix product portfolio was augmented in 2016 by Speedcore, which is a high-performance, embedded FPGA (eFPGA).

FPGAs are ideal for accelerating data-intensive artificial intelligence (AI) / machine learning (ML), 5G wireless, automotive advanced driver-assistance systems (ADAS), datacenter, and networking applications. One option is to use a standalone FPGA, like a Speedster, in conjunction with a standalone processor or a System-on-Chip (SoC). However, much higher performance, coupled with significantly lower power consumption, can be achieved by embedding the core FPGA fabric -- the eFPGA -- in the SoC itself.

(Source: Achronix)

In the case of Speedcore, SoC developers have access to a library of pre-defined blocks for Logic, DSP, BRAM, and LRAM.

Speedcore blocks (Source: Achronix)

These blocks are arranged in columns. Developers can specify the desired "height" (number of blocks in a column), "width" (number of columns), and "mix" (types of columns). Some projects may benefit from more logic and less DSP, for example, while others may require more DSP and memory.

Developers can specify the "mix" of the Speedcore fabric (Source: Achronix)
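To make the height/width/mix idea concrete, here is a minimal Python sketch of how such a fabric description might be modeled. Only the four block names come from the article; the `FabricSpec` class, its fields, and the example numbers are hypothetical and have nothing to do with the actual Achronix tools or file formats.

```python
from collections import Counter
from dataclasses import dataclass

# Block types named in the article; everything else here is hypothetical.
BLOCK_TYPES = ("LOGIC", "DSP", "BRAM", "LRAM")


@dataclass(frozen=True)
class FabricSpec:
    """Toy description of a column-based eFPGA fabric."""
    height: int      # number of blocks in each column
    columns: tuple   # column types, left to right (the "mix")

    def __post_init__(self):
        if self.height <= 0:
            raise ValueError("height must be positive")
        unknown = set(self.columns) - set(BLOCK_TYPES)
        if unknown:
            raise ValueError(f"unknown column types: {sorted(unknown)}")

    @property
    def width(self):
        """Number of columns in the fabric."""
        return len(self.columns)

    def block_counts(self):
        """Total blocks of each type: columns of that type times height."""
        per_type = Counter(self.columns)
        return {t: per_type[t] * self.height for t in BLOCK_TYPES}


# Two made-up fabrics of the same size but with a different mix.
logic_heavy = FabricSpec(height=8, columns=("LOGIC",) * 6 + ("DSP", "BRAM"))
dsp_heavy = FabricSpec(
    height=8,
    columns=("LOGIC", "LOGIC", "DSP", "DSP", "DSP", "BRAM", "BRAM", "LRAM"),
)

print(logic_heavy.block_counts())
print(dsp_heavy.block_counts())
```

Both example fabrics are eight columns wide and eight blocks high, so they contain the same number of blocks; only the split between logic, DSP, and memory changes.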

Now, those who live on the software side of the world are used to profiling their code to identify any bottleneck functions, which they then fine-tune to achieve the highest possible performance. Well, Achronix now offers a comparable capability for those of us who hang out on the hardware side of the fence.

For those developers who demand the highest possible performance, Achronix now provides the capability to create Speedcore custom blocks. These custom blocks are defined by Achronix in collaboration with its customers through a detailed architecture analysis of acceleration workloads. Repeated functions that are performance and/or area bottlenecks are ideal candidates to be hardened into Speedcore custom blocks.
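As a rough illustration of the kind of analysis described above, the following Python sketch ranks repeated functions by their share of total fabric area. The function name, the 20% threshold, and all of the numbers are assumptions made up for this example; the article does not describe how Achronix actually performs its architecture analysis.

```python
def rank_hardening_candidates(usage, min_share=0.2):
    """Return (name, share) pairs for functions whose share of the total
    area meets min_share, largest share first.

    usage maps a function name to (instances, area_per_instance).
    """
    totals = {name: n * area for name, (n, area) in usage.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    shares = ((name, total / grand_total) for name, total in totals.items())
    picked = [(name, share) for name, share in shares if share >= min_share]
    return sorted(picked, key=lambda item: item[1], reverse=True)


# Made-up numbers in arbitrary area units; not Achronix data.
example_usage = {
    "matrix_multiply": (64, 10.0),  # 64 instances x 10 units = 640
    "activation": (64, 2.0),        # 64 instances x 2 units = 128
    "control": (1, 32.0),           # 1 instance x 32 units = 32
}

candidates = rank_hardening_candidates(example_usage)
print(candidates)
```

With these invented figures, only `matrix_multiply` clears the threshold, because it accounts for 640 of the 800 total area units.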

As one example, consider a YOLO ("you only look once") object recognition function used in an advanced, real-time object detection and identification system. In this case, creating and deploying Speedcore custom blocks that optimized the DSP and memory blocks for matrix multiplication resulted in a significant die size reduction.

Die size reduction of an AI convolutional network (Source: Achronix)
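For readers wondering why matrix multiplication matters so much here, convolution layers in networks like YOLO are commonly computed by unrolling image patches into rows and multiplying them by the flattened kernels. The pure-Python sketch below shows that general technique (often called im2col) for a single-channel image with stride 1 and no padding. It is a teaching example only and makes no claim about how the Speedcore custom blocks are implemented.

```python
def im2col(image, k):
    """Unroll every k x k patch of a 2D image into one row."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            patches.append(
                [image[r + i][c + j] for i in range(k) for j in range(k)]
            )
    return patches


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def conv2d_as_matmul(image, kernels):
    """Apply several k x k kernels at once as (patches) x (weights).

    This is the cross-correlation used in neural networks: the kernels
    are not flipped.
    """
    k = len(kernels[0])
    patches = im2col(image, k)
    # One column per kernel, flattened in the same order as a patch.
    weights = [[kern[i][j] for kern in kernels]
               for i in range(k) for j in range(k)]
    return matmul(patches, weights)


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernels = [
    [[1, 0], [0, 0]],  # picks the top-left pixel of each patch
    [[1, 1], [1, 1]],  # sums each 2 x 2 patch
]
result = conv2d_as_matmul(image, kernels)
print(result)
```

Each output row corresponds to one patch position and each column to one kernel, so the whole layer reduces to a single matrix product, which is exactly the kind of repeated function that benefits from dedicated DSP and memory resources.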

Achronix ACE design tools fully support Speedcore custom blocks from design capture to bitstream generation and system debug in the same way as memories and DSP blocks. Achronix creates a unique GUI for each Speedcore custom block that manages all configuration rules. ACE contains full timing details for all configurations of the Speedcore custom blocks, which allows it to complete timing-based place-and-route for designs. Last but certainly not least, developers can also use ACE’s powerful SnapShot embedded logic analyzer to create complex triggers and show run-time signals within Speedcore devices.

For more information on Speedster, Speedcore, and Speedcore custom blocks, please visit the Achronix website.
