Achronix Semiconductor Corp. is rolling out Speedcore Gen4, its new generation of embedded FPGA IP designed as an AI accelerator to be built into SoCs.
Aimed at more efficient data acceleration, Achronix’ Speedcore Gen4 targets a broader set of applications, including computing, networking and storage systems, for packet processing and interface protocol bridging/switching. But Gen4’s shiniest feature is the set of Machine Learning Processor (MLP) blocks that Achronix has added to the architecture.
By adding MLP to the library of available blocks, Achronix claims that Speedcore Gen4, designed for 7nm process technology, “delivers 300% higher system performance for artificial intelligence and machine learning applications,” compared to Achronix’ own 16nm Speedcore.
“MLP blocks are highly flexible, compute engines tightly coupled with embedded memories to give the highest performance/watt and lowest cost solution for AI/ML applications,” Achronix said.
These days, there is hardly a single chip company CEO not coveting the AI market.
Robert Blake, president and CEO of Achronix, however, told EE Times that he became aware almost 20 years ago of the AI potential of FPGAs. When he first met Anna Patterson, at that time working at Google on search engine algorithms, Blake said it dawned on him that massive parallelism would be the key to functions like page ranking. “I remember thinking that something like FPGAs have a significant upside.”
With Patterson focused on software then, and Blake on hardware, “we could not cross the divide at that time. But I had the recognition of AI, early on,” he said.
Of course, Blake is not saying that the FPGA is the only solution for AI/machine learning. Acknowledging a spectrum of solutions for AI accelerators, ranging from CPUs and GPUs to FPGAs and ASICs, Blake said, “This market is growing so fast, so all of these different solutions will see an upside.”
Whereas CPUs offer maximum flexibility, the ASIC’s equal and opposite strength is efficiency. “But the question with ASIC is, can you retain flexibility to do different workloads?” Blake asked. Among the challenges of the next five to ten years, he noted, are “workloads we’d like to accelerate and analytics we’d like to do on these massive data sets we are collecting.”
Blake observed, “When companies deploy hardware acceleration, they are going to have to choose carefully between how much efficiency they get and how much flexibility they retain.” In his opinion, Achronix, with FPGA, is “in a spot where I think we can get very good efficiency, but we retain flexibility to do different things.”
Blake is also bullish about the FPGA’s new growth prospects in AI/ML applications.
He estimated the current FPGA market in total at $5.5 billion a year. “It’s growing at a high single digit every year.” In contrast, Achronix predicts that the FPGA market for AI/ML applications will grow more than 50% a year. The prediction is based on the company’s own estimates combined with forecasts from a variety of market research firms and banks.
“This is too important a market for us to ignore,” Blake said.
FPGA’s specific AI advantages
Blake told us he is fully aware of “people putting FPGA in a bucket of ‘prototype-only,’ or ‘connectivity-only.’” He added, “But if you see FPGAs in the new space like AI, FPGAs are another tool in a tool box, offering another programmable engine alongside with CPUs and GPUs.”
You can build an arbitrary width data path engine.
You can give whatever functionality you like.
You can replicate it 100 times or 1000 times.
Or you can tear it down and do something completely different a few cycles later.
FPGA is “a big path engine,” Blake said. “These types of circuits are very good at processing those giga streams, and massive parallelism. They can do it very fast and at a very good power efficiency.”
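The idea of an arbitrary-width data path that is replicated many times can be sketched in software. The snippet below is only an illustrative model, not Achronix’ MLP design or any FPGA toolflow: the names `make_mac_lane` and `run_lanes` are hypothetical, and the 12-bit width and four lanes are arbitrary choices.

```python
def make_mac_lane(width_bits):
    """Build a multiply-accumulate 'lane' whose result wraps at width_bits.

    Hypothetical model: real FPGA fabric would implement this in logic,
    here the chosen bit width is simply enforced with a mask.
    """
    mask = (1 << width_bits) - 1

    def mac(acc, a, b):
        return (acc + a * b) & mask

    return mac


def run_lanes(lane, num_lanes, a_values, b_values):
    """Model num_lanes copies of one lane, each with its own accumulator.

    Inputs are dealt out round-robin, standing in for parallel hardware.
    """
    accs = [0] * num_lanes
    for i, (a, b) in enumerate(zip(a_values, b_values)):
        idx = i % num_lanes
        accs[idx] = lane(accs[idx], a, b)
    return accs


# A 12-bit lane replicated four times (both numbers are arbitrary).
lane12 = make_mac_lane(12)
result = run_lanes(lane12, 4, range(8), range(8))

# "Tearing it down" and building something different: a 4-bit lane.
lane4 = make_mac_lane(4)
wrapped = lane4(0, 5, 5)
```

In this model, changing the width or the replication count is a one-line change, which loosely mirrors the flexibility Blake describes.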
GPU vs FPGA
If so, why did FPGA guys let GPU companies like Nvidia dominate the AI market for so long? Whether Xilinx, Intel’s Cyclone FPGA, or Flex Logix Technologies, FPGA vendors have only recently shown up in the AI acceleration discussion.
Blake explained that because of the graphics capability of GPUs, “I think they were readily available at a very low-cost point. It was very easy for somebody to buy a graphics card and start doing AI training on it.”
He noted, “A lot of the work in these GPUs, because they are aimed at doing graphics, do floating point arithmetic. But floating point is quite expensive from transistor count, also from power consumption count.”
In Blake’s opinion, although early development on analysis and training of these networks was done on GPUs, pressure began to build for reductions in cost and power drain. This triggered interest in “FPGA architecture where you don’t have to use high-precision floating point,” Blake noted. “You can use a smaller form factor arithmetic, which leads to cost and area savings. This has started to become also important, particularly in inference pieces of neural networks.”
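Blake’s point about smaller arithmetic can be illustrated with a simple quantization sketch. This is a generic example under stated assumptions, not a description of Achronix hardware: the helper names `quantize` and `int_dot`, the 8-bit width, and the sample values are all made up for illustration.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers using one shared scale factor.

    Assumption: symmetric quantization, chosen here for simplicity.
    """
    qmax = (1 << (num_bits - 1)) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0
    return [round(v / scale) for v in values], scale


def int_dot(weights, activations, num_bits=8):
    """Dot product whose multiply-accumulate steps use integers only."""
    qw, scale_w = quantize(weights, num_bits)
    qa, scale_a = quantize(activations, num_bits)
    acc = sum(w * a for w, a in zip(qw, qa))  # integer arithmetic
    return acc * scale_w * scale_a            # rescale once at the end


# Made-up sample data for illustration only.
weights = [0.5, -0.25, 0.75, 0.1]
activations = [1.0, 2.0, -1.5, 0.3]

exact = sum(w * a for w, a in zip(weights, activations))
approx = int_dot(weights, activations)
error = abs(approx - exact)
```

The integer result stays close to the floating-point one for this input, which is the kind of trade-off that makes reduced precision attractive for inference.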