Processor-in-memory chip speeds AI computations - Embedded.com

Processor-in-memory chip speeds AI computations

Imec and GlobalFoundries have demonstrated a processor-in-memory chip that can achieve energy efficiency up to 2900 TOPS/W, approximately two orders of magnitude above today’s commercial processor-in-memory chips. The chip uses an established idea, analog computing, implemented in SRAM in GlobalFoundries’ 22nm fully-depleted silicon-on-insulator (FD-SOI) process technology. Imec’s analog in-memory compute (AiMC) will be available to GlobalFoundries customers as a feature that can be implemented on the company’s 22FDX platform.


Imec’s AnIA test chip, seen here mounted on the PCB used for measurement and characterization, can achieve up to 2900 TOPS/W (Image: Imec)

Analog compute

Analog compute, or processor-in-memory, is an established technique that is already used in commercial AI accelerator chips from startups Mythic, Syntiant, Gyrfalcon and others.

Since a neural network model may have tens or hundreds of millions of weights, sending data back and forth between the memory and the processor is inefficient. Analog computing uses a memory array to store the weights and also perform multiply-accumulate (MAC) operations, so there is no memory-to-processor transfer needed. Each memristor element (perhaps a ReRAM cell) has its conductance programmed to an analog level which is proportional to the required weight.

Applying a voltage proportional to the input activation (via digital-to-analog conversion, shown on the left of the diagram below) means the current through each element is proportional to the product of the activation and the weight. The current through each bit-line (the vertical lines in the diagram below) is the sum of these activation-weight products, which can be fed through an analog-to-digital converter. This sum of activation-weight products is the multiply-accumulate result on which neural network calculations are built.


Analog computing uses an array of memristor cells to calculate matrix vector multiplication without having to send data between memory and processor (Image: Imec)
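The multiply-accumulate principle described above can be illustrated with a few lines of Python. This is only an idealised sketch: it ignores wire resistance, noise and converter quantisation, and the function name `bitline_currents` and the example values are made up for illustration rather than taken from Imec's design. Signed values are used for simplicity, on the assumption that the circuit handles negative weights in some way not modelled here.

```python
def bitline_currents(voltages, conductances):
    """Return the current summed on each bit-line (column).

    voltages[i]        -- voltage on row i, proportional to activation i
    conductances[i][j] -- conductance of the cell at row i, column j,
                          proportional to weight (i, j)
    """
    n_rows = len(conductances)
    n_cols = len(conductances[0])
    assert len(voltages) == n_rows, "one voltage per row is required"

    currents = [0.0] * n_cols
    for i in range(n_rows):
        for j in range(n_cols):
            # Per cell: current = voltage * conductance (Ohm's law).
            # Per column: the cell currents add up on the shared bit-line.
            currents[j] += voltages[i] * conductances[i][j]
    return currents


# Hypothetical 3-input, 2-output example (illustrative values only).
activations = [1.0, 0.5, -2.0]
weights = [
    [0.2, 1.0],
    [0.4, 0.0],
    [0.1, -0.5],
]
outputs = bitline_currents(activations, weights)
print(outputs)
```

Each entry of `outputs` is one column's sum of activation-weight products, which is the same result a digital processor would obtain from a matrix-vector multiplication, but without moving the weights out of the array.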

“In practice, many options are possible besides ReRAM — we can use MRAM, Flash, DRAM… the objective of this program is to understand which is best for the application and to optimize the options for each application domain,” explained Diederik Verkest, program director for machine learning at Imec.

Test chip

Imec has built a test chip, called analog inference accelerator (AnIA), based on GlobalFoundries’ 22nm FD-SOI process. AnIA’s 512k array of SRAM cells plus digital infrastructure including 1024 DACs and 512 ADCs takes up 4mm². It can perform around half a million computations per operation cycle based on 6-bit (plus sign bit) input activations, ternary weights (-1, 0, +1) and 6-bit outputs.

Ioannis Papistas (Image: Imec)
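To make the AnIA figures concrete, the sketch below models one operation cycle in plain Python. Several details are assumptions rather than facts from Imec: the array is taken to be 1024 rows by 512 columns (inferred from the DAC and ADC counts, which multiply to the quoted 512k cells), the 6-bit output is modelled as a signed code from -32 to 31, and `FULL_SCALE` is a hypothetical ADC range. The inputs and weights are random placeholders.

```python
import random

ROWS, COLS = 1024, 512        # assumed: one row per DAC, one column per ADC
ACT_MAX = 63                  # 6-bit magnitude plus sign bit: -63..+63
MACS_PER_CYCLE = ROWS * COLS  # 524,288, i.e. "around half a million"


def ternary_mvm(activations, weights):
    """Sum activation * weight down each column; weights are -1, 0 or +1."""
    n_cols = len(weights[0])
    sums = [0] * n_cols
    for act, row in zip(activations, weights):
        for j, w in enumerate(row):
            sums[j] += act * w
    return sums


def to_6bit(value, full_scale):
    """Clip and round to a signed 6-bit code (-32..31). Assumed ADC model."""
    code = round(value / full_scale * 31)
    return max(-32, min(31, code))


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [rng.randint(-ACT_MAX, ACT_MAX) for _ in range(ROWS)]
weights = [[rng.choice((-1, 0, 1)) for _ in range(COLS)] for _ in range(ROWS)]

sums = ternary_mvm(activations, weights)
FULL_SCALE = 4096  # hypothetical ADC range, not a figure from the article
outputs = [to_6bit(s, FULL_SCALE) for s in sums]

print(MACS_PER_CYCLE, len(outputs))
```

The software loop visits every cell in turn, whereas the analog array is described as producing all 512 column sums in a single operation cycle.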

“We are able to produce the matrix vector multiplication output at different supply voltages, 0.8 and 0.6V,” said Ioannis Papistas from Imec’s machine learning group. “Operating at lower supply voltages without affecting the accuracy of the operation can significantly reduce the power consumption of operation, which is especially important for inference in energy constrained systems. This is an important feature of our design, enabled by the 22FDX process, that enables competitive inference on the edge.”

Imec showed that accuracy for object recognition inference on the CIFAR-10 dataset dropped by only one percentage point compared with a similarly quantised baseline. With a supply voltage of 0.8 V, AnIA’s energy efficiency is between 1050 and 1500 TOPS/W at 23.5 TOPS. With a 0.6 V supply voltage, AnIA achieved 5.8 TOPS at around 1800 to 2900 TOPS/W.
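The throughput and efficiency figures can be combined to estimate power draw and energy per operation. The short calculation below is a derived estimate, not a measurement reported by Imec: it assumes the quoted TOPS and TOPS/W values refer to the same operating point, and the helper names are illustrative.

```python
def implied_power_mw(tops, tops_per_watt):
    """Power in milliwatts implied by a throughput and an efficiency figure."""
    return tops / tops_per_watt * 1000.0


def energy_per_op_fj(tops_per_watt):
    """Energy per operation in femtojoules; 1 TOPS/W equals 1 pJ per op."""
    return 1000.0 / tops_per_watt


# (supply voltage, TOPS, low TOPS/W, high TOPS/W) as quoted in the article
operating_points = [
    (0.8, 23.5, 1050, 1500),
    (0.6, 5.8, 1800, 2900),
]

for volts, tops, eff_low, eff_high in operating_points:
    p_max = implied_power_mw(tops, eff_low)   # lower efficiency, more power
    p_min = implied_power_mw(tops, eff_high)  # higher efficiency, less power
    e_min = energy_per_op_fj(eff_high)
    e_max = energy_per_op_fj(eff_low)
    print(f"{volts} V: {p_min:.1f}-{p_max:.1f} mW, "
          f"{e_min:.2f}-{e_max:.2f} fJ/op")
```

Under these assumptions the chip's implied power is in the milliwatt range at both supply voltages, and each operation costs less than a femtojoule at the top of the 0.6 V efficiency range.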


Energy efficiency for various AI accelerators compared to Imec’s AnIA test chip (Image: Imec)

Mainstream innovation

“The innovation [Imec presented] is going to become mainstream,” said Hiren Majmudar, VP and GM of GlobalFoundries’ computing business unit. “We are seeing partners, customers of GlobalFoundries who are in the post-production stage with validated silicon… we expect that analog compute-based silicon will be hitting production around the end of this year or early next year. In terms of the mass market deployment, we anticipate analog compute to start getting into mass market certainly no later than 2022. But it could potentially happen sooner than that.”


Diederik Verkest (Image: Imec)

GlobalFoundries is working to include Imec’s AiMC technology as a feature that can be implemented on the 22FDX platform to enable energy-efficient AI accelerators. The FD-SOI process is designed for low power consumption: it can operate down to 0.5 V, with ultralow standby leakage of 1 picoamp per micron. 22FDX with the new AiMC feature is in development at GlobalFoundries’ 300mm production line at Fab 1 in Dresden, Germany.

As for Imec, the machine learning program will continue. The group’s ambition is to reach 10,000 TOPS/W (10 TOPS below 100mW) for always-on smart sensors and consumer wearables, said Verkest.

“In our ML program, our next steps are to reduce the size of these compute cells and to start looking at emerging memory devices as a next generation implementation for this principle,” he said.

>> This article was originally published on our sister site, EE Times.

 
