AI chip accelerates image recognition - Embedded.com


A proof-of-concept chip from French research institutes CEA-Leti and LIST, presented at VLSI Symposium 2020, combines a low-power IoT node with an AI accelerator and demonstrates ultra-fast wake-up with a 15,000x peak-to-idle power ratio. The node delivers up to 1.3 tera operations per second per watt (TOPS/W), or 36 GOPS, for machine learning tasks.

The chip, named SamurAI, was tested in an occupancy detection system built from off-the-shelf components: a PIR sensor, a 224×224-pixel black-and-white camera, FeRAM, and a low-power radio. The daily average system power consumption was 105 µW, with SamurAI consuming 26% of that budget. The system polled the PIR sensor at 5 s intervals while the room was occupied (8 hours per day), ran the camera at 1 frame per second, and used the radio 10 times per day.
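The reported budget split can be worked out directly from the two figures the article gives. This is a minimal sketch; only the 105 µW daily average and SamurAI's 26% share come from the article, and the remainder is simply attributed to the sensors, memory, and radio.

```python
# Split the occupancy-detection demo's power budget.
# Figures from the article: 105 uW daily average, SamurAI = 26% of it.

TOTAL_AVG_W = 105e-6      # daily average system power
SAMURAI_SHARE = 0.26      # SamurAI's share of the budget

samurai_avg_w = TOTAL_AVG_W * SAMURAI_SHARE
other_avg_w = TOTAL_AVG_W - samurai_avg_w

print(f"SamurAI average:       {samurai_avg_w * 1e6:.1f} uW")  # ~27.3 uW
print(f"Sensors/memory/radio:  {other_avg_w * 1e6:.1f} uW")    # ~77.7 uW
```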

SamurAI System

SamurAI uses two on-chip sub-systems: a low-power, clockless, event-driven wake-up controller that can start up in 207 ns, and an on-demand subsystem comprising a RISC-V CPU with a deep-sleep mode plus a PNeuro AI accelerator and cryptography accelerators.

This dual-subsystem scheme enables a 15,000x peak-to-idle power ratio. The figure below shows the power consumption in the different modes; idle mode consumes just 6.4 µW, while with the CPU and AI accelerator running, power consumption rises to 96 mW.
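The quoted 15,000x figure follows directly from the two measured operating points:

```python
# Peak-to-idle power ratio from the two measured operating points
# given in the article: 6.4 uW idle, 96 mW with CPU + accelerator.

idle_w = 6.4e-6   # idle mode
peak_w = 96e-3    # CPU and PNeuro running

ratio = peak_w / idle_w
print(f"peak-to-idle ratio: {ratio:,.0f}x")  # 15,000x
```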

The chip is built on STMicro’s 28 nm fully depleted silicon-on-insulator (FD-SOI) process, and power figures are given without body biasing. The die measures 4.5 mm² and has six switchable power domains.


SamurAI power consumption measurements by power mode (the modes are, L-R: idle, wake-up controller (WuC) only, wake-up controller and wake-up radio (WuR), wake-up controller and peripherals, and CPU running) (Image: CEA-Leti)

AI accelerator

The chip’s AI accelerator, a design the team calls PNeuro, is a single-instruction, multiple-data (SIMD) programmable accelerator. It comprises two clusters of 32 8-bit processing elements each, with 264 kB of multi-banked SRAM, and can perform up to 64 multiply-accumulates (MACs) per cycle. The PNeuro block achieves 1.3 TOPS/W (2.8 GOPS at 0.48 V) and up to 36 GOPS at 0.9 V for 8-bit fully-connected neural-network layers.
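A back-of-envelope check ties the MAC count to the throughput figure. Counting each 8-bit MAC as two operations (multiply plus add) is a common convention but an assumption here, and the implied clock frequency is derived rather than stated in the article.

```python
# Rough consistency check on PNeuro's peak throughput.
# 64 MACs/cycle comes from the article; counting a MAC as 2 ops
# and the implied clock frequency are assumptions/derivations.

MACS_PER_CYCLE = 64     # 2 clusters x 32 processing elements
OPS_PER_MAC = 2         # assumption: multiply + accumulate
PEAK_GOPS = 36.0        # at 0.9 V (article)

ops_per_cycle = MACS_PER_CYCLE * OPS_PER_MAC
implied_clock_mhz = PEAK_GOPS * 1e9 / ops_per_cycle / 1e6
print(f"implied clock at 0.9 V: {implied_clock_mhz:.0f} MHz")  # ~281 MHz
```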

Using the PNeuro accelerator slashed the total power consumption of the system by a factor of 2.3 compared to using the controller RISC-V core for ML computation.


SamurAI’s two-cluster PNeuro accelerator with 64 processing elements total (Image: CEA-Leti)


PNeuro’s energy efficiency is 1.3 TOPS/W maximum and performance is 36 GOPS maximum (Image: CEA-Leti)

The design is intended for IoT applications that need sporadic bursts of compute between long periods of sleep. If the node can process the AI workload itself rather than connecting to the cloud, the task can often be completed more quickly, and because the data never leaves the system, there is no privacy implication. Candidate applications include person detection or scene identification using cameras or other sensors.

>> This article was originally published on our sister site, EE Times Europe.
