BrainChip, the neuromorphic computing IP vendor, launched two development kits for its Akida neuromorphic processor during the recent Linley Fall Processor Conference. Both kits, an x86 Shuttle PC version and an Arm-based Raspberry Pi version, feature the company's Akida neuromorphic SoC. BrainChip is offering the tools to developers working with its spiking neural network processor in hopes of licensing its IP. Akida silicon is also available.
BrainChip’s neuromorphic technology enables ultra-low-power AI in edge systems that demand real-time processing of sensor data. The company has developed a neural processing unit (NPU) designed to process spiking neural networks (SNNs), brain-inspired networks that differ from mainstream deep-learning approaches. Like the brain, an SNN relies on “spikes” that convey information both spatially and temporally; that is, it recognizes both the sequence and the timing of spikes. In what is referred to as the “event domain,” spikes typically result from changes in sensor data (for example, changes in pixel values in an event-based camera).
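As a minimal sketch of the event idea, the NumPy snippet below (with made-up pixel values, not BrainChip data) emits a “spike” only where a sensor reading changes between frames; unchanged pixels produce no data at all:

```python
import numpy as np

# Two consecutive frames from a hypothetical sensor (pixel intensities).
frame_prev = np.array([[10, 10, 10],
                       [10, 50, 10],
                       [10, 10, 10]])
frame_curr = np.array([[10, 10, 10],
                       [10, 10, 50],
                       [10, 10, 10]])

# An event-based sensor emits a spike only where the intensity changes
# beyond a threshold; static regions generate no events.
threshold = 5
events = np.abs(frame_curr - frame_prev) > threshold

print(events.sum())          # number of spikes emitted
print(np.argwhere(events))   # where they occurred
```

Here only two pixels changed, so only two events leave the sensor, regardless of the frame size.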
Along with SNNs, BrainChip’s NPUs can process convolutional neural networks (CNNs), such as those typically used in computer vision and keyword spotting, at lower power than other edge implementations. This is done by converting CNNs to SNNs and running inference in the event domain. The approach also enables on-chip learning at the edge, a property of SNNs that is extended to converted CNNs.
BrainChip’s development boards are available for x86 Shuttle PC or Raspberry Pi. (Source: BrainChip)
Akida “is ready for tomorrow’s neuromorphic technology, but it solves today’s problem of making neural network inference possible on edge and IoT devices,” Anil Mankar, BrainChip co-founder and chief development officer, told EE Times.
Conversion from CNN to the event domain is performed by BrainChip’s software tool flow, MetaTF. Data can be converted to spikes, and trained models can be converted to run on BrainChip’s NPU.
“Our runtime software is taking the fear out of ‘What is an SNN?’ and ‘What is the event domain?’,” Mankar said. “We do everything to hide that.
“People who are familiar with the TensorFlow or Keras API… can take the application they are running on [other hardware], same network, same dataset, with our quantization-aware training, and run it on our hardware and measure the power themselves and see what the accuracy will be.”
CNNs are particularly good at extracting features from large data sets, Mankar explained, and conversion to the event domain preserves that benefit. The convolution operation is achieved in the event domain for most layers, but the final layer is replaced. Replacing it with a layer that recognizes incoming spikes gives the otherwise ordinary CNN the ability to learn via spike-timing dependent plasticity at the edge, eliminating re-training in the cloud.
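Spike-timing-dependent plasticity can be sketched with the textbook exponential update rule below. This is the generic formulation from the neuroscience literature, not BrainChip’s proprietary on-chip learning mechanism; the parameter values are illustrative:

```python
import math

def stdp_delta_w(t_pre, t_post, a_plus=0.1, a_minus=0.12, tau=20.0):
    """Classic exponential STDP rule (textbook form, illustrative only):
    potentiate when the presynaptic spike precedes the postsynaptic one,
    depress otherwise. Times are in milliseconds."""
    dt = t_post - t_pre
    if dt > 0:   # pre fired before post -> causal pairing, strengthen
        return a_plus * math.exp(-dt / tau)
    else:        # post fired first (or simultaneously) -> weaken
        return -a_minus * math.exp(dt / tau)

# Pre spike at 10 ms, post at 15 ms: weight increases.
print(stdp_delta_w(10.0, 15.0) > 0)   # True
# Post at 5 ms, pre at 10 ms: anti-causal, weight decreases.
print(stdp_delta_w(10.0, 5.0) < 0)    # True
```

The key property, which the article’s smart-doorbell example relies on, is that the update depends only on locally observed spike timing, so learning can happen on-device without a cloud round trip.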
While native SNNs (those written from scratch for the event domain) can use one-bit precision, converted CNNs require 1-, 2- or 4-bit spikes. BrainChip’s quantization tool helps designers decide how aggressively to quantize on a layer-by-layer basis. BrainChip has quantized MobileNet V1 for 10-object classification with prediction accuracy of 93.1 percent after quantization to 4 bits.
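A generic layer-wise quantizer shows the trade-off such a tool manages. The scheme and data below are illustrative assumptions, not MetaTF’s actual algorithm: coarser spikes cost accuracy, and the tool’s job is to pick the per-layer bit width that keeps prediction accuracy acceptable.

```python
import numpy as np

def quantize_activations(a, bits):
    """Uniform quantization of a non-negative activation map to `bits` bits.
    A generic illustrative scheme -- BrainChip's quantizer may differ."""
    levels = 2 ** bits - 1          # 1-bit -> binary events, 4-bit -> 15 levels
    scale = a.max() / levels
    return np.round(a / scale) * scale

rng = np.random.default_rng(0)
act = np.maximum(rng.normal(size=1000), 0.0)  # ReLU-style activations

# Mean quantization error shrinks as the per-layer bit width grows.
errors = {}
for bits in (1, 2, 4):
    errors[bits] = np.abs(act - quantize_activations(act, bits)).mean()
    print(bits, round(float(errors[bits]), 3))
```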
A byproduct of converting to the event domain is significant power savings due to sparsity. Non-zero activation-map values are represented as 1- to 4-bit events, and the NPU only performs computation on events rather than the entire activation map.
Developers “can look at the weights, and see non-zero weights, and try to avoid doing multiplication by zero weights,” Mankar said. “But that means you have to know where the zeros are, and there’s computation required” for those operations.
For a typical CNN, the activation map changes with every video frame. Because ReLU functions are centered around zero, typically about half the activations will be zero. By not generating spikes for these zeros, computation in the event domain is limited to non-zero activations. Converting CNNs to run in the event domain thus leverages sparsity, sharply reducing the number of MAC operations required for inference and, therefore, the power consumed.
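The scale of the savings can be estimated with a toy activation map (synthetic random data, not a measured workload):

```python
import numpy as np

rng = np.random.default_rng(1)

# A ReLU activation map: roughly half the pre-activation values are
# negative, so roughly half the outputs are exactly zero.
act = np.maximum(rng.normal(size=(32, 32, 16)), 0.0)

dense_macs = act.size               # a dense engine touches every value
event_macs = np.count_nonzero(act)  # an event engine touches only spikes

print(dense_macs, event_macs, round(event_macs / dense_macs, 2))
```

On this synthetic map the event-domain engine performs about half the MACs of a dense engine, before any further activity regularization of the kind shown in the graph below.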
Functions that can be converted to the event domain include convolution, point-wise convolution, depth-wise convolution, max pooling and global average pooling.
MAC operations required for object classification inference (dark blue is CNN in the non-event domain, light blue is event domain/Akida, green is event domain with further activity regularization). (Source: BrainChip)
In one example, a keyword-spotting model running on the Akida development board after 4-bit quantization consumed as little as 37 µJ per inference (or 27,336 inferences per second per watt). Prediction accuracy was 91.3 percent, with the chip clocked down to 5 MHz to achieve the observed efficiency (see graph below).
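The two efficiency figures quoted above are reciprocals of each other, which is easy to verify:

```python
# One watt sustained is one joule per second, so dividing by the energy
# each inference costs gives the inference rate one watt supports.
inf_per_s_per_watt = 27_336
energy_per_inference_uj = 1.0 / inf_per_s_per_watt * 1e6  # J -> µJ

print(round(energy_per_inference_uj, 1))  # ~36.6 µJ, i.e. "as little as 37 µJ"
```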
BrainChip’s NPU IP and the Akida chip are agnostic to network type, and can be used alongside most sensors. The same hardware can process image and audio data using CNN conversion, or BrainChip’s SNNs for olfactory, gustatory and vibration/tactile sensing.
NPUs are clustered into nodes of four, which communicate via a mesh network. Each NPU includes processing logic and 100 kB of local SRAM for parameters, activations and internal event buffers. CNN or SNN network layers are assigned to a combination of multiple NPUs, passing events between layers without CPU support. (While networks other than CNNs can be converted to the event domain, Mankar said they require a CPU to run on Akida.)
BrainChip’s NPU IP can be configured for up to 20 nodes, and larger networks can be run in multiple passes on designs with fewer nodes.
Nodes of four BrainChip NPUs are connected by a mesh network. (Source: BrainChip)
A BrainChip video showed an Akida chip deployed in a vehicle in-cabin system, with one chip used to detect a driver, recognize the driver’s face and identify their voice simultaneously. Keyword spotting required 600 µW, facial recognition needed 22 mW, and the visual wake-word inference used to detect the driver consumed 6 to 8 mW.
Low power consumption for such automotive platforms offers flexibility to automakers in other areas, said Rob Telson, BrainChip’s vice president for worldwide sales, adding that the Akida chip is built on Taiwan Semiconductor Manufacturing Co.’s 28-nm process technology. IP customers can move to finer process nodes to save more power, Telson added.
Meanwhile, facial recognition systems can learn new faces on-chip, without shifting to the cloud. A smart doorbell could, for example, identify a person’s face locally from one-shot learning. Provided the last layer of the network is assigned a sufficient number of neurons, the total number of faces recognized could be increased from 10 to more than 50, Mankar noted.
Early-access customers
BrainChip has 55 employees spread across its Aliso Viejo, Calif., headquarters and design offices in Toulouse, France, Hyderabad, India, and Perth, Australia. The company has 14 patents and is publicly traded on the Australian stock exchange and the U.S. over-the-counter exchange.
BrainChip has about 15 early-access customers, including NASA, Telson said. Others include automotive, military, aerospace, medical (olfactory Covid-19 detection) and consumer electronics companies. BrainChip is targeting consumer applications such as smart health, smart city, smart home and smart transportation.
Another early customer is microcontroller specialist Renesas, which has licensed 2-node Akida NPU IP to be integrated with a future MCU aimed at sensor data analysis in IoT deployments, according to Telson.
>> This article was originally published on our sister site, EE Times.