New core and API seek to meld multiple AI engines

BRUSSELS, Belgium — A sizable number of AI chip startups, many gunning for the automotive market, have popped up in the last few years, but there has been a counterpoint. OEMs and Tier Ones are reportedly eager to design home-grown AI chips, much like Tesla's groundbreaking development of its own "full self-driving (FSD) computer" chips.

If the latter case is the trend, where does it leave IP core licensors like Ceva, Inc.? And what should they do next?

First and foremost, they must increase the performance of their licensable IP cores designed for AI workloads, making their neural network cores even more irresistible to SoC designers.

Perhaps even more important, they must ensure that their IP cores do not get designed out, replaced by new AI processors developed by startups or car OEMs.

In what might be viewed as a preemptive strike, Ceva came here with two announcements meant to show that it is meeting those objectives.

At AutoSens, which opens this week in Brussels, Ceva rolled out its second-generation edge AI processor architecture for deep neural network inferencing. Called "NeuPro-S," the new AI architecture includes "a number of system-aware enhancements that deliver significant performance improvements," Ceva claimed.

In parallel, Ceva unveiled what it calls its CDNN (Ceva Deep Neural Network) Invite API, a deep neural network compiler technology designed to support not only Ceva’s own NeuPro cores but also third-party neural network engines in a single, unified neural network architecture.

Yair Siegel, director of segment marketing at Ceva, told EE Times that as neural networks continue to advance, car OEMs and Tier Ones want to "see a flexible AI architecture" that brings third-party neural network solutions for specific use cases, in addition to NeuPro cores, under one roof.

Concept for CDNN Invite (Source: Ceva)

Mike Demler, senior analyst at The Linley Group, described the CDNN-Invite as “an interface that allows mapping the customer’s AI accelerator into the same computational graph alongside NeuPro, so that it can be run by the same host controller.” Demler sees Ceva’s advantage in its ability to build on an established foundation. He called Ceva’s “CDNN-Invite” feature “novel.”

Ceva claimed that CDNN-Invite would create a much-needed "open environment" for AI architecture, as opposed to Nvidia, "whose architecture is completely closed."

Demler, however, questioned whether that holds at the system level. He pointed out, "Actually, if you're using Nvidia's GPU as an accelerator in a heterogeneous system, the software framework is completely open to plug in other engines. Audi's zFAS system, for example, uses both EyeQ and [Nvidia's] Tegra processors. It's not a problem."

Jeff VanWashenova, director of the automotive market segment at Ceva, responded, “The difference here with CDNN-Invite API is we are allowing other neural network engines to be on the same silicon” with NeuPro.

Demler acknowledged that Ceva is "making it easier for customers that already use their IP to extend it," by allowing third-party accelerators to run alongside NeuPro within a single neural network framework.
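Demler's description, one computational graph whose layers are split between NeuPro and a customer's accelerator and driven by one host controller, can be illustrated with a toy partitioner. This is a minimal sketch under stated assumptions: the names `Engine`, `Node`, `partition`, `host_dispatch`, `customer_acc` and the op types are hypothetical and are not part of Ceva's CDNN or CDNN-Invite API, whose interfaces the article does not describe.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

# Hypothetical sketch of one graph, several engines, one host controller.
# None of these names come from Ceva's CDNN or CDNN-Invite API.

@dataclass(frozen=True)
class Engine:
    name: str
    supported_ops: FrozenSet[str]  # layer types this engine can execute

@dataclass(frozen=True)
class Node:
    name: str
    op: str  # layer type, e.g. "conv" or "pool"

def partition(graph: List[Node], engines: List[Engine]) -> Dict[str, str]:
    """Map each node to the first engine (in priority order) that supports its op."""
    assignment: Dict[str, str] = {}
    for node in graph:
        for engine in engines:
            if node.op in engine.supported_ops:
                assignment[node.name] = engine.name
                break
        else:
            raise ValueError(f"no engine supports op {node.op!r}")
    return assignment

def host_dispatch(graph: List[Node], assignment: Dict[str, str]) -> List[str]:
    """A single host controller walks the graph in order and records each dispatch."""
    return [f"{assignment[node.name]}:{node.name}" for node in graph]

# The customer's accelerator is listed first so it claims the op it specializes in.
engines = [
    Engine("customer_acc", frozenset({"custom_op"})),
    Engine("neupro", frozenset({"conv", "pool", "fc"})),
]
graph = [Node("conv1", "conv"), Node("special", "custom_op"),
         Node("pool1", "pool"), Node("fc1", "fc")]

assignment = partition(graph, engines)
print(host_dispatch(graph, assignment))
# ['neupro:conv1', 'customer_acc:special', 'neupro:pool1', 'neupro:fc1']
```

A first-match priority list is the simplest possible placement policy; a real compiler would also weigh data movement between engines, which this sketch ignores.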


Ceva's NeuPro-S consists of a NeuPro-S engine and the CEVA-XM6, a fully programmable vector DSP.

According to Ceva, the strength of NeuPro-S is that "the fully programmable CEVA-XM6 vision DSP incorporated in the NeuPro-S architecture facilitates simultaneous processing of imaging, computer vision and general DSP workloads in addition to AI runtime processing." This combination of imaging, computer vision and AI in one architecture is the key, explained Siegel.

As more people dabble with neural networks, they are beginning to realize that not all imaging and vision tasks should be left to AI. Tasks such as wide-angle image processing and SLAM (simultaneous localization and mapping), for example, are better handled by traditional computer-vision algorithms, explained Siegel. Once images are cleaned up, they are handed over to an AI engine, which is better suited to functions like segmentation, detection and object classification.
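The division of labor Siegel describes, classical processing first and neural-network inference second, amounts to a staged pipeline. The sketch below is illustrative only: `contrast_stretch` stands in for a traditional image-cleanup step, `toy_detector` is a placeholder for a neural network (it just thresholds pixels), and the unit labels `vector_dsp` and `nn_engine` are hypothetical tags, not Ceva identifiers.

```python
from typing import Callable, List, Tuple

# Hypothetical sketch: classical cleanup runs first, then a neural-network stage.
# Unit labels and both stage functions are illustrative, not Ceva code.

Image = List[List[float]]

def contrast_stretch(img: Image) -> Image:
    """Classical (non-AI) cleanup: rescale pixel values to the 0..1 range."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a flat image
    return [[(p - lo) / span for p in row] for row in img]

def toy_detector(img: Image) -> List[Tuple[int, int]]:
    """Placeholder for a neural-network stage: report coordinates of bright pixels."""
    return [(r, c) for r, row in enumerate(img)
            for c, p in enumerate(row) if p > 0.5]

# Each stage is tagged with the unit it would run on in a combined DSP + AI design.
PIPELINE: List[Tuple[str, Callable]] = [
    ("vector_dsp", contrast_stretch),
    ("nn_engine", toy_detector),
]

def run_pipeline(img: Image):
    """Run the stages in order; return the final result and the units used."""
    data, units = img, []
    for unit, stage in PIPELINE:
        data = stage(data)
        units.append(unit)
    return data, units

result, units = run_pipeline([[10, 12], [30, 20]])
print(result, units)  # [(1, 0)] ['vector_dsp', 'nn_engine']
```

Keeping the stages as an ordered list makes the hand-off point explicit: the AI stage only ever sees data the classical stage has already normalized.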

NeuPro-S, Single Core System Diagram (Source: Ceva)

But the biggest improvements in NeuPro-S come from its “memory optimized design,” noted Ceva’s Siegel. By extending support for multi-level memory systems, NeuPro-S “reduces costly transfers with external SDRAM,” while it provides “multiple weight compression options.”

More specifically, weights are retrained and compressed offline by CDNN and decompressed in real time by the NeuPro-S engine. Support for L2 memory further improves the use of internal memory. NeuPro-S also features a robust DMA and local memory system that optimizes parallel processing and memory fetching to minimize overhead. According to Ceva, this means NeuPro-S does not need to draw on a main host computer.
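The offline-compress, runtime-decompress split can be sketched with one simple scheme. The article does not say which compression options NeuPro-S offers, so this example swaps in uniform 16-level weight sharing (4-bit indices into a shared codebook) purely as an illustration; `compress` and `decompress` are hypothetical helpers, not CDNN functions.

```python
from typing import List, Tuple

# Hypothetical sketch: uniform 16-level weight sharing stands in for Ceva's
# undisclosed scheme, only to show offline compression vs. runtime decompression.

def compress(weights: List[float], levels: int = 16) -> Tuple[List[float], bytes]:
    """Offline step: snap each weight to one of `levels` shared values and
    pack two 4-bit indices per byte."""
    assert 2 <= levels <= 16, "indices must fit in 4 bits"
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (levels - 1) or 1.0  # guard against all-equal weights
    codebook = [lo + i * step for i in range(levels)]
    idx = [min(levels - 1, round((w - lo) / step)) for w in weights]
    if len(idx) % 2:
        idx.append(0)  # pad to an even count so every byte holds two indices
    packed = bytes((idx[i] << 4) | idx[i + 1] for i in range(0, len(idx), 2))
    return codebook, packed

def decompress(codebook: List[float], packed: bytes, count: int) -> List[float]:
    """Runtime step: unpack the 4-bit indices and look up the shared values."""
    idx: List[int] = []
    for b in packed:
        idx.extend((b >> 4, b & 0x0F))
    return [codebook[i] for i in idx[:count]]

weights = [((i * 37) % 101) / 100 - 0.5 for i in range(1000)]
codebook, packed = compress(weights)
restored = decompress(codebook, packed, len(weights))
print(len(packed), "bytes vs", 4 * len(weights), "bytes as float32")
# 500 bytes vs 4000 bytes as float32
```

Relative to 32-bit floats this cuts weight storage (and the SDRAM traffic to fetch it) by 8×, plus a small codebook; relative to the 8-bit weights NeuPro-S computes with, the saving would be 2×. Each restored weight is within half a codebook step of the original.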

Together, these memory optimizations result in "on average 50% higher performance, 40% lower memory bandwidth and 30% lower power consumption, when compared to CEVA's first-generation AI processor," Ceva claimed.

The NeuPro-S family includes the NPS1000, NPS2000 and NPS4000, pre-configured processors with roughly 1,000, 2,000 and 4,000 8-bit MACs per cycle, respectively. The NPS4000, with up to 4,096 8×8 MACs, offers what Ceva calls the highest CNN performance per single core: up to 12.5 TOPS at 1.5 GHz. The company said the architecture is "fully scalable to reach up to 100 TOPS."
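Peak-throughput figures like these are usually derived as MACs per cycle × 2 operations per MAC (one multiply, one add) × clock rate. The sketch below applies that common convention; the 4,096-MAC count and 1.5 GHz clock come from the article, while the 1,024 and 2,048 MAC counts for the smaller parts are assumptions by analogy. By this convention the NPS4000 works out to about 12.3 TOPS, a little under Ceva's quoted 12.5 TOPS; the article does not say exactly how Ceva counts operations.

```python
import math

def peak_tops(macs_per_cycle: int, clock_hz: float, ops_per_mac: int = 2) -> float:
    """Peak throughput in TOPS, counting each MAC as two ops (multiply + add)."""
    return macs_per_cycle * ops_per_mac * clock_hz / 1e12

# 4,096 MACs and 1.5 GHz come from the article; 1,024 and 2,048 are assumed
# power-of-two counts for the smaller parts.
for name, macs in [("NPS1000", 1024), ("NPS2000", 2048), ("NPS4000", 4096)]:
    print(f"{name}: {peak_tops(macs, 1.5e9):.2f} TOPS")
# NPS1000: 3.07 TOPS
# NPS2000: 6.14 TOPS
# NPS4000: 12.29 TOPS

# Simple arithmetic: cores needed for 100 TOPS at the quoted 12.5 TOPS per core.
print(math.ceil(100 / 12.5))  # 8
```

These are theoretical peaks; sustained throughput depends on how well memory bandwidth keeps the MAC array fed, which is exactly what the memory optimizations above target.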
