High-performance AI IP works to optimize power consumption - Embedded.com

Performance of machine-learning inference engines is critical in vision-based applications like advanced driver assist systems (ADAS), video surveillance, and others that depend on rapid classification of objects. In these and a growing array of applications, however, the ability to achieve high performance levels with reduced power consumption can be a significant advantage. With its DesignWare EV7x Vision Processor IP, Synopsys couples high-performance computing resources with software tools and integrated architectural features including power- and clock-gating designed to optimize power consumption without compromising performance.

Designed as a high-performance vision-processing platform, the DesignWare EV7x IP architecture supports multiple vision processing units (VPUs), a separate, optional deep neural network (DNN) accelerator with dedicated and shared memory resources, and a high-performance AES encryption core (Figure 1).


Figure 1. The Synopsys DesignWare EV7x Vision Processor IP architecture combines multiple vision-processing units (VPUs) and an optional deep neural network (DNN) accelerator that can be configured with as many as 14,080 multiplier–accumulators (MACs). (Source: Synopsys)

Each VPU combines a 32-bit scalar unit with a 512-bit-wide vector DSP that includes a vector floating-point unit (VFPU) and vector closely coupled memory (VCCM), and can be configured for 8-, 16-, or 32-bit operation, performing simultaneous multiply-accumulate cycles on separate data streams. The DNN accelerator scales from 880 to 14,080 MACs in blocks of 880 to support the convolutional neural networks (CNNs) underlying vision machine-learning models as well as the long short-term memory (LSTM) networks used to analyze sequences spanning time or space. At the same time, the architecture provides the flexibility needed to track the rapid evolution of machine-learning algorithms.
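As a rough illustration of how element width trades against lane-level parallelism in a fixed-width vector unit, consider the sketch below. Only the 512-bit vector width and the 8-/16-/32-bit element options come from the article; the function names and the per-lane model are generic assumptions, not Synopsys code.

```python
# Illustrative sketch (not Synopsys code): lane count of a 512-bit vector
# DSP at different element widths, plus a per-lane multiply-accumulate step.

VECTOR_WIDTH_BITS = 512  # vector DSP width cited in the article


def lane_count(element_bits: int) -> int:
    """Number of parallel MAC lanes available at a given element width."""
    assert element_bits in (8, 16, 32)  # configurations named in the article
    return VECTOR_WIDTH_BITS // element_bits


def vector_mac(acc, a, b):
    """One multiply-accumulate cycle applied across all lanes at once."""
    return [acc_i + a_i * b_i for acc_i, a_i, b_i in zip(acc, a, b)]


if __name__ == "__main__":
    lanes = lane_count(8)          # 64 lanes when configured for 8-bit data
    acc = vector_mac([0] * lanes, [1] * lanes, [2] * lanes)
    print(lanes, acc[0])
```

Halving the element width doubles the lane count, which is why narrow (8-bit) inference data is attractive on such hardware.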

Gordon Cooper, Synopsys Product Marketing Manager for ARC Processors, told Embedded that the challenge lies in creating a solution that is much like an application-specific integrated circuit (ASIC) in providing low power and small size but with programmable flexibility. The result is the ability to support any type of graph (the interconnected data structures that make up a machine-learning inference model). The nature of that support can even extend beyond the DNN accelerator itself thanks to the Synopsys MetaWare EV development toolkit and its DNN mapping tools.

“For maximum flexibility and future-proofing, the tools can distribute computations between the vision processors and CNN resources to support new and emerging neural network algorithms as well as customer-specific CNN layers,” said Cooper.

The mapping tools understand model topology for optimizing the graph itself with capabilities such as layer merging and network pruning. At a lower level, the mapping tools can partition graphs across the DNN accelerator’s resources using layer-based or frame-based partitioning with feature-map partitioning to be included in a later release.
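The two partitioning strategies mentioned above can be sketched in simplified form. The function names and data shapes below are hypothetical illustrations and do not reflect the MetaWare EV tool interfaces.

```python
# Hypothetical sketch of the two partitioning strategies described above.
# These helpers are illustrative only, not the MetaWare EV mapping-tool API.


def layer_partition(layers, n_slices):
    """Layer-based partitioning: split the graph's layers into contiguous
    groups, one group per accelerator slice (a pipeline of stages)."""
    per = -(-len(layers) // n_slices)  # ceiling division
    return [layers[i:i + per] for i in range(0, len(layers), per)]


def frame_partition(frames, n_slices):
    """Frame-based partitioning: assign whole input frames round-robin to
    slices; each slice runs the entire graph on its own frames."""
    buckets = [[] for _ in range(n_slices)]
    for i, frame in enumerate(frames):
        buckets[i % n_slices].append(frame)
    return buckets
```

Layer-based partitioning favors throughput via pipelining; frame-based partitioning keeps per-frame latency on a single slice.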

At an even deeper level, the combination of architecture, hardware mechanisms, and development-toolkit capabilities works to optimize power consumption through approaches that include memory-bandwidth optimization and gating mechanisms.

“Memory access and the bus fabric are key contributors to power consumption,” said Fergus Casey, Synopsys R&D Director of ARC Processors. “By reducing the bandwidth in and out of the processors, we reduce power.”

Beyond architectural features such as the IP’s extensive memory hierarchy (see Figure 1), which help reduce off-chip memory access, the IP supports a DMA broadcasting mechanism. This mechanism simultaneously issues the same set of model coefficients to separate model partitions, substantially reducing bandwidth.
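A back-of-the-envelope model shows why broadcasting helps; the function names and any figures used with them are illustrative assumptions, not measured numbers.

```python
# Back-of-the-envelope sketch (assumptions, not measured data): coefficient
# traffic with and without a broadcast DMA when the same weights feed
# several model partitions.


def unicast_bytes(coeff_bytes: int, n_partitions: int) -> int:
    """Without broadcast: each partition fetches its own copy of the
    coefficients, multiplying the memory traffic."""
    return coeff_bytes * n_partitions


def broadcast_bytes(coeff_bytes: int, n_partitions: int) -> int:
    """With broadcast: one DMA read is issued to every partition at once,
    so traffic is independent of the partition count."""
    return coeff_bytes
```

The saving scales linearly with the number of partitions sharing the same coefficients.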

The architecture’s power- and clock-gating mechanisms offer further opportunities for power reduction. Using the development toolkit with EV7x-IP-based AI chips, application developers can determine whether their particular graph can benefit from gating power to one or more of the IP architecture’s power domains.

“Through the DNN SDK toolchain and EV runtime, developers can balance power consumption for a particular graph against performance requirements and whether they need more DNN slices enabled,” said Casey.

The power gating configuration selected for a particular graph provides a relatively coarse level of power management that is not suitable for every application. According to Casey, power gating incurs “tens of cycles” to bring a power-gated core back to full function, which may conflict with latency requirements in some cases.
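The trade-off Casey describes can be framed as a simple latency check; the function below and its parameters are hypothetical illustrations, not part of any Synopsys tool. Only the "tens of cycles" wake-up figure comes from the article.

```python
# Illustrative decision sketch: does a power-gating wake-up of "tens of
# cycles" (the article's figure) still fit an application's latency budget?
# All parameter names and example values are assumptions for illustration.


def can_power_gate(wake_cycles: int, clock_hz: float,
                   latency_budget_s: float, compute_time_s: float) -> bool:
    """Power gating is viable only if wake-up latency plus compute time
    still meets the application's deadline."""
    wake_s = wake_cycles / clock_hz
    return wake_s + compute_time_s <= latency_budget_s
```

At millisecond-scale deadlines a wake-up of tens of nanoseconds is negligible; at sub-microsecond deadlines it can be the deciding factor.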

Any graph and application is likely to benefit from the architecture’s clock-gating capability. Working at a very fine level of granularity, the architecture’s clock-gating mechanism operates automatically as an EV7x-IP-based AI chip executes its vision-processing algorithms.

“Based on what’s executing in the processor, the IP can make a determination of the number of MACs required and clock-gate any not in use,” said Casey.
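A minimal sketch of that determination, assuming gating follows the accelerator's 880-MAC block granularity: the block size and the 14,080-MAC maximum come from the article, while the function itself is illustrative.

```python
# Hedged sketch of the idea in the quote above: given a layer's MAC demand,
# how many MACs stay clocked and how many can be clock-gated. The 880-MAC
# block size is from the article; the rounding logic is an assumption.

BLOCK_MACS = 880  # DNN accelerator scales in blocks of 880 MACs


def macs_to_clock(required_macs: int, total_macs: int) -> tuple[int, int]:
    """Round the demand up to whole 880-MAC blocks; gate the remainder.
    Returns (clocked_macs, gated_macs)."""
    blocks = -(-required_macs // BLOCK_MACS)   # ceiling division
    active = min(blocks * BLOCK_MACS, total_macs)
    return active, total_macs - active
```

A layer needing 1,000 MACs on a full 14,080-MAC configuration would keep two blocks (1,760 MACs) clocked and gate the rest.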

For more information, visit the Synopsys EV7x product page. Synopsys also provides a video in which Gordon Cooper offers an introduction to the DesignWare EV7x Vision Processor.
