Neural network engine speeds inference on the edge

NMAX, a neural inferencing engine from Flex Logix, provides inferencing throughput from 1 to over 100 TOPS with high MAC utilization even at a batch size of 1, a critical requirement for edge applications. According to Flex Logix, NMAX achieves this performance at a much lower cost and with much less power consumption than competing solutions. For example, NMAX attains data-center-class performance with just one or two LPDDR4 DRAMs, compared with eight or more for other solutions. It also leverages the interconnect technologies the company developed for its eFPGA products.

The NMAX compiler takes a neural model in TensorFlow or Caffe and generates binaries for programming the NMAX array. At the start of each layer, the NMAX array’s eFPGA and interconnect are configured to run the matrix multiply needed for that layer. Data streams from SRAM located near the NMAX tile through a variable number of NMAX clusters, where the weights reside, and the results are accumulated. The eFPGA then applies the activation function, and the output is stored back in SRAM near the NMAX tile. The NMAX compiler also configures the eFPGA to implement the state machines that address the SRAMs and other functions. At the end of a layer, the NMAX array is reconfigured in under 1,000 ns to process the next layer. In larger arrays, multiple layers can be configured in the array at once, with data flowing directly from one layer to the next.
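For readers who want a concrete picture of that per-layer flow, the sketch below is a rough software analogy in NumPy, not Flex Logix code: weights stay resident with the compute clusters, activations stream in from local SRAM, the matrix multiply is accumulated, the activation is applied, and the result is written back before the next layer’s configuration takes over. All function names, shapes, and the sequential loop are illustrative assumptions.

```python
# Hypothetical software analogy of the per-layer NMAX dataflow described above.
# This is NOT Flex Logix code; names and shapes are illustrative only.
import numpy as np

def run_layer(activations_sram, weights_in_clusters, activation_fn):
    """Stream activations through the weight-holding clusters and accumulate."""
    # Matrix multiply-accumulate over the streamed activations
    accum = activations_sram @ weights_in_clusters
    # Activation applied (in NMAX, by the eFPGA logic) before write-back to SRAM
    return activation_fn(accum)

def run_model(input_data, layers):
    """Process layers one at a time, mimicking reconfiguration between layers."""
    sram = input_data
    for weights, activation_fn in layers:
        # "Reconfigure" for this layer: select its weights and activation,
        # then stream data from SRAM through the clusters and back.
        sram = run_layer(sram, weights, activation_fn)
    return sram

if __name__ == "__main__":
    relu = lambda x: np.maximum(x, 0.0)
    layers = [
        (np.random.randn(64, 128).astype(np.float32), relu),
        (np.random.randn(128, 10).astype(np.float32), lambda x: x),
    ]
    # Batch size of 1, as in the edge use case the article highlights
    out = run_model(np.random.randn(1, 64).astype(np.float32), layers)
    print(out.shape)  # (1, 10)
```

In hardware, of course, the layers are not function calls: larger NMAX arrays can hold several layers’ configurations at once so data flows directly between them, as noted above.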

Currently in development, all IP deliverables for any size of NMAX array are expected to be available in mid-2019 for integration into SoCs built on TSMC 16FFC and 12FFC process technologies. The NMAX compiler will be available at the same time.

>> Read the original on our sister site, EEWeb: “Flex Logix Unveils Fast Neural… | EEWeb Community.”
