Push-button generation of deep neural networks
The term "deep learning" refers to using deep (multi-layer) artificial neural networks to make sense out of complex data such as images, sounds, and text. Until recently, this technology was largely confined to academia. Over the past couple of years, however, increased computing performance, coupled with reduced power consumption and augmented by major strides in neural network frameworks and algorithms, has thrust deep learning into the mainstream.
When I attended the Embedded Vision Summit recently, for example, I saw an amazing demonstration of machine vision in which a deep neural network (DNN) running on an FPGA was identifying randomly presented images in real time (check out this column to see a video). As an aside, one of the best lines I heard at the summit was "You can't swing a dead cat in here without some deep learning system saying 'Hey, that's a dead cat!'" But we digress...
As another example, take a look at this column describing how researchers at MIT used a deep learning algorithm to analyze videos showing tens of thousands of different objects and materials being prodded, scraped, and hit with a drumstick. The trained algorithm could subsequently watch silent videos and generate accompanying sounds sufficiently convincing to fool human observers.
Two of the most popular frameworks for deep learning are Caffe and Google's TensorFlow. Caffe is a well-known and widely used machine-vision library that ported Matlab's implementation of fast convolutional nets to C and C++. It was created with expression, speed, and modularity in mind, and it's primarily employed by academics and researchers, with some commercial use. TensorFlow is a relatively new alternative to Caffe that is supported and promoted by Google. It is a software library for numerical computation using data flow graphs, and it's scalable and applicable to both research and commercial applications.
Caffe was designed for image classification and is not intended for other deep-learning applications such as text or sound. By comparison, TensorFlow has been created from the ground up to address a wide range of target applications.
The original deep learning frameworks supported only linear networks, in which the layers form a single chain with each layer feeding the next. Modern frameworks, like TensorFlow, support more sophisticated topologies involving multiple layers per level and multiple inputs and outputs (multiple-input-multiple-output).
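To make the difference between a linear network and a more sophisticated topology a little more concrete, here is a minimal sketch in plain Python. It is purely illustrative: the layer names and the summarize() helper are hypothetical and do not come from Caffe, TensorFlow, or any CEVA tool. Each network is described as a dictionary that maps a layer to the layers that consume its output.

```python
# Hypothetical layer graphs: each key is a layer, each value lists the
# layers that consume its output. Names are made up for illustration.
linear_net = {
    "input": ["conv1"],
    "conv1": ["pool1"],
    "pool1": ["fc1"],
    "fc1": [],
}

branching_net = {
    "image": ["conv_a", "conv_b"],       # one input feeding two layers on the same level
    "depth": ["conv_b"],                 # a second network input
    "conv_a": ["concat"],
    "conv_b": ["concat"],
    "concat": ["class_out", "box_out"],  # two network outputs
    "class_out": [],
    "box_out": [],
}


def summarize(graph):
    """Return (number of inputs, number of outputs, is_linear) for a layer graph."""
    consumed = {dst for dsts in graph.values() for dst in dsts}
    inputs = [name for name in graph if name not in consumed]
    outputs = [name for name, dsts in graph.items() if not dsts]

    # Count how many producers feed each layer.
    fan_in = {name: 0 for name in graph}
    for dsts in graph.values():
        for dst in dsts:
            fan_in[dst] += 1

    # A linear network is a single chain: one input, one output,
    # and no layer with more than one producer or consumer.
    is_linear = (
        len(inputs) == 1
        and len(outputs) == 1
        and all(len(dsts) <= 1 for dsts in graph.values())
        and all(count <= 1 for count in fan_in.values())
    )
    return len(inputs), len(outputs), is_linear


print(summarize(linear_net))
print(summarize(branching_net))
```

The first graph is a simple chain, while the second has two inputs, two outputs, and two layers sitting side by side on the same level, which is the kind of topology the early frameworks could not express.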
There are several steps involved in creating a deep neural network. The first is to define and implement the network architecture and topology. Next, the network undergoes a training stage, which is performed offline on a powerful computing platform using tens or hundreds of thousands of images (in the case of a machine vision application). The result is a floating-point representation of the network and its "weights" (coefficients).
The final step is to take the floating-point representation of the network and its weights and transmogrify it into a fixed-point equivalent suitable for running on a target platform.
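As a rough idea of what a float-to-fixed-point conversion involves, the following sketch scales a handful of floating-point weights into 16-bit signed integers using a power-of-two scale factor. Please note that this is a generic textbook-style scheme under my own assumptions; the function names, the 16-bit word size, and the example weights are all hypothetical, and nothing here should be taken as the method any particular vendor's tool actually uses.

```python
def choose_frac_bits(weights, total_bits=16):
    """Pick the largest number of fractional bits that still fits the largest weight."""
    max_abs = max(abs(w) for w in weights)
    max_int = (1 << (total_bits - 1)) - 1  # largest positive signed value
    frac_bits = total_bits - 1
    while frac_bits > 0 and round(max_abs * (1 << frac_bits)) > max_int:
        frac_bits -= 1
    return frac_bits


def quantize(weights, frac_bits, total_bits=16):
    """Convert floats to signed fixed-point integers, saturating at the word limits."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    scale = 1 << frac_bits
    return [min(hi, max(lo, round(w * scale))) for w in weights]


def dequantize(values, frac_bits):
    """Map fixed-point integers back to floats to inspect the rounding error."""
    scale = 1 << frac_bits
    return [v / scale for v in values]


# Made-up example weights standing in for a trained layer's coefficients.
float_weights = [0.75, -1.5, 0.1, 2.25]

frac_bits = choose_frac_bits(float_weights)
fixed_weights = quantize(float_weights, frac_bits)
restored = dequantize(fixed_weights, frac_bits)
max_error = max(abs(a - b) for a, b in zip(float_weights, restored))

print(frac_bits, fixed_weights, max_error)
```

With these example values, the largest weight (2.25) leaves room for 13 fractional bits, so each weight is stored as an integer multiple of 1/8192. The trade-off is the usual one: fewer bits mean smaller, faster, lower-power arithmetic at the cost of a small rounding error per weight.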
All of which brings us to the fact that CEVA has just announced the second generation of its CEVA Deep Neural Network (CDNN2). CDNN2 is a neural network software framework for machine learning that features the CEVA Network Generator. In turn, the CEVA Network Generator can take a floating-point representation of a network -- Caffe-based or TensorFlow-based (any topology) -- and transmogrify it into a small, fast, energy-efficient fixed-point equivalent targeted at the CEVA-XM4 intelligent vision processor (the CEVA-XM4 can be realized as a hard core on an SoC or as a soft core on an FPGA).
CDNN2 supports the most advanced neural network layers in use today, including the following:
- Input manipulation layer (pre-process stage resize, jittering and more)
- Pooling (Average and Max)
- Fully Connected
- Activation (ReLU, Parametric ReLU, TanH, Sigmoid)
- New: Deconvolution
- New: Concatenation
- New: Upsample
- New: Argmax
- New: Custom user layer attaching a specific functionality
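For readers who haven't met some of these layer types before, here is a minimal plain-Python sketch of what a few of them compute: max and average pooling, the ReLU, parametric ReLU, and sigmoid activations, and argmax. This is only an illustration of the underlying arithmetic; the function names are my own, the pooling windows are assumed to be square and non-overlapping, and none of this reflects how CDNN2 implements these layers internally.

```python
import math


def relu(x):
    """Rectified linear unit: negative inputs become zero."""
    return max(0.0, x)


def prelu(x, alpha):
    """Parametric ReLU: negative inputs are scaled by a learned slope alpha."""
    return x if x > 0 else alpha * x


def sigmoid(x):
    """Squash any real value into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def pool2d(image, size, mode="max"):
    """Non-overlapping size x size pooling over a 2-D list of rows."""
    reduce_fn = max if mode == "max" else (lambda window: sum(window) / len(window))
    pooled = []
    for r in range(0, len(image) - size + 1, size):
        row = []
        for c in range(0, len(image[0]) - size + 1, size):
            window = [image[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled


def argmax(values):
    """Index of the largest value, e.g. the winning class score."""
    return max(range(len(values)), key=lambda i: values[i])


# A made-up 4 x 4 feature map.
feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

print(pool2d(feature_map, 2, "max"))
print(pool2d(feature_map, 2, "average"))
print(relu(-2.0), prelu(-2.0, 0.25), math.tanh(0.0), sigmoid(0.0))
print(argmax([0.1, 0.7, 0.2]))
```

Pooling shrinks the 4 x 4 feature map to 2 x 2 by keeping either the largest or the mean value in each window, the activation functions add non-linearity, and argmax picks out the index of the highest score at the end of a classification network.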
The folks at CEVA boast that taking a floating-point network, transmogrifying it into its fixed-point equivalent, loading it into a CEVA-XM4 engine, and running it is a "push-button" approach. Of course, we've all seen (and sometimes given) demonstrations involving a little sleight of hand and "Here's one I prepared earlier," so the guys and gals at CEVA have prepared this video showing the entire process in a single (less than 10-minute) shot.
CDNN2 is intended to be used for object recognition, advanced driver assistance systems (ADAS), artificial intelligence (AI), video analytics, augmented reality (AR), virtual reality (VR), and similar computer vision applications.
Coupled with the CEVA-XM4 intelligent vision processor, CDNN2 offers significant time-to-market and power advantages for implementing machine learning in embedded systems for smartphones, advanced driver assistance systems (ADAS), surveillance equipment, drones, robots and other camera-enabled smart devices.
The CDNN2 software library is supplied as source code, extending the CEVA-XM4's existing Application Developer Kit (ADK) and computer vision library, CEVA-CV. It is flexible and modular, capable of supporting either complete CNN implementations or specific layers for a wide range of networks. These networks include AlexNet, GoogLeNet, ResidualNet (ResNet), SegNet, VGG (VGG-19, VGG-16, VGG_S), and Network in Network (NIN), among others.
As noted earlier, CDNN2 supports the most advanced neural network layers, including convolution, deconvolution, pooling, fully connected, softmax, concatenation, and upsample, as well as various inception models. All network topologies are supported, including multiple-input-multiple-output, multiple layers per level, and fully convolutional networks, in addition to linear networks (such as AlexNet).
Click here for more information on CEVA, CEVA-XM4, and CDNN2.