Machines can see, hear and analyze thanks to embedded neural networks
The potential applications around artificial intelligence (AI) continue to grow on a daily basis. As the power of different neural network (NN) architectures is tested, tuned and refined to tackle different problems, diverse methods of optimally analyzing data using AI are found. Many of today's AI applications, such as Google Translate and the speech and vision recognition systems behind Amazon Alexa, leverage the power of the cloud. By relying upon always-on Internet connections, high-bandwidth links and web services, the power of AI can be integrated into Internet of Things (IoT) products and smartphone apps. To date, most attention has focused on vision-based AI, partly because it is easy to visualize in news reports and videos, and partly because it is such a human-like activity.
Sound and Vision Neural Network (Image: CEVA)
For image recognition, a 2D image is analyzed – a square group of pixels at a time – with successive layers of the NN recognizing ever larger features. At the beginning, edges are detected where the difference in contrast is high. In a face, this occurs around features such as the eyes, nose, and mouth. As the detection process progresses deeper into the network, whole facial features are detected. In the final phase, the combination of features and their positions will tend toward a specific face in the available dataset being identified as a likely match.
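The first of those stages, a filter sliding over the image a square group of pixels at a time and firing where contrast changes sharply, can be sketched in plain NumPy. This is an illustrative toy, not code from the article: the image is synthetic, and the Sobel-style kernel stands in for a learned first-layer filter.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image one patch at a time (valid, no padding).
    This is cross-correlation, as commonly used in NN 'convolution' layers."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Synthetic 8x8 image: dark left half, bright right half -> one vertical edge
img = np.zeros((8, 8))
img[:, 4:] = 1.0

# Sobel-style kernel: responds to left-to-right contrast changes
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

response = conv2d(img, sobel_x)
print(response[0])  # non-zero only where the patch straddles the edge
```

Deeper layers apply the same sliding-window idea to the outputs of earlier layers, which is how ever larger features (an eye, then a whole face) emerge from simple edge responses.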
Neural Network feature extraction (Image: CEVA)
The hope is that the neural network will report the highest match probability for the database entry corresponding to the subject photographed or captured by a camera. The clever element here is that the subject may not have been captured at exactly the same angle or pose as the photograph in the database, nor under the same lighting conditions.
AI has become so prevalent so quickly in large part due to open software tools, known as frameworks, that make it easy to build and train an NN for a target application in a variety of programming languages. Two such common frameworks are TensorFlow and Caffe. Where the item to be recognized is already known, an NN can be defined and trained offline. Once trained, the NN can then be easily deployed to an embedded platform. This is a clever partitioning that allows the power of a development PC or the cloud to train the NN, while the power-sensitive embedded processor simply uses the trained model for recognition.
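The train-offline/deploy-embedded split described above can be sketched in miniature. As an assumption-laden toy (a two-weight logistic classifier standing in for a full NN, and a plain dictionary standing in for an exported model file), the point is the partitioning: the training loop lives on the development machine, while the device runs only the forward pass with frozen weights.

```python
import numpy as np

# --- Offline phase (development PC or cloud): train a tiny classifier ---
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # easily separable toy task

w = np.zeros(2)
b = 0.0
for _ in range(500):  # plain gradient descent on the logistic loss
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

frozen = {"w": w.copy(), "b": b}  # the artifact shipped to the device

# --- Embedded phase: inference only, no training machinery needed ---
def infer(sample, params):
    """Forward pass with frozen weights -- all the device has to run."""
    return 1.0 / (1.0 + np.exp(-(sample @ params["w"] + params["b"]))) > 0.5

print(infer(np.array([1.0, 1.0]), frozen))    # True  (positive class)
print(infer(np.array([-1.0, -1.0]), frozen))  # False (negative class)
```

In a real workflow the offline phase would use a framework such as TensorFlow, and the exported artifact would be a converted, often quantized, model rather than a Python dictionary; the division of labor is the same.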
The human-like ability to recognize people and objects is closely linked with trendy applications, such as industrial robots and autonomous cars. However, AI is of equal interest and capability in the field of audio. In the same way that features can be analyzed in an image, audio can be broken down into features that can be fed into an NN. One method uses Mel-Frequency Cepstral Coefficients (MFCCs) to break audio down into usable features. Initially, the audio sample is split into short time frames, e.g. 20 ms; then, using Fourier transforms of the signal, the powers of the audio spectrum are mapped onto the nonlinear mel scale using triangular overlapping windows.
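The front end of that pipeline, one short frame, a Fourier transform, and triangular filters spaced on the mel scale, can be sketched with NumPy alone. This is a simplified illustration (no pre-emphasis, no windowing function, and it stops at log filterbank energies rather than taking the final DCT that yields the cepstral coefficients); the filter count and tone frequency are arbitrary choices for the demo.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank_energies(frame, sample_rate, n_filters=10):
    """Power spectrum of one short frame, pooled by triangular mel filters."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame)) ** 2 / n_fft
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    # Filter edge points are equally spaced on the mel scale,
    # so low frequencies get narrow filters and high frequencies wide ones.
    mel_pts = np.linspace(0.0, hz_to_mel(sample_rate / 2), n_filters + 2)
    hz_pts = mel_to_hz(mel_pts)
    energies = np.zeros(n_filters)
    for i in range(n_filters):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = np.clip((freqs - lo) / (mid - lo), 0.0, 1.0)
        falling = np.clip((hi - freqs) / (hi - mid), 0.0, 1.0)
        tri = np.minimum(rising, falling)  # triangular overlapping window
        energies[i] = power @ tri
    return np.log(energies + 1e-10)        # log energies -> NN features

# One 20 ms frame of a 1 kHz tone sampled at 16 kHz
sr = 16000
t = np.arange(int(0.020 * sr)) / sr
frame = np.sin(2 * np.pi * 1000.0 * t)
feat = mel_filterbank_energies(frame, sr)
print(feat.shape)  # a compact feature vector for one frame
```

Repeating this for each successive frame yields the 2D time-by-feature grid that the audio NN consumes, directly analogous to the pixel grid in the image case.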
Sound Neural Network Breakdown (Image: CEVA)