Teasing out a neural net exchange format
Machine learning with deep neural networks has become a hot topic as these techniques surpass human performance in numerous areas. The enthusiasm has already spawned several approaches to handling the training and inference stages of deploying a neural network.
Training and cloud inference typically use similar hardware, mostly GPUs, although there are cloud-based inference-only engines such as Google’s TPU. By contrast, inference on edge devices depends on highly optimized hardware accelerators, mostly DSP based, that have very little in common and present unique optimization challenges.
The wide variety of inference engines is matched by the variety of training frameworks, all of which have their own capabilities and output formats. The growing fragmentation is very likely to be harmful to the growth of mobile intelligent systems.
The Khronos Group has decided to step in, given its mission to connect software to hardware. Khronos produces open standards that abstract hardware details so that software developers can deal with a single platform. For example, its OpenVX standard provides a graph-based API for hardware-accelerated computer vision, including neural network-based systems, by allowing trained networks to be imported as nodes in a graph.
Now Khronos is readying a standard interchange format to map training frameworks to inference engines. The Neural Network Exchange Format (NNEF) is an open, scalable transfer format that allows engineers to move trained networks from any framework that supports the cross-vendor format into any inference engine that can read it. It’s a sort of PDF for neural networks.
The NNEF standard is still in definition, so in keeping with Khronos practice its details are still confidential. Once ratified the details will be made fully available so that anyone, not just Khronos members, will be able to produce NNEF exporters and importers.
The most typical use case for NNEF will be to move networks between a framework such as Caffe or TensorFlow and a proprietary inference engine, optionally using OpenVX. An NNEF document is likely to be machine generated, but the format has been designed to be human readable so that it can also serve as source code for creating new networks, with descriptions that are hierarchical and very compact. For example, the structure of GoogLeNet takes about twenty lines to describe.
The first version of NNEF is primarily aimed at loading trained networks for inference. Because one of the group’s aims is to enable efficient execution on low power machines, the standard will include the ability to quantize, reinforce and retrain via any tool that can understand the format. This feature is expected to enable third party tools to optimize and retrain networks originally trained in floating point for inferencing at much lower, power-efficient precisions.
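To make the idea of lower-precision inference concrete, here is a minimal Python sketch of one common technique, symmetric linear quantization of floating-point weights to 8-bit integers. It is an illustration only: the NNEF details are not yet public, so nothing here reflects how the standard itself represents quantization, and the function names and sample weights are invented for the example.

```python
def quantize_int8(weights):
    """Map floats to signed 8-bit integers with one shared scale factor.

    Illustrative only; this is not taken from the NNEF specification.
    """
    max_abs = max(abs(w) for w in weights)
    # The largest magnitude maps to 127; guard against all-zero weights.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [q * scale for q in quantized]


# Hypothetical weights from a network trained in floating point.
weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

for w, r in zip(weights, restored):
    print(f"original {w:+.4f}  restored {r:+.4f}")
```

Each restored value differs from the original by at most half a quantization step, which is the kind of error a retraining pass can compensate for.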
Work on the standard is at an advanced stage and the working group is expected to have a final candidate for ratification in just a couple of months. At that point, extensive testing and industry comment will be needed to ensure the ratified standard is both robust and useful.
Consequently, Khronos is extending an invitation to data scientists and engineers to take part in an NNEF advisory panel, especially people working on non-standard and novel network inferencing architectures. Participation does not require a Khronos membership and will give interested companies and individuals an opportunity to contribute and provide feedback to this important work. If you would like to take part, send an email to email@example.com.
--Peter McGuinness (@PeterMcGuinne55) is the chairman of the Neural Network Exchange Format working group at Khronos and a veteran of Inmos, STMicroelectronics and Imagination Technologies.