PALO ALTO, Calif. — Microprocessor designers need to adopt a balance of specialized and general-purpose architectures to succeed in deep learning, according to a talk at the inaugural SysML event by Nvidia’s chief scientist. He dismissed competing efforts in compute-in-memory, analog computing and neuromorphic computing.
Processors with memory hierarchies optimized for specialized instructions and data types like the Nvidia Volta are the best approach in the data center, said Bill Dally. At the edge, SoCs need accelerator blocks to speed neural network processing, he said.
At the event — organized by some of the leading lights in the field from Amazon, Google and Facebook — speakers called for broader involvement in the emerging technology where greater hardware performance is desperately needed, but software concepts are still evolving rapidly.
“Deep learning is transforming how we design computers…[but] custom machine-learning hardware is in its infancy, so it will be an exciting time ahead with a lot of creativity in processor design,” said Jeff Dean, a member of the Google Brain team and an event organizer.
“We’re trying to predict which primitives make the most sense, so sometimes using a little chip area for testing ideas is useful,” Dean said, adding that the code and chips “need to co-evolve.”
“Anywhere we use heuristics is a good place to consider machine learning — compilers, networking, OSes, even physical circuit design and test selection,” he said, adding that fundamental work is still needed on ways to gauge machine learning’s effectiveness and on APIs to smooth its integration. He predicted an increasingly broad spectrum of software will adopt machine-learning techniques.
The Nvidia chief scientist suggested 8-bit integer and 16-bit floating point are the defaults for inference and training tasks, respectively. But for inference jobs, neural-network accuracy remains strong in some cases with weights as narrow as 4 and even 2 bits. In general, “weights want to use the smallest number of bits possible,” he said.
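The trade-off Dally describes can be illustrated with a simple uniform symmetric quantizer. This is an illustrative sketch, not Nvidia's implementation; the function name and toy weight tensor are invented for the example.

```python
import numpy as np

def quantize_symmetric(w, bits):
    """Uniform symmetric quantization: round weights onto a signed
    integer grid of the given bit width, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 127 for 8-bit, 7 for 4-bit
    scale = np.max(np.abs(w)) / qmax     # map the largest weight to qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                     # dequantize to compare against w

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=10_000)   # toy weight tensor

for bits in (8, 4, 2):
    err = np.mean((w - quantize_symmetric(w, bits)) ** 2)
    print(f"{bits}-bit mean squared error: {err:.2e}")
```

Running the loop shows the quantization error growing as the bit width shrinks, which is why accuracy at 4 and 2 bits holds up only "in some cases": whether a given network tolerates that extra error depends on the model.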
The weights themselves can be highly pruned. Two-thirds to 90 percent of weights aren’t needed in many convolutional models, he said.
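The kind of pruning Dally refers to is often done by weight magnitude: zero out the smallest weights until the target sparsity is reached. A minimal sketch, assuming magnitude-based pruning (the talk did not specify the method) with an invented helper name:

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the fraction `sparsity` of weights with the
    smallest absolute values, keeping the rest unchanged."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) >= threshold, w, 0.0)

rng = np.random.default_rng(1)
w = rng.normal(0.0, 0.05, size=10_000)   # toy weight tensor
pruned = magnitude_prune(w, 0.90)        # drop 90 percent of weights
print(f"fraction of weights kept: {np.mean(pruned != 0):.2f}")
```

In practice, pruned models are usually fine-tuned afterward to recover accuracy, and hardware benefits only if it can exploit the resulting sparsity.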
Dean said work at 4 bits and below is also effective for activations, and research shows potential for training at precision levels below 16-bit floating point. Research in several other areas could affect hardware, he said, including dynamic model routing, the much-debated question of batch sizes, and techniques for optimizing training models.
Continue to page two on Embedded's sister site, EE Times: “Google, Nvidia Share on AI at Stanford.”