AI chips maintain accuracy with model reduction

With the International Electron Devices Meeting (IEDM) in San Francisco and the Conference on Neural Information Processing Systems (NeurIPS) in Montreal kicking off head to head, this week looms large for anyone hoping to keep pace with R&D developments in artificial intelligence.

IBM researchers, for example, are detailing new AI approaches for both digital and analog AI chips. IBM boasts that its digital AI chip demonstrates, “for the first time, the successful training of deep neural networks (DNNs) using 8-bit floating point numbers while fully maintaining the accuracy on a spectrum of deep learning models and datasets.”
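To make "8-bit floating point" concrete, here is a minimal sketch that rounds an ordinary Python float to the nearest value of a hypothetical 8-bit format with 1 sign bit, 5 exponent bits and 2 mantissa bits. The bit split, the IEEE-style exponent bias, the treatment of subnormals and the saturation on overflow are all assumptions for illustration; the article does not say which 8-bit format IBM's chip uses.

```python
import math

def round_to_fp8(x: float, exp_bits: int = 5, man_bits: int = 2) -> float:
    """Round finite x to the nearest value of a hypothetical sign/exp_bits/man_bits float format."""
    if x == 0.0:
        return x
    if not math.isfinite(x):
        raise ValueError("expects a finite input")
    bias = 2 ** (exp_bits - 1) - 1           # IEEE-style bias: 15 for 5 exponent bits
    min_exp = 1 - bias                        # smallest normal exponent
    max_exp = bias                            # assume the all-ones exponent is reserved (inf/NaN)
    _, e = math.frexp(abs(x))                 # abs(x) = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, min_exp)                 # exponent of the leading bit; clamped for subnormals
    step = 2.0 ** (exp - man_bits)            # spacing between neighbouring representable values
    q = round(abs(x) / step) * step           # Python's round() is round-half-to-even
    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** max_exp
    return math.copysign(min(q, max_val), x)  # saturate instead of overflowing to infinity

for value in (1.1, 1.2, 0.1, -3.0, 1e6):
    print(value, "->", round_to_fp8(value))
```

For example, `round_to_fp8(1.2)` returns 1.25 and `round_to_fp8(0.1)` returns 0.09375. With two mantissa bits, each power-of-two interval holds only four values (1.0, 1.25, 1.5 and 1.75 between 1 and 2), which gives a sense of how coarse the numbers in such a format are.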

Separately, IBM researchers are showcasing at IEDM an analog AI chip that performs 8-bit-precision in-memory multiplication using projected phase-change memory.
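In-memory computing performs the multiplication where the weights are stored. In a resistive crossbar, each weight is held as a device conductance G, each input is applied as a voltage V, each cell passes a current I = G·V (Ohm's law), and the currents on a shared output wire add up (Kirchhoff's current law), yielding a dot product. The sketch below simulates that idea in plain Python with weights snapped to 8-bit levels. The function names, the symmetric quantization scheme, the noise-free devices and the use of negative conductances as a stand-in for a differential pair of devices are illustrative assumptions, not a description of IBM's phase-change design.

```python
def quantize_weight(w: float, w_max: float, bits: int = 8) -> float:
    """Snap w onto a symmetric grid of 2**bits - 1 conductance levels spanning [-w_max, w_max]."""
    levels = 2 ** (bits - 1) - 1           # 127 steps on each side of zero for 8 bits
    step = w_max / levels
    q = max(-levels, min(levels, round(w / step)))
    return q * step

def crossbar_matvec(weights: list[list[float]], voltages: list[float], bits: int = 8) -> list[float]:
    """Simulate an idealized analog crossbar computing the product weights @ voltages."""
    w_max = max(abs(w) for row in weights for w in row)
    if w_max == 0.0:
        return [0.0] * len(weights)
    currents = []
    for row in weights:
        # Each cell contributes I = G * V; the shared output wire sums the currents.
        # A negative G stands in for a differential pair of devices (assumption).
        currents.append(sum(quantize_weight(w, w_max, bits) * v for w, v in zip(row, voltages)))
    return currents

weights = [[0.5, -0.25], [1.0, 0.3]]
voltages = [1.0, 2.0]
print(crossbar_matvec(weights, voltages))  # close to the exact result [0.0, 1.6]
```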

“We do think all this work we are doing — such as trying to get the precision down so that the performance can go up and the power can go down — is really important to continue to advance AI,” Jeffrey Welser, vice president and lab director at IBM Research-Almaden, told EE Times.

This is crucial, said Welser, as the world moves from “narrow AI” where “we use AI to identify a cat, for example, on the Internet” to “broader AI” where “we analyze medical images, or we want to be able to integrate both text and imaging information together to come up with a solution.”

He added, “All those broader questions require a much larger neural net, much larger data sets and multi-modal data sets coming in… [for that], we need changes in architecture and hardware to make all that happen.”

Welser described the two papers IBM published this week as “an interesting set of good advances” that allow the industry to move toward that future of broader AI.

Linley Gwennap, president and principal analyst of The Linley Group, told EE Times, “Machine learning continues to evolve rapidly. Existing hardware can’t efficiently handle the largest neural networks that researchers have built, so they are looking at all kinds of new ways to improve performance and efficiency.”

These new developments will exert tremendous pressure on hardware vendors, as chip companies “must be flexible and quick to survive in this chaotic market,” Gwennap added.

End of the GPU era for AI
IBM is boldly predicting the end of GPU domination in AI.

IBM’s Welser told EE Times, “A GPU has the ability to do lots of parallel matrix multiplications for graphics processing. Such matrix multiplications happen to be exactly the same thing you need to do with neural nets.” In his view, “That was sort of a coincidence, but it’s been incredibly important. Because without that [GPUs], we would never have achieved the level of performance we are already seeing in AI performance today.” However, Welser added, “As we’ve learned more about what it takes to do AI, we are finding ways to design a hardware that can be more efficient.”

Moving to lower precision
One route to efficiency is to lower the precision required for AI processing.

Welser explained, “The general direction which we all started to realize a few years ago was that while we are used to very precise calculation — 32-bit calculation floating point is very standard, and even 64-bit, double precision for really accurate kind of calculations — that’s not necessarily always important [in AI].”
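The gap between these formats can be made concrete with machine epsilon, the distance from 1.0 to the next representable number. This sketch uses Python's standard `struct` module, whose `'d'`, `'f'` and `'e'` format codes store a value as a 64-bit, 32-bit or 16-bit IEEE 754 float, to round-trip values and find that gap for each width.

```python
import struct

def round_trip(fmt: str, x: float) -> float:
    """Store x in the IEEE 754 format named by a struct code and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

def machine_epsilon(fmt: str) -> float:
    """Halve eps until 1 + eps/2 rounds back to exactly 1 in the given format."""
    eps = 1.0
    while round_trip(fmt, 1.0 + eps / 2) != 1.0:
        eps /= 2
    return eps

for name, fmt in [("64-bit", "d"), ("32-bit", "f"), ("16-bit", "e")]:
    print(name, machine_epsilon(fmt))
```

The three results are 2^-52, 2^-23 and 2^-10 (about 2.2e-16, 1.2e-7 and 9.8e-4): double precision carries roughly 16 significant decimal digits, single precision roughly 7, and half precision only about 3.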

In AI, he stressed, “What you care about for the neural net is when you show an image or word if it gets the right answer. When we ask if it is a cat or a dog, it says it’s a cat. If it’s the right answer, you don’t necessarily care about all the calculations that go in between.”
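Welser's point can be illustrated with a toy example: if a classifier's output scores are rounded to a coarse grid, the predicted label survives as long as the top score stays ahead of the others. The labels and scores below are made up, and the uniform 4-bit quantizer is an assumption chosen for illustration.

```python
def quantize(values: list[float], bits: int) -> list[float]:
    """Uniformly quantize values onto 2**bits levels between their min and max."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    return [lo + round((v - lo) / step) * step for v in values]

labels = ["cat", "dog", "bird"]   # hypothetical classes
scores = [2.31, 1.07, -0.52]      # made-up full-precision scores

def predict(s: list[float]) -> str:
    """Return the label with the highest score (argmax)."""
    return labels[max(range(len(s)), key=s.__getitem__)]

print(predict(scores), predict(quantize(scores, 4)))
```

Both calls return "cat" even though the intermediate numbers changed. The flip side is that scores closer together than one quantization step can collapse onto the same level, which is where reduced-precision designs need care.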

Ideally, AI should mimic the human eye. Welser said, “If you look through a foggy window, you see a person walking on the street. It’s a low-precision image… but often it’s plenty to be able to say ‘oh, that’s my mom coming.’ So, it doesn’t matter whether that’s the right precision for the vision, as long as you get the right answer.”

This, he said, explains the trend toward lower precision in AI processing.

“For 32-bit calculation, I’ve got to do calculation on 32-bits. If we can do it on 16 bits, that’s basically half the calculation power, or probably half the area or even less on a chip,” Welser went on. “If you can get down to 8 bits or 4 bits, that’s even better.” He said, “So, this gives me a huge win for area, for power, and performance and throughput — how fast we can get through all of this.”
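The arithmetic behind “half the area or even less” can be sketched two ways. Weight storage and memory traffic scale linearly with bit width. Multiplier hardware scales faster: a basic array multiplier for two n-bit operands forms n × n partial-product bits, so halving the width roughly quarters that part of the circuit. The 100-million-parameter model is a hypothetical round number, and real chip area depends on many factors this sketch ignores.

```python
def precision_costs(params: int, bits: int) -> tuple[float, float]:
    """Return (megabytes of weights, multiplier cost relative to 32 bits) for a given width."""
    megabytes = params * bits / 8 / 1e6          # storage grows linearly with bit width
    rel_multiplier = (bits * bits) / (32 * 32)   # an n x n array multiplier forms n*n partial-product bits
    return megabytes, rel_multiplier

PARAMS = 100_000_000  # hypothetical model size, for illustration only
for bits in (32, 16, 8, 4):
    mb, mult = precision_costs(PARAMS, bits)
    print(f"{bits:>2}-bit: {mb:5.0f} MB of weights, multiplier ~{mult:.4f}x a 32-bit one")
```

Under these assumptions, going from 32 to 8 bits cuts weight storage by 4x and the multiplier's partial-product array by 16x.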


However, Welser acknowledged, “For a long time, we thought we’d have to stick with 32-bit precision for AI training. There was just no way around it.”

In 2015, IBM Research introduced the reduced-precision approach to AI model training and inference with a paper describing a novel dataflow approach for conventional CMOS technologies. IBM showed that models trained with 16-bit precision exhibited no loss of accuracy compared with models trained at 32 bits.
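Why can 16 or fewer bits be enough for training at all? One technique widely used in reduced-precision training is stochastic rounding; this sketch is not claimed to be the method of IBM's 2015 paper. A weight stored in fixed point cannot absorb an update smaller than half a quantization step under round-to-nearest, but if it rounds up with probability proportional to the remainder, the update survives on average. The 8 fractional bits, the update size and the step count are hypothetical values.

```python
import math
import random

def round_nearest(x: float, frac_bits: int) -> float:
    """Round x to the nearest multiple of 2**-frac_bits (fixed point)."""
    scale = 2 ** frac_bits
    return round(x * scale) / scale

def round_stochastic(x: float, frac_bits: int, rng: random.Random) -> float:
    """Round down or up with probability proportional to proximity, so E[result] == x."""
    scale = 2 ** frac_bits
    scaled = x * scale
    floor = math.floor(scaled)
    return (floor + (rng.random() < scaled - floor)) / scale

def train_toy(update: float, steps: int, frac_bits: int, stochastic: bool, seed: int = 0) -> float:
    """Apply the same small update repeatedly to a weight kept in fixed point (hypothetical setup)."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        target = w + update
        w = round_stochastic(target, frac_bits, rng) if stochastic else round_nearest(target, frac_bits)
    return w

print(train_toy(0.001, 1000, 8, stochastic=False))  # stays at 0.0: each update is rounded away
print(train_toy(0.001, 1000, 8, stochastic=True))   # lands near the exact total of 1.0
```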

Since then, IBM observed, “the reduced-precision approach was quickly adopted as the industry standard, with 16-bit training and 8-bit inferencing now commonplace and spurred an explosion of startups and VC investment for reduced precision-based AI chips.” Despite this trend, training with numbers represented in fewer than 16 bits has been viewed as almost impossible, given the need to maintain high model accuracy.
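One reason training below 16 bits is hard is accumulation. Training sums many small products, and in a narrow format a running total eventually grows so large that each new addend falls below half its spacing and is rounded away, a problem often called swamping. This sketch uses IEEE half precision through `struct`'s `'e'` code to show the effect, then a chunked summation that forms short partial sums before adding them to the total, one mitigation discussed in reduced-precision work. The chunk size of 64 is an arbitrary assumption.

```python
import struct

def fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("e", struct.pack("e", x))[0]

def naive_sum_fp16(values: list[float]) -> float:
    """Accumulate left to right, storing every partial result in 16 bits."""
    total = 0.0
    for v in values:
        total = fp16(total + v)
    return total

def chunked_sum_fp16(values: list[float], chunk: int = 64) -> float:
    """Sum short chunks first, then add each small partial sum to the running total."""
    total = 0.0
    for i in range(0, len(values), chunk):
        total = fp16(total + naive_sum_fp16(values[i:i + chunk]))
    return total

ones = [1.0] * 10_000
print(naive_sum_fp16(ones))    # stalls at 2048.0: 2048 + 1 is a tie that rounds back to even 2048
print(chunked_sum_fp16(ones))  # 10000.0
```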
