Current AI hardware focus is misguided, says AI pioneer

“It’s really hard to succeed with exotic hardware,” Facebook Chief AI Scientist Yann Le Cun told the audience during his keynote speech at NeurIPS. Addressing the global gathering of AI experts in Vancouver, Canada, in December, Le Cun surveyed the history of specialized chips for processing neural-network workloads, offered a glimpse of what Facebook is working on, and made some predictions for the future of deep-learning hardware.

Ancient history

Le Cun is a renowned visionary in the field of AI, having been at the forefront of neural-network research in the 1980s and 1990s. As a Bell Labs researcher in the late 1980s, he worked with the earliest types of dedicated neural-network processors, which comprised resistor arrays and were used to perform matrix multiplication. As neural networks fell out of favor in the late 1990s and early 2000s, Le Cun was one of a handful of scientists who continued to work in the field. In his keynote, he shared some of the things he learned about hardware for deep learning during that time.


(Photo: Facebook Chief AI Scientist Yann Le Cun)

First, tools are really important. What killed neural nets (temporarily) in the ’90s was that only a few people — including Le Cun — had tools to train them. Le Cun and his colleagues spent a lot of time building what would now be called a deep-learning framework: a flexible piece of software that interpreted front-end languages, allowing the researchers to train and experiment with neural networks. The researchers’ work advanced the concept that deep-learning systems can be assembled from differentiable modules and then automatically differentiated. While novel at the time, this is common practice now.
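
That idea, building a network out of differentiable modules and letting the software differentiate the whole composition automatically, is exactly what today’s frameworks provide. As a rough modern illustration (PyTorch here, not the Bell Labs tooling Le Cun described), a network can be assembled from modules and trained end to end via automatic differentiation:

```python
import torch
from torch import nn

# Assemble a network from differentiable modules; the framework
# differentiates the whole composition automatically.
model = nn.Sequential(
    nn.Linear(784, 128),  # differentiable module 1
    nn.ReLU(),            # differentiable module 2
    nn.Linear(128, 10),   # differentiable module 3
)

x = torch.randn(32, 784)              # a batch of dummy inputs
target = torch.randint(0, 10, (32,))  # dummy labels

loss = nn.functional.cross_entropy(model(x), target)
loss.backward()  # automatic differentiation through every module

# Each module now holds gradients ready for an optimizer step.
print(model[0].weight.grad.shape)  # torch.Size([128, 784])
```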

The right tools gave Le Cun’s team its “superpower” and were also an important factor in producing reproducible results, he said. “Good results are not enough … even if you get good results, people will still be skeptical,” he said. “Making those results reproducible is almost as important as actually producing the results in the first place.”

Along with the right tools, hardware performance is crucial to the research community, as hardware limitations can influence entire directions of research, said Le Cun.

“[What] the hardware community builds for research or for training actually influences what ideas people think of,” he said. “Entire ideas can be abandoned just because the hardware is not powerful enough, even though they were good ideas.”

The answer may not lie with new and novel forms of computing, he said, noting that many exotic fabrication technologies failed to take off when they didn’t fit in with the existing computing environment.

One of Le Cun’s frustrations with today’s hardware solutions for AI acceleration is that most are built for matrix multiplication, not convolution, which is the key mathematical operation used in most image-processing and speech-recognition neural networks today. “[The prevailing approach] will become more and more wrong, in the sense that we are going to have bigger and bigger requirements for power,” he said. “If we build a generic piece of hardware where 95% of the cycles are spent on doing convolutions, we are not doing a good job.”
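
To make the mismatch concrete, the sketch below (illustrative NumPy, not code from any vendor) computes the same 2D convolution two ways: directly, and by unrolling the input into patches (the so-called im2col trick) so that it becomes a matrix multiplication a matmul engine can consume. The results are identical, but the matmul route duplicates input data and adds memory traffic, which is the kind of inefficiency Le Cun is describing.

```python
import numpy as np

def conv2d_direct(x, k):
    """Direct 2D convolution (valid padding, stride 1)."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def conv2d_as_matmul(x, k):
    """The same convolution lowered to a matrix multiplication (im2col)."""
    H, W = x.shape
    kh, kw = k.shape
    oh, ow = H - kh + 1, W - kw + 1
    # Each row is one flattened input patch; input data gets duplicated here.
    patches = np.array([x[i:i + kh, j:j + kw].ravel()
                        for i in range(oh) for j in range(ow)])
    return (patches @ k.ravel()).reshape(oh, ow)

x = np.random.randn(8, 8)
k = np.random.randn(3, 3)
assert np.allclose(conv2d_direct(x, k), conv2d_as_matmul(x, k))
```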

Killer app

The future, as Le Cun described it, will see convolutional neural networks (CNNs) used in everything from toys to vacuum cleaners to medical equipment. But the killer app — the one application that will prove AI’s value to consumer devices — is the augmented-reality headset.

Facebook is currently working on hardware for AR glasses. It’s a huge hardware challenge: the glasses must deliver a large amount of processing at low latency while running on battery power alone. “When you move, the overlaid objects in the world should move with the world, not with you, and that requires quite a bit of computation,” said Le Cun.

Facebook envisions AR glasses that are operated by voice and interact through gestures via real-time hand tracking. Those features can be demonstrated today, but they remain beyond what current hardware can deliver within the power-consumption, performance, and form-factor limits of a wearable device. Le Cun noted a few “tricks” that can help.

For example, when running the same neural network on every frame of a video — perhaps to detect objects — it doesn’t matter if the result for one frame is wrong, because we can look at the frames before and after it and check for consistency.

“So you could imagine using extremely low-power hardware that is not perfect; in other words, you can [tolerate] bit flips once in a while,” said Le Cun. “It’s easy to do this by lowering the voltage of the power supply.”
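
A minimal sketch of that trick, assuming a per-frame classifier whose output is occasionally corrupted (the function and labels below are purely illustrative, not from any Facebook system): a sliding majority vote over neighboring frames discards isolated wrong answers, so the per-frame network can run on imperfect, low-power hardware.

```python
from collections import Counter

def smooth_detections(per_frame_labels, window=3):
    """Majority-vote each frame's label against its neighbors.

    per_frame_labels: labels produced by a (possibly flaky) per-frame
    detector. An occasional corrupted result is outvoted by the
    frames before and after it.
    """
    n = len(per_frame_labels)
    half = window // 2
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        neighborhood = per_frame_labels[lo:hi]
        smoothed.append(Counter(neighborhood).most_common(1)[0][0])
    return smoothed

# Frame 3's result was corrupted (e.g., by a bit flip); the vote fixes it.
print(smooth_detections(["cat", "cat", "cat", "dog", "cat", "cat"]))
# ['cat', 'cat', 'cat', 'cat', 'cat', 'cat']
```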

Neural-net developments

The rapid evolution of neural networks is a major challenge for hardware design. For example, dynamic networks — those with memory that can be trained to learn sequential or time-varying patterns — are gaining in popularity, especially for natural-language processing (NLP). However, they break many of the assumptions built into current hardware. The compute graph can’t be optimized at compile time; that has to be done at runtime. Batching, the popular technique of processing more than one sample at once to improve performance, is also rather difficult to implement.

“All the most common hardware that we have at our disposal assumes that you can batch, because if you have a batch with more than one sample, then you can turn every operation into a matrix multiplication, including convolutions and fully connected nets,” said Le Cun. “[It] is a challenge for the hardware community to create architectures that don’t lose performance by using batch size = 1. That applies to training, of course; the optimal size of batch for training is 1. We use more because our hardware forces us to do so.”
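
The batching point is easy to see in code. In the illustrative NumPy sketch below, a fully connected layer is applied sample by sample and then as one batched matrix multiplication; the arithmetic is identical, but the batched form reuses the weight matrix across all samples, which is what keeps matmul-oriented hardware busy. At batch size 1 the layer collapses to a matrix-vector product with far less reuse.

```python
import numpy as np

W = np.random.randn(256, 128)   # weights of a fully connected layer
xs = np.random.randn(64, 128)   # 64 input samples

# Batch size 1: one matrix-vector product per sample, with the weights
# re-read from memory every time.
one_by_one = np.stack([W @ x for x in xs])

# Batch of 64: a single matrix-matrix multiplication that reuses W
# across all samples -- the case current hardware is built for.
batched = xs @ W.T

assert np.allclose(one_by_one, batched)
```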

Self-supervised learning

Another challenge for hardware is that the learning paradigms we currently use will change, and this will happen imminently, according to Le Cun.

“There is a lot of work [being done] on trying to get machines to learn more like humans and animals, and humans and animals don’t learn by supervised learning or even by reinforcement learning,” he said. “They learn by something I call self-supervised learning, which is mostly by observation.”

Le Cun described a common approach to self-supervised learning in which a piece of the sample is masked and the system is trained to predict the content of the masked piece from the part of the sample that remains available. This is commonly done with images, where part of the image is removed, and with text, where one or more words are blanked out. Work so far has shown the technique to be particularly effective for NLP; the networks used there, transformers, have a training phase based on self-supervised learning.
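
A minimal sketch of the masked objective on text, assuming a BERT-style setup (the token IDs, mask token, and ignore value below are placeholders): a random subset of tokens is blanked out, and the training targets are the original tokens at exactly those positions.

```python
import numpy as np

MASK_ID = 0  # assumed ID of a special [MASK] token

def mask_tokens(token_ids, mask_prob=0.15, seed=0):
    """Blank out a random subset of tokens for self-supervised training.

    Returns the corrupted input plus the targets: -100 everywhere except
    the masked positions, where the target is the original token.
    """
    rng = np.random.default_rng(seed)
    token_ids = np.asarray(token_ids)
    masked = rng.random(token_ids.shape) < mask_prob
    inputs = np.where(masked, MASK_ID, token_ids)
    targets = np.where(masked, token_ids, -100)  # -100 = ignored by the loss
    return inputs, targets

inputs, targets = mask_tokens([17, 42, 8, 93, 5, 61, 29, 7], mask_prob=0.25)
# The model learns to reconstruct `targets` from `inputs`:
# no human labels are needed, only raw text.
print(inputs)
print(targets)
```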

The trouble from a hardware perspective is that transformer networks for NLP can be enormous: The biggest ones today have 5 billion parameters and are growing fast, said Le Cun. The networks are so big that they don’t fit into GPU memories and have to be broken into pieces.
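
When a model no longer fits in a single GPU’s memory, the usual fallback is to split it across devices. A minimal PyTorch sketch of that idea, assuming two GPUs are available (the device names and layer sizes are placeholders, not Facebook’s setup):

```python
import torch
from torch import nn

class TwoStageModel(nn.Module):
    """Toy model split across two GPUs because it would not fit on one."""

    def __init__(self):
        super().__init__()
        # Stage 1 lives on the first GPU, stage 2 on the second.
        self.stage1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.stage2 = nn.Linear(4096, 1024).to("cuda:1")

    def forward(self, x):
        h = self.stage1(x.to("cuda:0"))
        return self.stage2(h.to("cuda:1"))  # activations hop between devices

model = TwoStageModel()
out = model(torch.randn(8, 1024))  # gradients flow back across both GPUs
```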

“Self-supervised learning is the future — there is no question [about that],” he said. “But this is a challenge for the hardware community because the memory requirements are absolutely gigantic. Because these systems are trained with unlabeled data, which is abundant, we can train very large networks in terms of data. Hardware requirements for the final system will be much, much bigger than they currently are. The hardware race will not stop any time soon.”

Hardware trends

New hardware ideas that use techniques such as analog computing, spintronics, and optical systems are on Le Cun’s radar. He cited communication difficulties — problems converting signals between novel hardware and the rest of the required computing infrastructure — as a big drawback. Analog implementations, he said, rely on making activations extremely sparse in order to gain advantages in energy consumption, and he questioned whether this will always be possible.

Le Cun described himself as “skeptical” of futuristic new approaches such as spiking neural networks and neuromorphic computing in general. There is a need to prove that the algorithms work before building chips for them, he said.

“Driving the design of such systems through hardware, hoping that someone will come up with an algorithm that will use this hardware, is probably not a good idea,” Le Cun said.

A Neural-Network Processing Timeline

Late 1980s: Resistor arrays are used to perform matrix multiplication. By the end of the decade, the arrays have gained surrounding amplifiers and converters but remain quite primitive by today’s standards; the limiting factor is how fast data can be fed into the chip.
1991: The first chip designed for convolutional neural networks (CNNs) is built. The chip is capable of 320 giga-operations per second (GOPS) on binary data, with digital shift registers that minimize the amount of external traffic needed to perform a convolution, thereby speeding up operation. The chip does not see use beyond academia.
1992: ANNA, an analog neural network ALU chip, debuts. Designed for CNNs with 6-bit weights and 3-bit activations, ANNA contains 180,000 transistors in 0.9-μm CMOS. It is used for optical-character recognition of handwritten text.
1996: DIANA, a digital version of ANNA, is released. But with neural networks falling out of favor by the mid-1990s, DIANA is eventually repurposed for signal processing in cellphone towers.
2009–2010: Researchers demonstrate a hardware neural-network accelerator on an FPGA (the Xilinx Virtex 6). It runs a demo of semantic segmentation for automated driving and is capable of 150 GOPS at around 0.5 W. The team, from Purdue University, tries to make an ASIC based on this work, but the project proves unsuccessful. (Source: Yann Le Cun/Facebook)

>> This article was originally published on our sister site, EE Times Europe.

 
