TOKYO — Anyone poised to choose an AI model solely based on its accuracy might want to think again. A key issue, according to IBM Research, is how resistant the AI model is to adversarial attacks.
IBM researchers, collaborating with other research institutes, are presenting two new papers on the vulnerability of AI. One study focuses on how to certify the robustness of AI against adversarial attacks. The other examines an efficient way to test the resilience of AI models that are already deployed.
Of course, accuracy is the Holy Grail of AI. If computers can’t beat humans, why bother with AI? Indeed, AI’s ability to recognize images and classify them has vastly improved over the last several years. As demonstrated in the results of ImageNet competitions between 2010 and 2017, computer vision can already outperform human abilities. AI’s accuracy in classifying objects in a dataset jumped from 71.8% to 97.3% in just seven years.
Companies big and small have used ImageNet to benchmark their image classification algorithms. Winning an ImageNet competition has bestowed bragging rights for AI algorithm superiority.
However, the scientific community has begun paying attention to recent studies showing that well-trained deep neural networks lack robustness against adversarial examples.
Last summer, a team of researchers including IBM Research, the University of California at Davis, MIT, and JD AI Research published a paper entitled “Is Robustness the Cost of Accuracy? — A Comprehensive Study on the Robustness of 18 Deep Image Classification Models.”
According to Pin-Yu Chen, a research staff member in the AI Foundations at IBM’s Thomas J. Watson Research Center, the researchers cautioned, “Solely pursuing for a high-accuracy AI model may get us in trouble.” The team’s benchmark on 18 ImageNet models “revealed a tradeoff in accuracy and robustness.”
(Source: IBM Research)
Alarmed by the vulnerability of AI models, researchers at the MIT-IBM Watson AI Lab, including Chen, presented a new paper this week focused on the certification of AI robustness. “Just like a watch that comes with a water resistance number, we wanted to provide an effective method for certifying an attack resistance level of convolutional neural networks [CNNs],” noted Chen.
Why does this matter? Because perturbations to natural images that are invisible to the human eye can trick image classifiers into misclassifying them. Chen said, “Think about safety-critical settings with AI.” An autonomous car that uses AI should readily recognize a stop sign. Yet when a nearby light source casts a minor optical illusion on the sign, the autonomous car can mistake the stop sign for a speed limit sign, Chen said. “The light source, in this case, has become a classic adversarial example.”
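To make the idea of a small perturbation concrete, here is a minimal sketch in Python. It is not taken from the IBM papers: the two-class linear classifier, its weights, the four "pixel" values, and the helper names (scores, predict, perturb) are all invented for illustration. Each pixel is shifted by at most eps in the direction that favors the wrong label.

```python
# Toy linear classifier: score_c(x) = sum_i W[c][i] * x[i] + B[c].
# Every number below is made up for illustration only.
LABELS = ["stop", "speed_limit"]
W = [[0.9, -0.4, 0.7, -0.2],
     [-0.6, 0.5, -0.5, 0.4]]
B = [0.0, 0.1]


def scores(x):
    """Return one score per label for the input pixels x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, B)]


def predict(x):
    """Return the label with the highest score."""
    s = scores(x)
    return LABELS[s.index(max(s))]


def perturb(x, true_idx, target_idx, eps):
    """Shift each pixel by at most eps toward the target label.

    For a linear model, the direction that most increases
    (target score - true score) is the sign of the weight difference.
    Pixels are clipped to the valid range [0, 1].
    """
    out = []
    for i, xi in enumerate(x):
        g = W[target_idx][i] - W[true_idx][i]
        step = eps if g > 0 else -eps if g < 0 else 0.0
        out.append(min(1.0, max(0.0, xi + step)))
    return out


x = [0.40, 0.60, 0.45, 0.60]                       # hypothetical clean input
x_adv = perturb(x, true_idx=0, target_idx=1, eps=0.05)

print("clean:    ", predict(x))
print("perturbed:", predict(x_adv))
```

In this toy setup, no pixel moves by more than 0.05 on a 0-to-1 scale, yet the predicted label changes from "stop" to "speed_limit". Real attacks on deep networks follow the same principle but need gradients or queries to find the direction.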
(Source: IBM Research)
As a neural network is trained on more images, it memorizes what it needs to classify. “But we don’t necessarily expect it to be robust,” said Chen. “The higher the accuracy is, the more fragile it could get.”
For autonomous vehicles, in which safety is paramount, verifying classification robustness is critical.
Techniques available today have generally been limited to certifying small and simple neural network models. In contrast, the joint IBM-MIT team found a way to certify the robustness of widely used, general CNNs.
The team’s proposed framework can “handle various architectures including convolutional layers, max-pooling layers, batch normalization layer, residual blocks, as well as general activation functions,” according to Chen. By allowing perturbation in each pixel with confined magnitude, said Chen, “We have created verification tools optimized for CNNs.” The team’s goal is “to assure you that adversarial attacks can’t alter AI’s prediction.”
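The notion of certifying a prediction when every pixel may move by a bounded amount can be sketched with interval bound propagation, a simpler and looser technique than the framework described here. The code below is an illustration under stated assumptions, not the team's algorithm: the tiny two-layer ReLU network, its weights, and the function names (affine_bounds, relu_bounds, certify, certified_radius) are all hypothetical. It pushes a lower and an upper bound for each value through the network, layer by layer, and reports the prediction as certified only if the true class's lowest possible score beats every other class's highest possible score.

```python
def affine_bounds(lo, hi, W, b):
    """Propagate elementwise bounds [lo, hi] through y = W x + b."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        l = h = bias
        for w, a, c in zip(row, lo, hi):
            if w >= 0:          # positive weight: low input gives low output
                l += w * a
                h += w * c
            else:               # negative weight: bounds swap roles
                l += w * c
                h += w * a
        out_lo.append(l)
        out_hi.append(h)
    return out_lo, out_hi


def relu_bounds(lo, hi):
    """ReLU is monotone, so it can be applied to each bound directly."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]


def certify(x, eps, layers, true_idx):
    """True if no perturbation with |delta_i| <= eps can change the class.

    The check is sound but not complete: False means "could not certify",
    not "an attack exists".
    """
    lo = [xi - eps for xi in x]
    hi = [xi + eps for xi in x]
    for k, (W, b) in enumerate(layers):
        lo, hi = affine_bounds(lo, hi, W, b)
        if k < len(layers) - 1:          # no ReLU after the output layer
            lo, hi = relu_bounds(lo, hi)
    return all(lo[true_idx] > hi[j] for j in range(len(hi)) if j != true_idx)


def certified_radius(x, layers, true_idx, upper=1.0, iters=30):
    """Bisect for the largest eps that certify() accepts, up to `upper`."""
    lo_eps, hi_eps = 0.0, upper
    for _ in range(iters):
        mid = (lo_eps + hi_eps) / 2
        if certify(x, mid, layers, true_idx):
            lo_eps = mid
        else:
            hi_eps = mid
    return lo_eps


# Hypothetical 2-input, 2-hidden-unit, 2-class network.
layers = [
    ([[1.0, -1.0], [0.5, 1.0]], [0.0, 0.0]),
    ([[2.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
]
x = [0.8, 0.2]          # classified as class 0 by this toy network

print(certify(x, 0.05, layers, true_idx=0))
print(certify(x, 0.20, layers, true_idx=0))
print(certified_radius(x, layers, true_idx=0))
```

The radius returned by certified_radius plays the role of the water resistance number in Chen's analogy: a single figure stating how much per-pixel perturbation the prediction is guaranteed to withstand. Interval bounds grow loose quickly on deep networks, which is one reason tighter, structure-aware methods are needed for real CNNs.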
Chen also pointed out that adversarial examples can come from anywhere. They exist in the physical world, digital space, and a variety of domains ranging from images and video to speech and data analysis. The newly developed certification framework can be applied in a variety of situations. In essence, it is designed to provide “attack-independent and model-agnostic” metrics, he explained.
The team also claims that its CNN certification framework is computationally efficient. It exploits the special structure of convolutional layers, reporting “more than 10× speed-up compared to state-of-the-art certification algorithms,” according to IBM Research.
Chen told us, “We’ve evaluated the input and output relations of each layer to create a matrix that is fast, certifiable, and general.”