SAN FRANCISCO — At the International Solid-State Circuits Conference here, Samsung described a new neural-network accelerator for smartphones that matches blocks from rivals such as Huawei, while Toshiba detailed one for self-driving cars that pulls ahead of competitors such as Intel’s Mobileye.
A 5.5-mm² block in the latest 8-nm Exynos chip delivers 1.9 tera-operations/second (TOPS) using 8-bit precision at clock rates up to 933 MHz, said Jinook Song, a Samsung AI engineer. That’s roughly on par with the rating of the latest Kirin processor in Huawei phones and the latest commercial IP blocks.
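As a rough sanity check, a 1.9-TOPS rating at 933 MHz is consistent with an array on the order of 1,024 8-bit multiply-accumulate (MAC) units, with each MAC conventionally counted as two operations. The MAC count here is an assumption for illustration; the article does not state it.

```python
# Back-of-the-envelope check of the 1.9-TOPS figure.
# ASSUMPTION: ~1,024 MAC units; each MAC counts as 2 ops (multiply + add).
macs = 1024
freq_hz = 933e6            # peak clock quoted in the article
tops = macs * 2 * freq_hz / 1e12
print(f"{tops:.2f} TOPS")  # ~1.91 TOPS, close to the quoted 1.9 TOPS
```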
However, the block hits 6.937 TOPS when a neural network allows pruning of up to three-quarters of its weights. Efficiency ranges from 4.5 to 11.5 TOPS/W as power consumption scales from 39 mW at 0.5 V to 1.553 W at 0.8 V.
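The quoted figures hang together: dividing the sparse peak throughput by the high-voltage power draw lands on the low end of the stated efficiency range. (Pairing the 6.937-TOPS figure with the 0.8-V operating point is my inference; the article does not pair them explicitly.)

```python
# Cross-check of the efficiency range using numbers quoted in the article.
sparse_tops = 6.937   # peak throughput with ~75% of weights pruned
watts_0v8 = 1.553     # power draw at 0.8 V
tops_per_watt = sparse_tops / watts_0v8
print(f"{tops_per_watt:.2f} TOPS/W")  # ~4.47, matching the quoted 4.5 low end
```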
Samsung detailed its 8-nm Exynos deep-learning block and its performance. (Source: ISSCC)
Like mobile architectures from Cadence, Ceva, and Nvidia, the Samsung block makes heavy use of pruning and quantization, running 8- and 16-bit operations to exploit network sparsity and improve efficiency. “If you are not using sparsity and compression at this point, you are behind the curve,” said Mike Demler, an analyst with the Linley Group who attended the session.
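To illustrate the two techniques named here, the sketch below shows generic magnitude pruning (zeroing the smallest weights to create sparsity) and symmetric 8-bit linear quantization. This is my own textbook-style illustration, not Samsung's actual implementation.

```python
# Generic sketches of pruning and 8-bit quantization (ILLUSTRATION ONLY;
# not the scheme used in the Exynos block).

def prune(weights, fraction=0.75):
    """Zero the smallest-magnitude fraction of weights.

    fraction=0.75 mirrors the article's 'up to three-quarters' figure.
    """
    n_prune = int(len(weights) * fraction)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

def quantize_int8(weights):
    """Symmetric linear quantization: map weights to integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    return [round(w / scale) for w in weights], scale

print(prune([0.9, -0.04, 0.02, -1.2]))   # → [0.0, 0.0, 0.0, -1.2]
print(quantize_int8([1.0, -0.5, 0.0]))
```

Pruned-away weights need not be stored or multiplied at all, which is how a sparse network can run far faster than the dense 1.9-TOPS rating suggests.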
It’s not clear if the implementation is a full dataflow architecture, another trend implemented in the latest IP blocks, Demler said.
The Samsung design appears in the latest Exynos chip and is expected to power at least some of the new handsets that the South Korean giant is announcing as early as this week. To enhance parallelism, it uses two cores, each with two data-staging units that share 512-Kbyte scratch pads.