Deep-learning accelerators need a standard interface, says Uber

SANTA CLARA, Calif. — Deep-learning accelerators need a standard interface, said a top engineering manager at Uber who sketched a picture of the company’s use of AI, its data centers, and their challenges.

“AI is really disrupting our industry” across the design of chips, boards, systems, and services, said Gloria Lau, head of hardware engineering at Uber, in a keynote at DesignCon here.

Like many other web giants, Uber uses banks of Nvidia GPUs for deep learning today, often riding Nvidia’s NVLink interface. Also like other large data center operators, it is testing FPGAs and ASICs from startups including Eyeris, Graphcore, and Wave Computing in its search for more performance and efficiency.

“I would love to see a standard interface for all AI chips — NVLink is just for Nvidia,” Lau said.

In a brief conversation after her keynote, she said that she is familiar with the CCIX standard supported by AMD, Arm, IBM, Xilinx, and others. But to date, Nvidia is not using it, she noted.

The many deep-learning algorithms that Uber uses need to "settle down" before the company can pick an ASIC accelerator. In the meantime, she noted challenges in programming both FPGAs and the tensor cores in Nvidia's latest Volta chips.

Uber considers itself “on the bleeding edge of AI,” maintaining its own dedicated AI research team. It runs more than a dozen deep-learning models in its data centers including recommendation engines for Uber Eats, fraud detection services, and features to improve estimates of when a driver will arrive.

The algorithms span a half-dozen varieties, implemented across a laundry list of mainly open-source frameworks and libraries. The underlying AI hardware today consumes as much as 40 kW in a rack of systems — twice the power that standard servers use — and can require flows of more than 100 petabytes of data.

Amid the complexity, Uber is seeking simplicity. “We are architecting our next-generation AI server so that people other than data scientists can do the AI work,” Lau said.

Typically, Uber downloads data sets for AI training via PCIe Gen 3, but it uses Nvidia's NVLink for gradient averaging among pools of four GPUs. (Source: Uber)
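Gradient averaging is the step in data-parallel training where each GPU, having computed gradients on its own slice of a batch, exchanges them with its peers so that all of them apply the same averaged update. Uber's slide does not say which exchange algorithm runs over NVLink, so the sketch below assumes a ring all-reduce, one widely used scheme, and simulates four GPUs as plain Python lists. The function name `ring_allreduce_mean` is hypothetical and purely illustrative.

```python
from typing import List

Gradient = List[float]


def ring_allreduce_mean(grads: List[Gradient]) -> List[Gradient]:
    """Average gradients across n simulated GPUs with a ring all-reduce.

    Illustrative sketch only (assumed algorithm, not Uber's code).
    Each gradient is split into n chunks. Reduce-scatter passes chunks
    around the ring and sums them, so after n - 1 steps each worker holds
    the full sum of one chunk. All-gather then circulates the summed
    chunks n - 1 more times so every worker holds every summed chunk.
    """
    n = len(grads)
    length = len(grads[0])
    assert all(len(g) == length for g in grads), "gradients must match in size"
    # Chunk c covers indices bounds[c]:bounds[c + 1].
    bounds = [length * c // n for c in range(n + 1)]
    # Work on copies so the caller's gradients are not modified.
    buf = [list(g) for g in grads]

    # Reduce-scatter: at step s, worker w sends chunk (w - s) mod n to
    # worker (w + 1) mod n, which adds it to its own copy of that chunk.
    for s in range(n - 1):
        # Snapshot all outgoing chunks first, as if sent simultaneously.
        sends = []
        for w in range(n):
            c = (w - s) % n
            sends.append((w, c, buf[w][bounds[c]:bounds[c + 1]]))
        for w, c, data in sends:
            dst = (w + 1) % n
            for i, v in enumerate(data):
                buf[dst][bounds[c] + i] += v

    # All-gather: at step s, worker w forwards the fully summed chunk
    # (w + 1 - s) mod n to worker (w + 1) mod n, which overwrites its copy.
    for s in range(n - 1):
        sends = []
        for w in range(n):
            c = (w + 1 - s) % n
            sends.append((w, c, buf[w][bounds[c]:bounds[c + 1]]))
        for w, c, data in sends:
            dst = (w + 1) % n
            buf[dst][bounds[c]:bounds[c + 1]] = data

    # Divide the sums by the number of workers to get the mean gradient.
    return [[v / n for v in g] for g in buf]


# Four simulated GPUs, matching the pool size in Uber's slide.
gpus = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
for averaged in ring_allreduce_mean(gpus):
    print(averaged)  # each line prints [7.0, 8.0, 9.0, 10.0]
```

The ring layout is why the interconnect matters: each GPU only ever talks to its neighbor, and the total data each one sends stays roughly constant as the pool grows, so the step is bounded by link bandwidth, the resource NVLink provides beyond PCIe.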

Continue reading on our sister site, EE Times: “Uber Calls for AI Standard.”
