The old paradigm was simple: if you had a big job you chose a big CPU. But that notion has fractured along many different lines. Last week at Hot Chips, we saw that even when building the most compute-intensive systems, architects’ fancies are wandering away from the behemoth eight-issue, out-of-order, speculating while psychically branch-predicting, 29-stage CPUs. The new darlings are small, energy-efficient, dual-issue processors that in many ways point back to the early years of RISC, when simplicity was the cardinal virtue. SoCs now seek their performance not by wringing every last drop of instruction-level parallelism from the code, but rather by running as many light threads as possible in parallel.
The shift has changed the way architects think about performance. Instead of asking how big a CPU the entire job needs, they ask new questions. How do I decompose the job into threads with a minimum of dependencies? How do I manage the dependencies that remain? How can I run all these little threads on a bunch of little processors? What once was one really big CPU is becoming a cluster of little ones, with custom hardware at the pain points.
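To make those questions concrete, here is a minimal sketch of the decomposition idea, not anything presented at Hot Chips. The task (counting words), the function names, and the worker count are all hypothetical, and a software thread pool stands in for the cluster of little processors. Each chunk is an independent light task, and the only dependency left to manage is the final sum.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """One light task: no shared state and no ordering constraint."""
    return len(chunk.split())


def total_words(chunks, workers=4):
    """Fan the chunks out to a pool of workers, then combine the results.

    ASSUMPTION: 'workers' is an arbitrary illustrative number, not a
    figure from the article.
    """
    # Fan out: each chunk goes to whichever worker is free.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    # Fan in: the single remaining dependency is this reduction.
    return sum(partial_counts)
```

The sketch shows the structure only. In standard CPython, threads do not speed up CPU-bound work, so treat the pool as a stand-in for hardware parallelism rather than a performance recipe.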
The same principle is at work at the data center level. In one of the more unusual papers at Hot Chips, Ashutosh Dhodapkar, founding engineer at SeaMicro, described building out a server floor not with blades using conventional server-class microprocessors, but with a huge fabric of Atom processors: little, in-order, dual-issue cores that Intel had probably originally intended to be handset application processors.
SeaMicro pairs two Atom microprocessors with one proprietary 90-nm ASIC that implements the fabric interface, memory, and I/O connections. The company then weaves these nodes, each an ASIC plus two MPUs, into a toroidal fabric totaling up to 768 Atom processors, 1.5 TBytes of DRAM, and 64 SATA drives. Each torus then connects to the conventional data center network.
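A little arithmetic on those figures, plus a rough sketch of how addressing works in a torus, may help. The processor count, DRAM total, and two-Atoms-per-ASIC ratio come from the paragraph above. The torus shape `TORUS_DIMS` is purely an assumption chosen so the node count fits, since the dimensions are not given here, and the helper functions are hypothetical illustrations rather than SeaMicro's routing scheme.

```python
ATOMS_PER_NODE = 2           # from the article: two Atoms per ASIC
TOTAL_ATOMS = 768            # from the article
TOTAL_DRAM_GB = 1.5 * 1024   # 1.5 TBytes, taking 1 TByte = 1024 GBytes

nodes = TOTAL_ATOMS // ATOMS_PER_NODE
dram_per_atom_gb = TOTAL_DRAM_GB / TOTAL_ATOMS

# ASSUMPTION: a hypothetical 3-D shape whose product equals the node count.
TORUS_DIMS = (8, 8, 6)


def torus_neighbors(coord, dims):
    """Return the nodes directly linked to coord, with wraparound on each axis."""
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size  # wrap at the edges of the ring
            neighbors.add(tuple(n))
    neighbors.discard(coord)  # only relevant if an axis has size 1
    return sorted(neighbors)


def torus_hops(a, b, dims):
    """Minimum hops between two nodes, going the shorter way around each ring."""
    return sum(
        min(abs(x - y), size - abs(x - y))
        for x, y, size in zip(a, b, dims)
    )
```

With the article's numbers this works out to 384 nodes and 2 GBytes of DRAM per Atom. The wraparound links are what distinguish a torus from a plain mesh: under the assumed shape, the nodes at opposite corners of the coordinate space are only three hops apart.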
“If you look at the applications in most data centers, they aren’t huge,” Dhodapkar argued. “They are small and bursty. It’s the number of users and the size of some data sets that are huge. Such loads match a large array of small, low-power processors better than they fit an array of big server-class processors.” Dhodapkar has numbers to back up his assertion, claiming that his systems can match the throughput of a conventional bank of servers with one quarter the power consumption and one quarter the space.
The same reasoning applies at the chip level, as the emergence of more and more compute-intensive SoCs based on large arrays of small RISC cores suggests. As we learn more about our computing loads, whether in the cloud, in a packet switch, or in an embedded application, the giant microprocessor is beginning to look more and more like an evolutionary dead end. That may be even harder to accept emotionally than the loss of the Brontosaurus or the Buick Roadmaster.