Leaders in mainstream computing are intensifying efforts to find a parallel-programming model to feed the multicore processors already on chip makers' drawing boards.
Intel Corp. and Microsoft Corp. have awarded an estimated $10 million, five-year grant to help fund a new Parallel Computing Lab at the University of California at Berkeley, with 14 faculty members initially involved. As many as 20 universities, including MIT, Stanford and the University of Illinois, competed for funding.
Late last year, Advanced Micro Devices Inc. assigned one of its chief architects full-time to the job of rallying the industry around concepts supporting heterogeneous multicore CPUs. Developers need to expand the current software stack in fundamental ways to handle a coming crop of processors that use a variety of cores, accelerators and memory types, according to the company.
Both AMD and Intel have said they will ship processors using a mix of X86 and graphics cores as early as next year, with core counts quickly rising to eight or more per chip. But software developers are still stuck with a mainly serial programming model that cannot easily take advantage of the new hardware.
Thus, there's little doubt the computer industry needs a new parallel-programming model to support these multicore processors. But just what that model will be, and when and how it will arrive, are still up in the air.
“It's a critical problem, and the technology is needed right now,” said William Dally, a professor of computer science at Stanford. “The danger is we will not have a good model when we need it, and people will wind up creating a generation of difficult legacy code we will have to live with for a long time.”
Dally said he would urge the industry “to start experimenting right away and try a dozen different ideas to find a few that work.” Even then, he said, “the best ideas that emerge from that work won't be perfect in their first implementations.”
“The industry is in a little bit of a panic about how to program multicore processors, especially heterogeneous ones,” said Chuck Moore, a senior fellow at AMD now serving as chief architect of the company's so-called accelerated computing initiative. “To make effective use of multicore hardware today, you need a PhD in computer science. That can't continue if we want to enable heterogeneous CPUs.”
Berkeley bakes ideas
The Berkeley Parallel Computing Lab got its start in February 2005 when the university held a series of weekly talks on the issue. In December 2006, researchers published a white paper that arose from those discussions.
A team of researchers has already started prototyping software systems based on ideas the group has fleshed out. The group could publish preliminary results in a matter of months.
Essentially, the lab aims to define a way to compose parallel programs based on flexible sets of standard modules in a method similar to how serial programs are written today. The challenge in the parallel world is finding a dynamic and flexible approach to schedule parallel tasks from such modules across available hardware in complex, heterogeneous, multicore CPUs.
The group believes developers could create a set of perhaps a dozen frameworks that understand the intricacies of the hardware. The frameworks could be used to write modules that handle specific tasks, such as solving a matrix. New run-time environments could dynamically schedule those modules across available cores of various types.
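As a rough illustration of that idea, and not the lab's actual design, the sketch below shows a toy run-time that accepts small task modules and hands each one to the least-busy pool among the core types it can run on. Every name here (ToyRuntime, solve_diagonal, dot, the "x86" and "accel" labels) is hypothetical, and ordinary threads stand in for hardware cores.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def solve_diagonal(diag, rhs):
    """Hypothetical module: solve D x = b for a diagonal matrix D."""
    return [b / d for d, b in zip(diag, rhs)]


def dot(u, v):
    """Hypothetical module: dot product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


class ToyRuntime:
    """Dispatches each module to the least-busy pool it is allowed to use."""

    def __init__(self, cores_per_kind):
        # Assumption: threads stand in for cores of each kind.
        self._pools = {kind: ThreadPoolExecutor(max_workers=n)
                       for kind, n in cores_per_kind.items()}
        self._in_flight = {kind: 0 for kind in cores_per_kind}
        self._lock = threading.Lock()

    def submit(self, module, args, runs_on):
        # Pick the permitted core kind with the fewest tasks in flight.
        with self._lock:
            kind = min(runs_on, key=lambda k: self._in_flight[k])
            self._in_flight[kind] += 1
        future = self._pools[kind].submit(module, *args)
        future.add_done_callback(lambda _f, k=kind: self._finish(k))
        return future

    def _finish(self, kind):
        with self._lock:
            self._in_flight[kind] -= 1

    def shutdown(self):
        for pool in self._pools.values():
            pool.shutdown()


def run_demo():
    rt = ToyRuntime({"x86": 2, "accel": 2})
    f1 = rt.submit(solve_diagonal, ([2.0, 4.0], [2.0, 8.0]),
                   runs_on=["x86", "accel"])
    f2 = rt.submit(dot, ([1, 2, 3], [4, 5, 6]), runs_on=["x86"])
    results = (f1.result(), f2.result())
    rt.shutdown()
    return results
```

The point of the sketch is the division of labor: the modules know nothing about the hardware, and the run-time decides at submission time where each one executes.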
The new approach would replace the global schedulers used in today's serial software. The frameworks would supplant parallel libraries, which are not always well suited to the specifics of a given parallel application and cannot be easily mixed and matched as needed.
The Berkeley effort has set its sights beyond the eight- to 16-core processors likely to reach mainstream CPUs in the next two to five years. It will focus instead on the problems of programming chips with dozens of cores.
Researchers believe that over the next five years or so, chip makers will use a fairly diverse set of cores. As time goes on, however, those cores may become increasingly similar, making it easier both to verify the silicon and to program the hardware.
Stanford takes a stab
Dally of Stanford said it makes sense for processors to use a diverse set of cores. “Specialized processors can get one or maybe two orders of magnitude better power efficiency on data-intensive SIMD problems,” he said.
Dally helped launch a microprocessor company, Stream Processors Inc., based on using such cores as an alternative to traditional DSPs. Other Stanford researchers created the Brook programming language, which was later adapted by the graphics division of AMD to help its massively parallel graphics chip handle general-purpose jobs, another big trend driving parallelism.
“Everyone is moving in this direction,” said Dally, noting similar work at Intel and Nvidia. “But they need to develop a more-general approach to the problem that has legs for the future.”
For its part, Stanford is exploring a technique based on a number of high-level domain-specific languages that interact with a set of common parallel run-time environments to access multicore hardware.
The challenge in the run-time environments, Dally said, is balancing static mechanisms that structure large data flows with dynamic abilities to find and exploit opportunities for parallel execution that crop up as programs are running.
Specifically, Stanford hopes to combine work on two projects. One uses transactional memory technology to find ways to handle dynamic scheduling. Another uses a novel language called Sequoia, aimed at data-intensive applications such as digital media processing.
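For readers unfamiliar with transactional memory, the general idea is that a thread works on a snapshot of shared data and commits only if nobody else changed that data in the meantime, retrying otherwise. The sketch below is a minimal software illustration of that optimistic pattern under stated assumptions; it is not Stanford's technology, and the names VersionedCell and atomically are made up for the example.

```python
import threading


class VersionedCell:
    """A shared value with a version number that increases on every commit."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()  # guards only the brief read/commit steps

    def read(self):
        with self._lock:
            return self.value, self.version

    def try_commit(self, expected_version, new_value):
        # Commit succeeds only if no other transaction committed first.
        with self._lock:
            if self.version != expected_version:
                return False
            self.value = new_value
            self.version += 1
            return True


def atomically(cell, update):
    """Run update on a snapshot and retry until the commit goes through."""
    while True:
        value, version = cell.read()
        if cell.try_commit(version, update(value)):
            return


def run_demo(threads=8, increments=1000):
    counter = VersionedCell(0)

    def worker():
        for _ in range(increments):
            atomically(counter, lambda v: v + 1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value
```

Because a conflicting update forces a retry rather than a lost write, run_demo() with its default arguments ends with the counter at 8000. Hardware or language-level transactional memory would track many locations per transaction; this sketch tracks one.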
Researchers at the University of Illinois, meanwhile, have explored ways to extract parallelism from today's serial code. They have also worked on compilers and programming models for next-generation graphics chips as well as the Intel Itanium processor.
For its part, AMD has started talks with partners including Microsoft on its ideas for easing the job of programming multicore devices like its Fusion chips, which will mix X86, graphics and other cores starting in 2009.
“We don't have a specific proposal, but we are out talking to partners about the concept and it's getting a lot of attention,” said Moore of AMD. “Over the next few months or quarters, I think we will sharpen our views and put out a proposal, and perhaps a consortium behind it. It's not just an AMD thing. It's an open system and lets other players innovate at different layers.”
In AMD's view, the new computer stack could include an expanded set of run-time environments above the operating system that could help find, schedule, synchronize and manage chip-level resources for applications programmers.
Below the operating system, virtualization software could be extended to better track and correct programming errors.
Further in the future, PC processors are likely to use a wider variety of memory types as well. They may also use stacking technologies to create more-complex system-on-chip designs, according to Moore.
AMD last year stationed one of its senior fellows, Rich Witek, in Redmond, Wash., to set up an advanced technology lab that will pursue the initiative and other future concepts. Witek led development teams for many microprocessors, including the StrongARM and Alpha chips at Digital Equipment Corp.
“We cooperated with [Microsoft] successfully on the AMD 64-bit technology, and they are ready to go at it again,” Moore said.
Separately, Microsoft has hired parallel-computing veterans, among them Burton Smith, former chief scientist of Cray, and Dan Reed of the National Center for Supercomputing Applications, to help sort out the thorny issues ahead.
So far AMD has not engaged its archrival, Intel, which manufactures the majority of PC processors. Moore praised Intel for its efforts in multicore design with startups and universities as “good computer science work.”
“We are trying to take it one step further and help define heterogeneous platforms,” he said. “We welcome any and all sides to participate in an open and mutually beneficial environment that creates more opportunities, but I don't go around and pitch this stuff to them specifically yet.”
In Moore's view, “X86 compatibility is of paramount importance for the foreseeable future. That's where most of the programs and the OS will run. But at some point, some jobs will run faster and in a more power-efficient manner on targeted accelerators.”
Moore supervised the design of AMD's next-generation high-performance X86 core, code-named Bulldozer, until mid-December, when he transitioned into full-time work rallying support for new software.
“The core design is going along well, well enough to let me roll off to take on this broader initiative,” said Moore.