Multi-cores, software's Gordian Knot and the Alexandrian Solution
Of all the classes presented on multicore and multiprocessor design at the Fall ESC in Boston, the one that stands out for me is the one taught by Skip Hovsmith of CriticalBlue, titled "Optimize multicore processing to fit your software (ESC-346)."
Faced with the parallel programming challenges that multiprocessing presents, companies and programmers are, for sound financial and productivity reasons, still reluctant to shift from the procedural and largely sequential tools that have served them so well for many years. The result is what I can only describe as a Gordian Knot of programming workarounds.
Most of the multicore classes at the Embedded Systems Conference Fall 2007 that I have looked at try to deal with this tangled web of complexities by giving developers ideas and guidance on how to use existing tools and techniques for adding explicit parallelism to their sequential code.
According to Rishiyur Nikhil and Arvind, authors of a parallel programming series on Embedded.com, such approaches, while adequate for some multicore and multiprocessor applications, have too many limitations and caveats to be useful over the long term.
Fully utilizing the hardware parallelism inherent in embedded multicore designs will, they say, require a shift to a more implicitly parallel programming language and methodology. However, many, including researchers at Microsoft, believe it will take at least ten years for the industry to shift to a new parallel programming framework.
For the likes of Hovsmith, this is far too long. He favors the programming version of Alexander the Great's solution to the Gordian Knot: don't change the software or the programming methodology; instead, change or optimize your multicore hardware to fit your sequential, procedural code.
"Although semiconductor platform developers have created innovative multicore architectures, how easy are these devices to program?" he asks. The first reaction of most developers of systems based on multicore devices, Hovsmith said, is to wonder whether they can use those devices with their legacy software and their existing development environments and methodologies.
"At this point, the real multicore disconnect often becomes apparent," he said. To deal with it, Hovsmith describes in his class a top-down software/hardware development flow that starts from ordinary software running on a mainstream processor and evolves it onto new or existing multicore systems.
"As most end product differentiation stems from the software, it is natural to start with that software and work towards the hardware. The ability to quickly explore and evaluate many architectural and processing alternatives enables this flow."
This flow, he said, can be used to establish an efficient architecture in the first place, and can also be used to reprogram the multicore platform to generate derivative designs in the future.
With this approach, an existing or new piece of software can be analyzed and the most appropriate application-level parallelism developed. Resource interdependencies between application-level functions are removed so that those functions can execute on different cores. Once the software is partitioned, the rest of the implementation can be made largely automatic.
"For each of the application-specific cores in the system, the user can trade off the key parameters of performance throughput, power consumption, silicon area and reprogrammability," Hovsmith said, "while the analysis tools extract the most appropriate levels of instruction-level parallelism."
Many candidate multicore architectures can be generated quickly, he said, all of which include the original main processor for which the software was targeted and one or more custom application-specific coprocessor engines.
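Before functions can be moved to different cores, someone (or some tool) has to find which of them actually share resources. The class description does not say how CriticalBlue's tools do this; a textbook way to frame the check is Bernstein's conditions: two functions may run concurrently if neither writes anything the other reads or writes. The minimal Python sketch below assumes each function's read and write sets are already known, and the function and buffer names are hypothetical.

```python
from itertools import combinations

# Hypothetical application-level functions mapped to the shared resources
# (variables, buffers) each one (reads, writes).
FUNCS = {
    "decode_audio": ({"in_buf"}, {"pcm"}),
    "decode_video": ({"in_buf"}, {"frame"}),
    "mix":          ({"pcm"}, {"out_buf"}),
}

def independent(a, b, funcs):
    """Bernstein's conditions: a and b may run concurrently if there is no
    write/read, read/write, or write/write overlap between them."""
    reads_a, writes_a = funcs[a]
    reads_b, writes_b = funcs[b]
    return not (writes_a & reads_b) and not (reads_a & writes_b) and not (writes_a & writes_b)

def parallel_pairs(funcs):
    """List every pair of functions that could be placed on different cores."""
    return [(a, b) for a, b in combinations(funcs, 2) if independent(a, b, funcs)]

if __name__ == "__main__":
    for a, b in parallel_pairs(FUNCS):
        print(f"{a} and {b} can run on different cores")
```

Here `mix` depends on `decode_audio` through the `pcm` buffer, so that pair is reported as dependent; shared reads of `in_buf` alone do not block concurrency. Actually removing such a dependency (for example by double-buffering) is a separate step not shown.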
The user can guide the tools which make up the design flow to find an optimal point in the design space, considering design time, performance, power consumption, silicon area and programmability.
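The class summary does not describe how the tools rank candidates. One common way to frame "finding an optimal point in the design space" is Pareto filtering: discard any architecture that another candidate beats or ties on every metric, then apply a product constraint to what remains. The Python sketch below is only an illustration under stated assumptions: it uses three metrics (throughput, power, area), the candidate names and numbers are hypothetical, and the weighted-cost selection is an arbitrary placeholder, not CriticalBlue's method.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Candidate:
    # One hypothetical architecture: main processor plus zero or more coprocessors.
    name: str
    throughput: float  # e.g. frames per second; higher is better
    power_mw: float    # lower is better
    area_mm2: float    # lower is better

def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on every metric and strictly better on at least one."""
    no_worse = (a.throughput >= b.throughput
                and a.power_mw <= b.power_mw
                and a.area_mm2 <= b.area_mm2)
    strictly_better = (a.throughput > b.throughput
                       or a.power_mw < b.power_mw
                       or a.area_mm2 < b.area_mm2)
    return no_worse and strictly_better

def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]

def pick(front: List[Candidate], min_throughput: float,
         w_power: float = 1.0, w_area: float = 1.0) -> Optional[Candidate]:
    """Among candidates meeting a throughput target, choose the lowest weighted cost.
    The weights are placeholders a designer would tune; None means no candidate fits."""
    feasible = [c for c in front if c.throughput >= min_throughput]
    if not feasible:
        return None
    return min(feasible, key=lambda c: w_power * c.power_mw + w_area * c.area_mm2)

# Hypothetical, made-up candidates for illustration only.
CANDIDATES = [
    Candidate("cpu_only", 30, 200, 4.0),
    Candidate("cpu+1cop", 60, 260, 5.5),
    Candidate("cpu+2cop", 90, 340, 7.0),
    Candidate("cpu+2cop_slow", 55, 300, 7.5),
]

if __name__ == "__main__":
    front = pareto_front(CANDIDATES)
    print("Pareto front:", [c.name for c in front])
    print("Chosen for >= 50 frames/s:", pick(front, 50))
```

Filtering first keeps the final, more subjective weighting step small: only non-dominated designs are left for the user to steer between.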
The advantage of this approach, said Hovsmith, is that the framework can be adapted to existing multiprocessor systems. Functional partitioning, algorithmic changes, and memory and communication network tuning can be quickly analyzed, with throughput matching done through algorithmic variations.
"Using coprocessor synthesis tools and system simulation, this approach hides unnecessary hardware details from the software developer or system architect," he said, pointing out that the combination of a simplified programming model and high speed synthesis and simulation tools encourages architectural exploration to find efficient software execution across multiple processing resources.
Instead of increasing programming complexity, said Hovsmith, this approach builds on familiar single processor programming models, with the same software description retargeted onto different platform architectures to meet different product implementation requirements.
"The embedded software starting point can be legacy code, reference code, or newly developed code," he said. "It can be C/C++, assembly code or some combination. No particular coding style is required, and no inherent knowledge of processor architectures or hardware design is needed."
Other classes on embedded multicore and multiprocessor design at the Embedded Systems Conference in Boston include "Multicore: affecting the way users design, write and debug embedded software (ST-2)," taught by Robert Oshana; "Multi-core software architecture design (ESC-226)," presented by David Kalinsky; and "Getting the most out of multi-core processors (ESC-306)," taught by Michael E. Anderson.
The multicore classes also include "Fundamentals of Multi-core development (ESC-406)," from Todd Brian; "Case studies in software optimization for multi-core SMP (ESC-346)," presented by Max Domeika; and "Multicore architectures and programming paradigms (ESC-463)," taught by Anant Agarwal.
For more resources on this topic on Embedded.com, go to "More about Multicores and Multiprocessors."