Multi-cores, software's Gordian Knot and the Alexandrian Solution - Embedded.com

Of all the classes presented on multicore and multiprocessor design at the Fall ESC in Boston, the one that stands out for me is one taught by Skip Hovsmith of CriticalBlue, titled “Optimize multicore processing to fit your software (ESC-346).”

For sound financial and productivity reasons, and in the face of the parallel programming challenges presented by multiprocessing, companies and programmers are still reluctant to shift from the procedural and largely sequential tools that have served them so well for many years. This has resulted in what I can only describe as a Gordian Knot of programming workarounds.

Most of the multicore classes at the Embedded Systems Conference Fall 2007 that I have looked at try to deal with this tangled web of complexities by giving developers ideas and guidance on how to use existing tools and techniques to add explicit parallelism to their sequential code.

According to Rishiyur Nikhil and Arvind, authors of an Embedded.com parallel programming series, such approaches, while adequate for some multicore and multiprocessor applications, have too many limitations and caveats to be useful over the long term.

Fully utilizing the hardware parallelism inherent in embedded multi-core designs, they say, will require a shift to a more implicitly parallel programming language and methodology. However, many, including researchers at Microsoft, believe that it will take at least ten years for the industry to shift to a new parallel programming framework.

For the likes of Hovsmith, this is far too long. He favors the programming version of Alexander the Great's Gordian Knot solution: don't change the software or the programming methodology; instead, change or optimize your multicore hardware to fit your sequential, procedural code.

“Although semiconductor platform developers have created innovative multicore architectures, how easy are these devices to program?” he asks. The first reaction of most developers of systems based on multicore devices, said Hovsmith, is to wonder whether they can use them in the context of their legacy software and their software development environment and methodologies.

“At this point, the real multicore disconnect often becomes apparent,” he said. To deal with it, Hovsmith describes in his class a top-down software/hardware development flow that starts from regular software running on a mainstream processor and evolves onto new or existing multicore systems.

“As most end product differentiation stems from the software, it is natural to start with that software and work towards the hardware. The ability to quickly explore and evaluate many architectural and processing alternatives enables this flow.”

This flow, he said, can be used to establish an efficient architecture in the first place, and can also be used to reprogram the multicore platform to generate derivative designs in the future.

With this approach, an existing or new piece of software can be analyzed and the most appropriate application-level parallelism developed. Resource inter-dependencies between application-level functions are removed, so that the functions can then be executed on different cores. Once the software is partitioned, the remaining implementation can be made largely automatic.
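
As a rough illustration of what removing a resource inter-dependency can mean at the source level, the C sketch below gives two application-level functions their own caller-supplied working storage instead of a shared global buffer. This is a generic example, not a description of CriticalBlue's tools; the names `workspace_t`, `block_energy` and `block_peak`, and the block size, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_LEN 8  /* hypothetical block size for this sketch */

/* Private working storage. Each function call gets its own instance,
 * so no two functions touch the same scratch memory. */
typedef struct {
    int32_t buf[BLOCK_LEN];
} workspace_t;

/* Sum of squares of a block, using only caller-supplied scratch space. */
int32_t block_energy(const int16_t *in, size_t n, workspace_t *ws)
{
    assert(n <= BLOCK_LEN);
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        ws->buf[i] = (int32_t)in[i] * (int32_t)in[i];
        sum += ws->buf[i];
    }
    return sum;
}

/* Peak absolute value of a block, again with private scratch space. */
int32_t block_peak(const int16_t *in, size_t n, workspace_t *ws)
{
    assert(n <= BLOCK_LEN);
    int32_t peak = 0;
    for (size_t i = 0; i < n; i++) {
        ws->buf[i] = in[i] < 0 ? -(int32_t)in[i] : (int32_t)in[i];
        if (ws->buf[i] > peak) {
            peak = ws->buf[i];
        }
    }
    return peak;
}
```

Because the two functions only read the shared input and write to separate workspaces, neither depends on the other's side effects, which is the property a partitioning step needs before assigning them to different cores.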

“For each of the application specific cores in the system, the user can trade off the key parameters of performance throughput, power consumption, silicon area and reprogrammability,” Hovsmith said, “while the analysis tools extract the most appropriate levels of instruction level parallelism.”

Many candidate multicore architectures can be quickly generated, he said, all of which include the original main processor for which the software was targeted and one or more custom application-specific coprocessor engines.

The user can guide the tools that make up the design flow to find an optimal point in the design space, considering design time, performance, power consumption, silicon area and programmability.
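
One common way to narrow a set of candidate architectures before choosing among them is to discard any candidate that another candidate beats or matches on every axis. The C sketch below shows that filtering step under stated assumptions: it models only three of the criteria mentioned above (throughput, power and area), and the `candidate_t` type and function names are hypothetical rather than taken from any tool discussed in the class.

```c
#include <assert.h>
#include <stddef.h>

/* One candidate architecture, described by three illustrative metrics. */
typedef struct {
    const char *name;
    double throughput;  /* higher is better */
    double power_mw;    /* lower is better */
    double area_mm2;    /* lower is better */
} candidate_t;

/* Returns 1 if a is no worse than b on every metric and strictly
 * better on at least one, otherwise 0. */
static int dominates(const candidate_t *a, const candidate_t *b)
{
    int no_worse = a->throughput >= b->throughput &&
                   a->power_mw <= b->power_mw &&
                   a->area_mm2 <= b->area_mm2;
    int better = a->throughput > b->throughput ||
                 a->power_mw < b->power_mw ||
                 a->area_mm2 < b->area_mm2;
    return no_worse && better;
}

/* Sets keep[i] to 1 for every candidate that no other candidate
 * dominates, and to 0 otherwise. Returns the number kept. */
size_t pareto_filter(const candidate_t *c, size_t n, int *keep)
{
    assert(c != NULL && keep != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        keep[i] = 1;
        for (size_t j = 0; j < n; j++) {
            if (j != i && dominates(&c[j], &c[i])) {
                keep[i] = 0;
                break;
            }
        }
        kept += (size_t)keep[i];
    }
    return kept;
}
```

The candidates that survive are the ones where a real trade-off remains, for example more throughput at the cost of more power; choosing among them is the judgment call left to the designer.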

The advantage of this approach, said Hovsmith, is that the framework can be adapted to existing multiprocessor systems. Functional partitioning, algorithmic changes, and memory and communication network tuning can be quickly analyzed, with throughput matching done through algorithmic variations.

“Using coprocessor synthesis tools and system simulation, this approach hides unnecessary hardware details from the software developer or system architect,” he said, pointing out that the combination of a simplified programming model and high speed synthesis and simulation tools encourages architectural exploration to find efficient software execution across multiple processing resources.

Instead of increasing programming complexity, said Hovsmith, this approach builds on familiar single processor programming models, with the same software description retargeted onto different platform architectures to meet different product implementation requirements.

“The embedded software starting point can be legacy code, reference code, or newly developed code,” he said. “It can be C/C++, assembly code or some combination. No particular coding style is required, and no inherent knowledge of processor architectures or hardware design is needed.”

Other classes on embedded multicore and multiprocessor design at the Embedded Systems Conference in Boston include “Multicore: affecting the way users design, write and debug embedded software (ST-2),” taught by Robert Oshana; “Multi-core software architecture design (ESC-226),” presented by David Kalinsky; and “Getting the most out of multi-core processors (ESC-306),” taught by Michael E. Anderson.

The multicore classes also include “Fundamentals of Multi-core development (ESC-406),” from Todd Brian; “Case studies in software optimization for multi-core SMP (ESC-346),” presented by Max Domeika; and “Multicore architectures and programming paradigms (ESC-463),” taught by Anant Agarwal.

For more resources on this topic on Embedded.com, go to “More about Multicores and Multiprocessors.”
