Of all the classes on multicore and multiprocessor design presented at
the Fall ESC in Boston, the one that stands out for me is the class taught by
Skip Hovsmith of CriticalBlue, titled "Optimize multicore processing to fit your
software (ESC-346)."
For sound financial and productivity reasons, in the face of the
parallel programming challenges presented by multiprocessing, companies
and programmers are still reluctant to shift from the procedural and
largely sequential tools that have served them so well for many years.
This has resulted in what I can only describe as a Gordian Knot of programming
workarounds.
Most of the multicore classes at the Embedded Systems Conference Fall 2007
that I have looked at try to deal with this tangled web of complexities
by giving developers ideas and guidance on how to use existing tools
and techniques for adding explicit parallelism to their sequential
code.
According to Rishiyur Nikhil and Arvind, authors of an Embedded.com series on parallel programming,
such approaches, while adequate for some multicore and multiprocessor
applications, have too many limitations and caveats to be useful over
the long term.
Fully utilizing the hardware parallelism inherent in embedded
multicore designs will, they say, require a shift to a more implicitly
parallel programming language and methodology. However, many, including
researchers at Microsoft, believe that it will take at least ten years
for the industry to shift to a new parallel programming framework.
For the likes of Hovsmith, this is far too long. He favors the
programming version of Alexander the Great's solution to the Gordian
Knot: don't change the software or the programming
methodology; instead, change or optimize your multicore hardware to fit
your sequential, procedural code.
"Although semiconductor platform developers have created innovative
multicore architectures, how easy are these devices to program?" he
asks. The first reaction of most developers of systems based on
multicore devices, said Hovsmith, is to wonder whether they can use such a
device with their legacy software and their existing software development
environments and methodologies.
"At this point, the real multicore disconnect often becomes
apparent," he said. To deal with it, Hovsmith describes in his class a
top-down software/hardware development flow that starts from ordinary
software running on a mainstream processor and evolves it onto new or
existing multicore systems.
"As most end product differentiation stems from the software, it is
natural to start with that software and work towards the hardware. The
ability to quickly explore and evaluate many architectural and
processing alternatives enables this flow."
This flow, he said, can be used to establish an efficient
architecture in the first place, and can also be used to reprogram the
multicore platform to generate derivative designs in the future.
With this approach, an existing or new piece of software can be
analyzed to identify the most appropriate application-level
parallelism. Resource interdependencies between application-level
functions are then removed, so that those functions can execute on
different cores. Once the software is partitioned in this way, the
remaining implementation can be made largely automatic.
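To make "removing resource interdependencies" concrete, here is a minimal source-level sketch. It is not CriticalBlue's tooling; the function names (`scale_samples`, `checksum_samples`, `run_partitioned`) are hypothetical. It assumes two application-level functions that, in a sequential version, might both update one shared global record. Rewriting them so that each reads the same read-only input and returns its own result removes that dependency, so they can be dispatched concurrently and merged afterwards. In CPython, threads do not guarantee execution on separate cores, so the thread pool only stands in for mapping each function onto its own core or coprocessor.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical application-level functions. Neither writes to shared
# state: each takes read-only input and returns its own result, so the
# two calls have no resource interdependency.

def scale_samples(samples, gain):
    """Function A: scale a block of samples by a fixed gain."""
    return [s * gain for s in samples]

def checksum_samples(samples):
    """Function B: 16-bit additive checksum of the same block."""
    return sum(samples) & 0xFFFF

def run_partitioned(samples, gain):
    """Dispatch A and B concurrently on shared read-only input, then merge."""
    data = tuple(samples)  # immutable copy shared by both functions
    with ThreadPoolExecutor(max_workers=2) as pool:
        scaled_future = pool.submit(scale_samples, data, gain)
        checksum_future = pool.submit(checksum_samples, data)
        # Results are combined only after both functions finish, so
        # neither function waits on the other while it runs.
        return scaled_future.result(), checksum_future.result()

if __name__ == "__main__":
    scaled, checksum = run_partitioned([1, 2, 3, 4], gain=3)
    print(scaled, checksum)
```

The design choice is to merge results at a single join point instead of letting both functions update a common structure; that join is the only place where the partitioned functions need to synchronize.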
"For each of the application-specific cores in the system, the user
can trade off the key parameters of performance throughput, power
consumption, silicon area and reprogrammability," Hovsmith said, "while
the analysis tools extract the most appropriate levels of instruction-level
parallelism."
Many candidate multicore architectures can be quickly generated, he
said, all of which include the original main processor for which the
software was targeted and one or more custom application-specific
coprocessor engines.
The user can guide the tools which make up the design flow to find
an optimal point in the design space, considering design time,
performance, power consumption, silicon area and programmability.
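One generic way to narrow many generated candidates down to a few worth a designer's attention is Pareto filtering: discard any candidate that another candidate beats or ties on every metric while being strictly better on at least one. The sketch below is that general technique, swapped in for illustration; the class does not say this is how CriticalBlue's tools rank candidates. The `Candidate` type and its three numeric metrics are hypothetical, and criteria such as design time and programmability are left out because they are not easily reduced to a single number.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    """A hypothetical candidate architecture; lower is better for every metric."""
    name: str
    cycles: int        # execution time of the target software, in clock cycles
    power_mw: float    # estimated power consumption, in milliwatts
    area_mm2: float    # estimated silicon area, in square millimetres

def metrics(c):
    return (c.cycles, c.power_mw, c.area_mm2)

def dominates(a, b):
    """True if a is no worse than b on every metric and strictly better on one."""
    ma, mb = metrics(a), metrics(b)
    return all(x <= y for x, y in zip(ma, mb)) and any(x < y for x, y in zip(ma, mb))

def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other is not c)]
```

The surviving candidates are the genuine trade-off points, for example a processor-only design that uses the least power versus a processor-plus-coprocessor design that runs fastest; picking among them is where the user's guidance comes in.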
The advantage of this approach, said Hovsmith, is that the framework
can be adapted to existing multiprocessor systems. Functional
partitioning, algorithmic changes, and memory and communication network
tuning can be quickly analyzed, with throughput matching done through
algorithmic variations.
"Using coprocessor synthesis tools and system simulation, this
approach hides unnecessary hardware details from the software developer
or system architect," he said, pointing out that the combination of a
simplified programming model and high speed synthesis and simulation
tools encourages architectural exploration to find efficient software
execution across multiple processing resources.
Instead of increasing programming complexity, said Hovsmith, this
approach builds on familiar single processor programming models, with
the same software description retargeted onto different platform
architectures to meet different product implementation requirements.
"The embedded software starting point can be legacy code, reference
code, or newly developed code," he said. "It can be C/C++, assembly
code or some combination. No particular coding style is required, and
no inherent knowledge of processor architectures or hardware design is
needed."
Other classes on embedded multicore and multiprocessor design at the
Embedded Systems Conference in Boston include "Multicore:
affecting the way users design, write and debug embedded software (ST-2),"
taught by Robert Oshana; "Multi-core
software architecture design (ESC-226)," presented by David
Kalinsky; and "Getting
the most out of multi-core processors (ESC-306)," taught by Michael
E. Anderson.
The multicore classes also include "Fundamentals
of Multi-core development (ESC-406)," from Todd Brian; "Case
studies in software optimization for multi-core SMP (ESC-346),"
presented by Max Domeika; and "Multicore
architectures and programming paradigms (ESC-463)," taught by Anant
Agarwal.
For more Embedded.com resources on this topic, go to "More about Multicores and Multiprocessors."