By Rishiyur S. Nikhil (Bluespec) and Arvind (MIT)
Wouldn't it be great if our programs ran twice as fast merely by
upgrading our workstation, desktop, server, or embedded systems - from
portable media players and smart mobile phones to network switches and
routers - from one up to two processors, or from two up to four?
Wouldn't it be great if doubling the number of processors meant that
it took only half as long to evaluate alternative designs using CAD
tools, to complete a modelling run for tomorrow's weather, or to search
a large database?
We are all very used to the idea of gaining increased performance by
upgrading our workstation with a faster processor, or with a larger or
faster memory system. Unfortunately we do not see this kind of
automatic improvement when we upgrade to a multiprocessor.
Although a multiprocessor often allows us to run more programs
simultaneously, it seldom allows us to complete a given program in less
time, or to run a larger program in a given amount of time, without
extraordinary effort.
The problem is that most application programs are written in
sequential programming languages, and these programs cannot exploit the
extra processors available in a multiprocessor.
We can run separate programs on the different processors to increase
the overall throughput of the system but no single program runs faster
by dividing its work amongst multiple processors, the way a parallel
program could.
Why are most programs sequential?
The answer to this question has a number of
dimensions, one of which is historical - it was only in the late 1990s
that multiprocessors began to be commonly affordable and available as
commodity hardware, and so until recently there was no widespread
interest in writing parallel programs.
But, in addition, most existing approaches to producing parallel
programs are very hard, whether they involve automatically converting
existing sequential programs into parallel ones, or whether they
involve the programmer explicitly writing parallel programs.(The one exception to this are programmers
of operating systems to whom parallel programming has long been of
interest because even on a uniprocessor an operating system juggles a
number of activities "simultaneously".)
In this series of articles we will explore the question of what
makes parallel programming so hard, and we will end with an overview of
the solution that we propose: parallel Haskell (pH). First we
will see
that many programs, when viewed at a suitable level
of algorithmic abstraction, in fact contain abundant parallelism.
(A follow-on series of articles on
Embedded.com will discuss our latest work in this area, based
on "Guarded Atomic Transactions", or "Term Rewriting Systems").
Most of this parallelism is obscured and eliminated by the time we
write the code in a concrete sequential programming language. We will
see why it is very difficult for an automatic system to recover this
obscured parallelism.
If parallelism is not recoverable from our existing sequential
programs, an alternative is to rewrite our programs with parallelism in
mind. Three approaches suggest themselves:
1) We can write our
programs in sequential languages in certain stylized ways and with
certain annotations that make it easy for a compiler to generate
parallel code. The design of these idioms and annotations is influenced
by our insights into what makes automatic parallelization of legacy
sequential programs difficult.
Examples of programming languages in this category are High Performance Fortran [16]
and OpenMP [24]. Unfortunately this
approach is not truly general purpose in that only a certain class of
applications fits this paradigm.
2) We can abandon automatic
parallelization and extend our existing sequential languages with
explicit constructs to initiate parallel activities and to coordinate
their interactions. There are a large number of programming languages
in this category, including POSIX threads (pthreads) [19], Java [8], PVM [14] and MPI [20], Modula [29], Multilisp [15], Cid [21], Cilk [12], Split-C [13], and Linda [4].
3) We can write our programs
in a new programming language where parallelism is implicit. This is
the approach advocated in this series of articles, and pH is such a
language. pH is parallel from the ground up - there is no sequential
core. Its approach is quite the opposite from the previous ones - the
very
first, simplest, programs that a pH programmer writes are parallel, and
it is only later that an advanced pH programmer learns judiciously to
introduce a little explicit sequencing. As such, exploiting multiple
processors is almost trivial in pH.
pH builds on over two decades of experience with dataflow and functional programming
languages. pH is a modern successor to the "Id" dataflow language [10, 23, 22], adopting the
notation and type system of the standard nonstrict pure functional
language Haskell [26] (the name "pH" can
be read as "parallel Haskell" ).
When discussing parallel programming it is important also to keep a
proper perspective on expectations of scaling. Parallel computers today
come in a wide range of sizes, from uniprocessors to thousands of
processors.
It is, in general, unreasonable to expect that a program's speed
will automatically scale across such a diversity of platforms. In very
large machines a processor may experience several orders of magnitude
variation in access times for data that is "near" to the processor
versus "far".
This fundamental change in cost model may require algorithms to be
redesigned: how we map work to different processors, how we place data
in the machine's memory relative to processors, and how we schedule
activities. None of the above approaches to parallel programming can
completely insulate us from these considerations.
However, it is reasonable to expect that scaling should happen
automatically without programmer intervention for the kinds of parallel
machines that are likely to be in the vast majority as commodity
machines - small "SMPs" (Symmetric Multiprocessors), with
perhaps 2 to 16 processors, all having more or less uniform access to a
shared memory (a uniform cost model). At the end of this series we will
have a few more remarks on the effect of architectures on programming.