Making the transition from sequential to implicit parallel programming: Part 1

Rishiyur S. Nikhil (Bluespec) and Arvind (MIT)

August 15, 2007

Rishiyur S. Nikhil (Bluespec) and Arvind (MIT)August 15, 2007

Wouldn't it be great if our programs ran twice as fast merely by upgrading our workstation, desktop, server, or embedded systems - from portable media players and smart mobile phones to network switches and routers  - from one up to two processors, or from two up to four?

Wouldn't it be great if doubling the number of processors meant that it took only half as long to evaluate alternative designs using CAD tools, to complete a modelling run for tomorrow's weather, or to search a large database?

We are all very used to the idea of gaining increased performance by upgrading our workstation with a faster processor, or with a larger or faster memory system. Unfortunately we do not see this kind of automatic improvement when we upgrade to a multiprocessor.

Although a multiprocessor often allows us to run more programs simultaneously, it seldom allows us to complete a given program in less time, or to run a larger program in a given amount of time, without extraordinary effort.

The problem is that most application programs are written in sequential programming languages, and these programs cannot exploit the extra processors available in a multiprocessor.

We can run separate programs on the different processors to increase the overall throughput of the system but no single program runs faster by dividing its work amongst multiple processors, the way a parallel program could.

Why are most programs sequential?
The answer to this question has a number of dimensions, one of which is historical - it was only in the late 1990s that multiprocessors began to be commonly affordable and available as commodity hardware, and so until recently there was no widespread interest in writing parallel programs.

But, in addition, most existing approaches to producing parallel programs are very hard, whether they involve automatically converting existing sequential programs into parallel ones, or whether they involve the programmer explicitly writing parallel programs.(The one exception to this are programmers of operating systems to whom parallel programming has long been of interest because even on a uniprocessor an operating system juggles a number of activities "simultaneously".)

In this series of articles we will explore the question of what makes parallel programming so hard, and we will end with an overview of the solution that we propose: parallel Haskell (pH). First we will see that many programs, when viewed at a suitable level of algorithmic abstraction, in fact contain abundant parallelism.

(A follow-on series of articles on will discuss our latest work in this area, based on "Guarded Atomic Transactions", or "Term Rewriting Systems"). 

Most of this parallelism is obscured and eliminated by the time we write the code in a concrete sequential programming language. We will see why it is very difficult for an automatic system to recover this obscured parallelism.

If parallelism is not recoverable from our existing sequential programs, an alternative is to rewrite our programs with parallelism in mind. Three approaches suggest themselves:

1) We can write our programs in sequential languages in certain stylized ways and with certain annotations that make it easy for a compiler to generate parallel code. The design of these idioms and annotations is influenced by our insights into what makes automatic parallelization of legacy sequential programs difficult.

Examples of programming languages in this category are High Performance Fortran [16] and OpenMP [24]. Unfortunately this approach is not truly general purpose in that only a certain class of applications fits this paradigm.

2) We can abandon automatic parallelization and extend our existing sequential languages with explicit constructs to initiate parallel activities and to coordinate their interactions. There are a large number of programming languages in this category, including POSIX threads (pthreads) [19], Java [8], PVM [14] and MPI [20], Modula [29], Multilisp [15], Cid [21], Cilk [12], Split-C [13], and Linda [4].

3) We can write our programs in a new programming language where parallelism is implicit. This is the approach advocated in this series of articles, and pH is such a language. pH is parallel from the ground up - there is no sequential core. Its approach is quite the opposite from the previous ones - the very first, simplest, programs that a pH programmer writes are parallel, and it is only later that an advanced pH programmer learns judiciously to introduce a little explicit sequencing. As such, exploiting multiple processors is almost trivial in pH. 

pH builds on over two decades of experience with dataflow and functional programming languages. pH is a modern successor to the "Id" dataflow language [10, 23, 22], adopting the notation and type system of the standard nonstrict pure functional language Haskell [26] (the name "pH" can be read as "parallel Haskell" ).

When discussing parallel programming it is important also to keep a proper perspective on expectations of scaling. Parallel computers today come in a wide range of sizes, from uniprocessors to thousands of processors.

It is, in general, unreasonable to expect that a program's speed will automatically scale across such a diversity of platforms. In very large machines a processor may experience several orders of magnitude variation in access times for data that is "near" to the processor versus "far".

This fundamental change in cost model may require algorithms to be redesigned: how we map work to different processors, how we place data in the machine's memory relative to processors, and how we schedule activities. None of the above approaches to parallel programming can completely insulate us from these considerations.

However, it is reasonable to expect that scaling should happen automatically without programmer intervention for the kinds of parallel machines that are likely to be in the vast majority as commodity machines - small "SMPs" (Symmetric Multiprocessors), with perhaps 2 to 16 processors, all having more or less uniform access to a shared memory (a uniform cost model). At the end of this series we will have a few more remarks on the effect of architectures on programming.

< Previous
Page 1 of 2
Next >

Loading comments...