By Rishiyur S. Nikhil (Bluespec) and Arvind (MIT)
The effect of architecture on programming
While this series is mainly about parallel programming, it is important
to recognize that computer architecture can have a major effect on our
programming languages and on our programming methodology.
For example, early computers (and even some as recent as the Intel
8086) had small physical address spaces and no architectural support
for virtual memory. On these machines, it was still possible to write
programs that manipulated huge data structures that would not fit into
memory at once.
In effect, the programmer could mimic a virtual memory system by
manually moving pieces of the data between disk and main memory. This
kind of programming was the domain of experts who understood the
architecture thoroughly.
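
To make the idea concrete, here is a minimal sketch of such manual
data movement, written in modern Python rather than anything an
8086-era programmer would have used. The class name ManualWindow, the
64-bit integer file format and the demo helper are all hypothetical
choices made for illustration; the point is only that the program
itself, not the hardware, checks for a "miss" and fetches the right
block from disk.

```python
import os
import struct
import tempfile

ITEM = struct.Struct("<q")  # hypothetical on-disk format: 64-bit integers


class ManualWindow:
    """Read-only view of an on-disk array that keeps one block in memory."""

    def __init__(self, path, window_items):
        self.f = open(path, "rb")
        self.window_items = window_items
        self.base = 0    # index of the first element held in memory
        self.buf = []    # the in-memory block (empty until first access)
        self.loads = 0   # number of disk-to-memory transfers so far

    def get(self, i):
        # The programmer, not the hardware, checks for a "miss".
        if not (self.base <= i < self.base + len(self.buf)):
            self._load_block_containing(i)
        return self.buf[i - self.base]

    def _load_block_containing(self, i):
        # Replace the current block with the one that holds element i.
        self.base = (i // self.window_items) * self.window_items
        self.f.seek(self.base * ITEM.size)
        raw = self.f.read(self.window_items * ITEM.size)
        self.buf = [v for (v,) in ITEM.iter_unpack(raw)]
        self.loads += 1

    def close(self):
        self.f.close()


def demo(n=1000, window_items=100):
    """Sum n integers stored on disk using a window of window_items."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            for v in range(n):
                f.write(ITEM.pack(v))
        w = ManualWindow(path, window_items)
        total = sum(w.get(i) for i in range(n))
        w.close()
        return total, w.loads
    finally:
        os.remove(path)
```

Even in this toy version, every access has to go through get, and the
access pattern determines how many block transfers occur. That is the
kind of bookkeeping that virtual memory later took off the programmer's
hands.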
Eventually, tools came into existence to automate some of this
effort for large-memory programming, using a method called "overlays".
However, these tools never came close to automatically translating
an arbitrary program that assumed large memory into code
that could work in limited memory.
In other words, the architectural limitation simply could not be
hidden from programmers. Ultimately, it required architectural support
for virtual memory before the situation changed. Finally, everyday
programmers could write programs that were not limited by the
processor's physical address space.
A similar situation exists with parallel processing on large
systems, servers, desktops, network switches and many modern mobile and
portable consumer devices. A universal requirement of parallel
architectures (large, small and smaller) is that, in order to be
physically scalable, they must have distributed memories: the machine's
memory must be divided into several independent modules, and there must
be many independent paths from processors to memory modules.
The first generation of commercially viable large parallel computers
achieved this by simply interconnecting large numbers of conventional
sequential computers; each node of the machine consists of a
conventional processor with memory and a means to communicate with
other nodes.
These multicomputers can be programmed with message-passing, which
exactly mirrors the architectural organization. But we have seen how
difficult it is to program using the message-passing model; so, while
extremely useful and effective programs have been (and continue to be) written this
way, it remains the domain of experts.
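
The contrast between the two styles can be sketched in a few lines.
The following is only an illustration under loud assumptions: Python
threads stand in for the nodes of a multicomputer, a queue stands in
for the interconnect, and the function names are hypothetical. In the
message-passing version the program must distribute the data
explicitly and collect results as messages; in the shared-memory
version every worker simply reads the same array.

```python
import queue
import threading


def message_passing_sum(data, nodes):
    """Each 'node' owns a private slice; results travel only as messages."""
    inbox = queue.Queue()  # stands in for the interconnect to a root node
    chunk = (len(data) + nodes - 1) // nodes

    def node(rank, local):
        # 'local' is a private copy: the node never touches 'data' itself.
        inbox.put((rank, sum(local)))

    threads = []
    for rank in range(nodes):
        # Explicit data distribution, decided by the programmer.
        local = list(data[rank * chunk:(rank + 1) * chunk])
        t = threading.Thread(target=node, args=(rank, local))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    # The root receives exactly one message per node.
    return sum(inbox.get()[1] for _ in range(nodes))


def shared_memory_sum(data, workers):
    """Every worker reads the same array; only the accumulator is locked."""
    total = [0]
    lock = threading.Lock()
    chunk = (len(data) + workers - 1) // workers

    def worker(rank):
        partial = sum(data[rank * chunk:(rank + 1) * chunk])
        with lock:
            total[0] += partial

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

For a simple reduction the two versions look similar; the difference
grows with irregular data structures, where deciding which node owns
which piece (and when to ship it) becomes the hard part of the program.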
Just as yesterday's large-memory programming was feasible only for
those who thoroughly understood how to manage the movement of data
between memory and disk, today's parallel programming with
message-passing is feasible only for those who thoroughly understand
how to manage the movement of data between processing nodes of a
multicomputer.
HPF (High Performance Fortran)
can be seen as an attempt to automate some of this. But, just as with
yesterday's tools for overlays in small-memory machines, today's HPF
programmer has to be sophisticated about data distributions, and in any
case these tools can't handle arbitrary, general-purpose programs.
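
As a small illustration of why the choice of data distribution
matters, the sketch below models two common distributions, block and
cyclic, as plain ownership functions. It is not HPF code and makes
simplifying assumptions: indices are 0-based (Fortran's are 1-based by
default), and the helper names are hypothetical. It counts how many
neighbor accesses a[i-1] would cross a processor boundary under each
distribution.

```python
def block_owner(i, n, p):
    """Owner of element i when n elements are split into p contiguous blocks."""
    block = -(-n // p)  # ceiling division: elements per processor
    return i // block


def cyclic_owner(i, p):
    """Owner of element i when elements are dealt out round-robin."""
    return i % p


def remote_neighbor_reads(n, owner):
    """Count indices i in 1..n-1 whose left neighbor lives on another processor."""
    return sum(1 for i in range(1, n) if owner(i) != owner(i - 1))
```

With 16 elements on 4 processors, the block distribution leaves only
the 3 block boundaries remote, while the cyclic distribution makes all
15 neighbor accesses remote. The programmer has to know the
computation's access pattern to pick well.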
It is our belief that shared-memory parallel programming today is
the analog of yesterday's virtual-memory programming: it is the only
model feasible for general-purpose programs written by ordinary
programmers (and hence for widespread use).
Further, it will not become viable without architectural support.
Even though the virtual memory illusion is mostly implemented in
operating system software, it required some architectural support
(translation lookaside buffers, page faults) before it became
efficient enough that the average programmer no longer had to think
about it.
Similarly, even though the shared-memory illusion may be implemented
mostly in run-time system software, some architectural support is
needed to make it efficient enough that the average parallel programmer
no longer needs to think about it.
(This is not to imply that
message-passing must be completely hidden from the programmer;
carefully hand-crafted message-passing algorithms are still likely to
be advantageous. We simply mean that, by and large, the programmer
would prefer a shared-memory programming model.)
To date, there is no consensus on exactly what architectural support
is necessary to support the shared-memory illusion on a distributed
memory parallel machine (even though there are some commercial products
that choose particular approaches).
The question is further complicated because the required
architectural support depends on what parallel programming models one
wishes to support, and there is no consensus on that, either (the
situation was easier with virtual memory, where at least the user-level
programming model was not an issue). Architectural support for
distributed shared memory is a hot topic of research in many academic,
industrial and government groups all over the world.