By Rishiyur S. Nikhil (Bluespec) and Arvind (MIT)
Efficient Implementation of pH
Ultimately, all applications programmers are concerned with efficiency.
This has two major dimensions. First, there is programmer
productivity-the speed at which we humans can create, debug, maintain
and evolve our programs. And, second, there is the space and time cost
of program execution-how fast it runs and how much memory it takes.
The history of programming languages " and the trend towards higher
level languages - has precisely been the effort to improve programmer
productivity without major sacrifices in program execution efficiency.
As computer systems have become faster and compilers have improved,
program efficiency has decreased as a first-order concern. Today, many
applications are written in "scripting" languages like Perl or Tcl/Tk and in
high-level languages like Java.
These programs perhaps would run much faster if written in C or Fortran,
and probably even faster if written in assembler (with huge effort),
but the premium on programmer productivity is gradually beginning to
dominate the equation. In addition, scripting languages and high-level
languages make programming accessible to a much larger community. For
example, there are many Web site designers who know scripting languages
but no other traditional language.
pH continues this trend of aiming towards higher and higher levels
of programming. Nevertheless, there will always be applications where
program execution efficiency is paramount, where the programmer would
like to extract the absolute maximum performance that a machine is
capable of delivering.
At this "bleeding edge" of performance, very precise control of
resources is necessary. C and Fortran, and even assembler, are
typically the languages of choice in this regime because their
computational model is quite close to the raw machine model, and hence
the programmer is able to control resources quite precisely.
(But even in this milieu, the recent architectural complexities
of out-of-order execution, cache hierarchies, and non-uniform memory
access make the resource/performance model quite complex even for the
expert programmer).
The higher the level of the language, the more difficult it is for
the programmer to have a mental model of resource usage, and the more
difficult it is to compile it to efficient code. This trend began in
the 1960s with Lisp,
and continues today with Java.
Haskell makes resource management even more complex, and pH, with
its implicit parallelism, makes it even more so (although pH does better than Haskell
in that it specifies sharing
precisely and permits shared updatable storage). The following
features make it more and more difficult to implement a language
efficiently:
1) Automatic storage
management
2) Efficient implementation of
polymorphism and overloading
3) Implicit Parallelism (no
explicit threads)
4) Nonstrictness and fine-grain
implicit data synchronization
The first issue is shared with many other programming languages
including Lisp, Scheme,
Haskell, ML,
and Java. The latter two issues are shared with a few other programming
languages such as Haskell and concurrent constraint languages.
Automatic
Storage Management: In modern high-level languages data is
typically allocated in an area of memory called the heap. As the
program executes, more and more of the heap is used up. Eventually,
when the heap is full, the run-time system engages in an activity
called garbage
collection (GC).
GC identifies which data are still accessible to the program and
which data are not and then recycles the memory occupied by
inaccessible data to be available once again for allocation as the
program resumes execution.
The run-time system typically needs to maintain some extra
bookkeeping information with data so that it can identify objects in
the heap and determine whether they are accessible by the program or
not. This bookkeeping activity, known as "tagging", together with the
reclamation process itself, constitutes an overhead for such languages.
A second way in which automatic storage management can slow the
program down is in its interaction with the memory hierarchy. In modern
computers, because of multiple levels of caches and virtual memory,
achieving maximum speed of execution is crucially dependent on
"locality of reference" (i.e., the program must try to maintain as
small an active memory "footprint" as possible).
Allocating data on a heap is usually antithetical to this objective,
and the GC activity itself explores large memory areas without any
locality structure.
The problem of efficient implementation of automatic storage
management is not unique to pH; it also exists in Java, scripting
languages, Lisp, ML, Haskell, and other functional and logic
programming languages. There is a vast body of literature describing
solutions to these problems.
Dynamic techniques include: clever representations of heap objects
that reduce or eliminate tagging overhead; techniques to pack heap
objects to reduce their memory footprint; "generational" garbage
collectors that focus on short-lifetime data, thereby reducing the
number of times that all of memory needs to be scanned for potential
garbage, and so on.
Static techniques include escape analysis, which attempts to
identify data whose lifetime coincides with the call and return of some
procedure. That data can then be allocated within the procedure's stack
frame instead of on the heap and thus the runtime system pays no extra
overhead for allocation and de-allocation.
Lifetime analysis is more general than escape analysis in attempting
to identify exactly when data is allocated and consumed-it can result
in optimizations that pass arguments and results in registers instead
of the heap; that avoid building tuples for multiple arguments and
multiple results; and that aggregate heap allocation of multiple
objects into a single allocation.
Linear Types allow the programmer to ensure that a particular data
structure is passed from operation to operation in a single-threaded
fashion (i.e., that it is never shared). This allows the compiler to
implement functional updates to the data structure, which conceptually
copies the data structure, with imperative in-place updates that do no
copying.