CMP EMBEDDED.COM

Login | Register     Welcome Guest  
HOME DESIGN PRODUCTS COLUMNS E-LEARNING CONFERENCES CODE FORUMS/BLOGS NEWSLETTERS CONTACT FEATURES RSS RSS

Making the transition from sequential to implicit parallel programming: Part 8
Turning parallel Haskell (pH) into a production language



Embedded.com

Efficient Implementation of pH

Ultimately, all applications programmers are concerned with efficiency. This has two major dimensions. First, there is programmer productivity-the speed at which we humans can create, debug, maintain and evolve our programs. And, second, there is the space and time cost of program execution-how fast it runs and how much memory it takes.

The history of programming languages " and the trend towards higher level languages - has precisely been the effort to improve programmer productivity without major sacrifices in program execution efficiency.

As computer systems have become faster and compilers have improved, program efficiency has decreased as a first-order concern. Today, many applications are written in "scripting" languages like Perl or Tcl/Tk and in high-level languages like Java.

These programs perhaps would run much faster if written in C or Fortran, and probably even faster if written in assembler (with huge effort), but the premium on programmer productivity is gradually beginning to dominate the equation. In addition, scripting languages and high-level languages make programming accessible to a much larger community. For example, there are many Web site designers who know scripting languages but no other traditional language.

pH continues this trend of aiming towards higher and higher levels of programming. Nevertheless, there will always be applications where program execution efficiency is paramount, where the programmer would like to extract the absolute maximum performance that a machine is capable of delivering.

At this "bleeding edge" of performance, very precise control of resources is necessary. C and Fortran, and even assembler, are typically the languages of choice in this regime because their computational model is quite close to the raw machine model, and hence the programmer is able to control resources quite precisely.

(But even in this milieu, the recent architectural complexities of out-of-order execution, cache hierarchies, and non-uniform memory access make the resource/performance model quite complex even for the expert programmer).

The higher the level of the language, the more difficult it is for the programmer to have a mental model of resource usage, and the more difficult it is to compile it to efficient code. This trend began in the 1960s with Lisp, and continues today with Java.

Haskell makes resource management even more complex, and pH, with its implicit parallelism, makes it even more so (although pH does better than Haskell in that it specifies sharing precisely and permits shared updatable storage). The following features make it more and more difficult to implement a language efficiently:

1) Automatic storage management
2) Efficient implementation of polymorphism and overloading
3) Implicit Parallelism (no explicit threads)
4) Nonstrictness and fine-grain implicit data synchronization

The first issue is shared with many other programming languages including Lisp, Scheme, Haskell, ML, and Java. The latter two issues are shared with a few other programming languages such as Haskell and concurrent constraint languages.

Automatic Storage Management: In modern high-level languages data is typically allocated in an area of memory called the heap. As the program executes, more and more of the heap is used up. Eventually, when the heap is full, the run-time system engages in an activity called garbage collection (GC).

GC identifies which data are still accessible to the program and which data are not and then recycles the memory occupied by inaccessible data to be available once again for allocation as the program resumes execution.

The run-time system typically needs to maintain some extra bookkeeping information with data so that it can identify objects in the heap and determine whether they are accessible by the program or not. This bookkeeping activity, known as "tagging", together with the reclamation process itself, constitutes an overhead for such languages.

A second way in which automatic storage management can slow the program down is in its interaction with the memory hierarchy. In modern computers, because of multiple levels of caches and virtual memory, achieving maximum speed of execution is crucially dependent on "locality of reference" (i.e., the program must try to maintain as small an active memory "footprint" as possible).

Allocating data on a heap is usually antithetical to this objective, and the GC activity itself explores large memory areas without any locality structure.

The problem of efficient implementation of automatic storage management is not unique to pH; it also exists in Java, scripting languages, Lisp, ML, Haskell, and other functional and logic programming languages. There is a vast body of literature describing solutions to these problems.

Dynamic techniques include: clever representations of heap objects that reduce or eliminate tagging overhead; techniques to pack heap objects to reduce their memory footprint; "generational" garbage collectors that focus on short-lifetime data, thereby reducing the number of times that all of memory needs to be scanned for potential garbage, and so on.

Static techniques include escape analysis, which attempts to identify data whose lifetime coincides with the call and return of some procedure. That data can then be allocated within the procedure's stack frame instead of on the heap and thus the runtime system pays no extra overhead for allocation and de-allocation.

Lifetime analysis is more general than escape analysis in attempting to identify exactly when data is allocated and consumed-it can result in optimizations that pass arguments and results in registers instead of the heap; that avoid building tuples for multiple arguments and multiple results; and that aggregate heap allocation of multiple objects into a single allocation.

Linear Types allow the programmer to ensure that a particular data structure is passed from operation to operation in a single-threaded fashion (i.e., that it is never shared). This allows the compiler to implement functional updates to the data structure, which conceptually copies the data structure, with imperative in-place updates that do no copying.

1 | 2 | 3 | 4 | 5

Rate this article: Low High
Current rating
  • .
Embedded.com Career Center
Looking for a new job?
SEARCH JOBS

Browse all jobs

SPONSOR
RECENT JOB POSTINGS





 :