The answer to the first question is yes - such languages are called declarative languages - but the answer to the second question is no - some programming tasks are difficult, if not impossible, to express declaratively.
The efficient implementation of pure declarative languages is also very difficult (curiously, the implementation problem is the dual of the automatic parallelization problem, an observation that we will return to shortly). Nevertheless, declarative languages are an excellent basis for more complete parallel programming languages and, indeed pH, the language advocated in this series of articles, is based on this idea.
A well-known class of declarative languages is that of functional programming languages, such as Lisp, SML and Haskell [26]. Hughes provides an excellent articulation of the advantages of functional programming in "Why Functional Programming Matters" [18]. A second well-known class is that of logic programming languages, such as Prolog and KL1 and their more modern derivatives, constraint logic programming languages, such as Oz . We will mainly discuss functional languages here (because of their relationship to pH), but many of our remarks also apply to the latter.
As an aside, we note that the adjective "declarative" is subjective. Intuitively it means that the program describes what is to be computed rather than how to compute it. Such a description is "higher level", or more "abstract" and hence presumably easier for humans to write and comprehend.
However, no programming language can ever get away completely from the how of problem solving. The most abstract descriptions of computations may not even be executable (such descriptions are the purest form of specifications).
For example, an abstract description of the square root function is "the function which, for argument n, produces x such that x2 = n." An abstract description of a sorting function is "the function which, for argument sequence xs, produces the sequence ys which is a permutation of xs and in which y(i) < y(i+l)."
It is unreasonable to expect any interpreter/compiler to actually execute these specifications systematically and efficiently. Thus, all programming, no matter how high level, fundamentally concerns algorithms, which specify how to compute an answer. So, the adjective "declarative" is mainly an assessment of how much non-problem-specific detail can be suppressed in a programming language.
At one extreme (least declarative) is architecture-specific assembler, where the programmer is intimately concerned with managing storage layout, movement of data between registers and memory, and sequencing.
Most architecture-independent programming languages such as C, Pascal, and Fortran abstract away from some storage-layout details and provide higher-level control structures such as if-then-elses, loops, and recursive procedures.
Some object oriented languages like Java provide even higher-level structures such as data abstraction and inheritance and further relieve the programmer from storage-management details by managing storage deallocation automatically.
Declarative languages like functional and logic programming languages abstract away from storage layout and storage management details completely and also abstract away from many sequencing details.
Multiple definitions of a variable are simply not allowed and all computations are expressed in terms of computing new values from old. For example, given A, a vector of numbers, suppose we want to scale it by 5; that is, the result should be a vector whose i'th component should be 5A(i). In Fortran, we might say:
do i = 1,N
A(i) = 5 * A(i)
end do
and A itself contains the result. Further, the old value of A is no longer available. In functional languages, such a concept does not exist; instead, we define a new vector based on the old. For example, in pH
b = array (1,n)
[(i, 5 * a!i) | i <- [1..n]]
(In Haskell, array indexing is denoted by the infix "!" operator.) This defines b, a new vector with index range 1 to n, such that b(i) = 5a(i). Nothing has happened to a-it still means exactly what it used to mean. For example, we could now produce a third vector c that is the vector-sum of a and b:
c = array (1,n)
[(i, a!i +b!i) | i <- [1..n]]
Matrix multiplication may be expressed in pH as follows:
matmult a b =
matrix ((1,1),
(n,n))
[((i,j), ip A i B j) | i <-
[1..n], j <- [1..n]]
ip A i B j =
let
s = a!(i,l) * b!(l,j)
in
for k <- [2..n] do
next s = s + a!(i,k) * b!(k,j)
finally s
The first part defines the function matmult, which takes two arguments-the matrices a and b. It produces a new n x n matrix whose (i,j)'th component is the inner product of the i'th row of a with the j'th column of b.
(pH permits us to extract size information from a matrix, but since this representation choice is orthogonal to whether a language is functional or not, and since we want to keep the example simple, we assume that a and b are n x n matrices.)
The second part defines the inner product function ip. The let statement binds the value of a(i,1) x b(1,j) to a variable s. Although the loop may superficially look like the corresponding Fortran loop, it is fundamentally different-it expresses the idea of n separate and concurrent computations:
sl = a!(i,j) * b!(1,j)
s2 = s1 + a!(i,2) * b!(2,j)
s3 = s2 + a!(i,3) * b!(3,j)
sn = s(n-1) + a!(i,n) * b!(n,j)
The first computation is the one outside the loop (the let-binding), and sn, the final s, is the value of the entire loop. In other words, there are n different values of s; the next notation simply relates each sk to its corresponding s(k-1); and the finally term just specifies which s is returned as the result value of the whole loop (and, in turn, the function).
The statements of a loop are all executed concurrently, limited only by data dependencies (although the data dependencies in this particular loop do not admit much parallelism other than doing the multiplications in parallel). Thus, it is not the case that s is a variable that is being updated, as in the Fortran loop-each variable still has a unique definition.
The pH program for matrix multiplication directly expresses the structures in original figures at the start of this series of articles, illustrating the mathematical definition. In particular, it takes no position on the order in which computations are done, beyond the natural ordering of data dependencies.
Since variables never change, there cannot ever be any antidependencies. It is for this reason that functional languages are considered to be "naturally parallel" languages.
Another kind of parallelism that is very important is producer-consumer parallelism, and this arises naturally in functional languages with nonstrict data structures. Suppose, instead of scaling vector a by 5, we have a more complicated mapping function f:
b = array (1,n)
[ (i, f (A!i)) | i <- [1.. n]]
The computation of all elements of b may be done in parallel, but they may take widely differing times. In languages with strict data structures (most existing programming languages), a consumer of b must wait until all components have been defined (i.e., until the slowest component computation has completed) because b is simply not available to the consumer until it is fully defined.
In languages like pH with nonstrict data structures, on the other hand, a consumer may be given a reference to b as soon as it is allocated and may attempt to read its components immediately.
These reads automatically wait as long as the components are empty and automatically resume as they become filled. Thus, the consumer can overlap some of its work with the computation of b's components.
For a more dramatic example of producer-consumer parallelism, suppose we have one computation that produces a list of sample readings (perhaps obtaining them from some measuring instrument) and another computation that analyzes these readings looking for interesting events. In pH,
producer instrument =
if no-more-readings(instrument)
then []
else
let
x = get_reading(instrument)
in
x : produce (instrument)
analyzer [] = []
analyzer (x:xs) = if
is_interesting(x)
then x : analyzer(xs)
else analyzer(xs)
The notation "x:xs" is pronounced "x cons xs" and represents a list with x in front, followed by the remaining list, xs. The notation [] represents an empty list. The producer function gets a reading x from the instrument and produces a list containing x followed by more readings, until the instrument indicates that there are no more readings.
The analyzer function handles two cases, and this is indicated by matching the actual argument against one of the two patterns [] and (x:xs). In the former case, it returns an empty list of interesting readings.
In the latter case, it checks if the first reading x is interesting. If so, it returns it, followed by the result of analyzing the remaining readings xs. Otherwise, it discards x and simply returns the result of analyzing the remaining readings.
Now, the interesting producer-consumer parallelism occurs when we plug these functions together:
events = analyzer (producer (a_particular_instrument))
Suppose the instrument yielded a reading every ten seconds, there are a hundred readings, and it takes five seconds to analyze each reading. With conventional ("strict") data structures, the producer function would, after 1000 seconds or so, return a list of 100 elements.
At this point, the analyzer would begin its work and, about 500 seconds later, eventually return a list of interesting readings. Overall, the computation takes 1500 seconds or more.
With nonstrict data structures, however, as soon as the instrument has yielded its first reading, the producer can return the first element of the list, leaving the "tail" of the list temporarily empty, to be filled in later when it gets the next reading.
Having returned this first reading, the analyzer can examine it immediately. If it is interesting, the analyzer, in turn, can immediately return the first element of its result list, leaving its tail empty, to be filled in later.
There is an implicit synchronization necessary here. In particular, the analyzer, having examined the first reading, may attempt to proceed to the next reading, which may not be available yet.
The semantics of nonstrict data structures specify that the analyzer automatically blocks until the tail of its input list is no longer empty (i.e., has been filled in by the producer).
One can visualize this as a chain that is gradually produced by the producer and incrementally consumed by the analyzer as it becomes available. The work of the analyzer and producer can thus be overlapped, instead of being run seriatim. Thus, each 5-second activity of the analyzer can take place during the 10-second wait for the next instrument reading, and the overall computation can be completed in about 1000 seconds.
Although we have illustrated producer-consumer parallelism with a simple, linear data structure (a list), the same principle applies to arbitrary data structures in pH and nonstrict functional languages.
For example, the producer may organize the instrument readings in a sorted tree. The picture changes to one where the fringe, or frontier, of the tree grows incrementally as readings arrive. The analyzer, in parallel, pursues all branches up to the current fringe, consuming values at the fringe as it moves outward.
Similarly, the production of different elements of an array may take varied times, and a consumer of the array can simultaneously pick up those values that are ready at any given point in time.
Next in Part 6: So, why aren't we all using functional languages yet?Rishiyur S. Nikhil and Arvind are the coauthors of Implicit Parallel Programming in pH, published by Elsevier/Morgan Kaufmann.
Rishiyur is Chief Technical Officer at ESL specialist Bluespec Inc. Previous to its acquisition by Broadcom he led the Bluespec technology team at Sandburst Corp. For the previous 9 years he was at Cambridge Research Laboratory (DEC/Compaq), including over a year as Acting Director. Earlier, he was an Associate Professor of Computer Science and Engineering at MIT. He holds several patents in functional programming, dataflow and multithreaded architectures, parallel processing and compiling.
Arvind is the Johnson Professor of
Computer Science and Engineering at the Massachusetts Institute of
Technology. His work laid the foundations for Bluespec, centering on
high-level specification and description of architectures and protocols
using Term Rewriting Systems (TRSs),
encompassing hardware synthesis as well as verification of
implementations against TRS specifications. Previously, he contributed
to the development of dynamic dataflow architectures, the implicitly
parallel programming languages Id and pH, and compilation of these
languages on parallel machines. He was also president and founder of
Sandburst Corp.
To
read more on this topic on Embedded.com go to More
about multicores, multithreading and multiprocessing.
References:
[1] "Microsoft:
Parallel Programming Model Ten Years off," by Rick Merritt,
Embedded.com, July 23, 2007
[2] "The Problem with Threads" by
Edward A. Lee, IEEE Computer, 39:5, May 2006, pp 33-42
[3] "Why
threads are a bad idea (for most purposes)," J.K. Ousterhout,
Invited Talk, USENIX Technical Conference, January, 1996
[4] S. Ahuja, N. Carriero, and
D. Gelernter. "Linda
and Friends," IEEE Computer, 19(8):26-34, August 1986.
[5] Z. Ariola and Arvind. "Properties
of a First-order Functional Language with Sharing." Theoretical
Computer Science, 146(1-2):69-108, July 1995.
[6] J. Armstrong. "The Development of
Erlang." In Proc. Intl. Conf. on Functional Programming (ICFP),
Amsterdam, The Netherlands, pages 196-203, 1997.
[7] J. Armstrong, R. Virding,
and M. Williams. "Concurrent
Programming in Erlang." Prentice Hall, 1993. ISBN: 0-13-285792-8.
[8] K. Arnold and J. Gosling. The Java
Programming Language, second edition. Addison-Wesley, 1998. ISBN
0-201-31006-6.
[9] Arvind, A. Caro, J.-W.
Maessen, and S. Aditya. A
Multithreaded Substrate and Compilation Model for the Implicitly
Parallel Language pH. In Proc. Wkshp. on Languages and Compilers
for Parallel Computing (LCPC), August 1996.
[10] Arvind, K. P. Gostelow,
and W. Plouffe. The
(preliminary) Id Report. Technical Report 114, Dept. of Information
and Computer Science, Univ. of California, Irvine CA, USA, 1978.
[11] Arvind and R. S. Nikhil. Executing
a Program on the MIT Tagged-Token Dataflow Architecture. IEEE
Trans. on Computers, 39(3):300-318, March 1990.
[12] R. D. Blumofe, C. F.
Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: An Efficient Multithreaded Runtime
System. In Proc. 5th. ACM Symp. on Principles and Practice
of Parallel Programming (PPOPP), Santa Barbara CA, USA, pages 207-216,
July 19-21 1995.
[13] D. E. Culler, A. Dusseau,
S. C. Goldstein, A. Krishnamurthy, S. Lumetta, T. von Eicken, and K.
Yelick. Parallel
Programming in Split-C. In Proc. Supercomputing 93, Portland OR,
USA, November 1993.
[14] A. Geist, A. Begeulin, J.
Dongarra, W. Jiang, R. Manchek, and V. S. Sundaram. PVM
Parallel Virtual Machine. A User's Guide and Tutorial for Network
Parallel Computing. MIT Press, Cambridge MA, USA, 1994.
[15] R. H. Halstead. Multilisp:
A
Language for Concurrent Symbolic Computation. ACM Trans. on
Programming Languages and Systems (TOPLAS), 7(4):501-539, October 1985.
[16] High Performance
Fortran Forum. High Performance Fortran Language Specification,
Version 1.0. Technical Report CRPC-TR92225, Center for Research on
Parallel Computation, Rice University, Houston TX, USA, May 3 1993.
[17] C. Hoare. Monitors:
an Operating System Structuring Concept. Communications of the ACM,
17(10):549-557, October 1974.
[18] R. Hughes. Why
Functional Programming Matters. The Computer Journal, 32(2):98-107,
1989.
[19] Threads
standard POSIX 1003.1c-1995 (also ISO/IEC 9945-1:1996), 1996.
[20] Message Passing Interface
Forum. MPI:
A Message-Passing Interface Standard, May 1994.
[21] R. S. Nikhil. "Cid: A Parallel
'Shared-memory' C for Distributed Memory Machines." In Proc.
7th. Ann. Wkshp. on Languages and Compilers for Parallel Computing
(LCPC), Ithaca NY, USA (Springer Verlag LNCS 892), pages 376-390,
August 8-10 1994.
[22] R. S. Nikhil and Arvind. Id:
a Language with Implicit Parallelism. In A Comparative Study of
Parallel Programming Languages: The Salishan Problems, John Feo
(editor), pages 169-215. North Holland, 1992.
[23] R. S. Nikhil, K. Pingali,
and Arvind. Id Nouveau. Technical Report CSG Memo 265, Computation
Structures Group, MIT Lab. for Computer Science, Cambridge MA 02139,
USA, July 1986.
[24] OpenMP Architecture Review
Board. OpenMP
Specifications, 1997 (Fortran), 1998 (C/C++)
[25] S. L. Peyton Jones.
The Implementation of Functional
Programming Languages. Prentice Hall, 1987.
[26] S. L. Peyton Jones
(editor), J. Hughes (editor), L. Augustsson, D. Barton, B. Boutel, W.
Burton, J. Fasel, K. Hammond, R. Hinze, P. Hudak, T. Johnsson, M.
Jones, J. Launchbury, E. Meijer, J. Peterson, A. Reid, C. Runciman, and
P. Wadler. Haskell
98: A Non-strict, Purely Functional Language, February 1999.
[27] M. C. Rinard and P. C.
Diniz. Commutativity
Analysis: A New Analysis Framework for Parallelizing Compilers. ACM
Trans. on Programming Languages and Systems (TOPLAS), 19(6):942-991,
November 1997.
[28] M. C. Rinard, D. J.
Scales, and M. S. Lam. Jade: A
High-Level, MachineIndependent Language for Parallel Programming.
IEEE Computer, 26(6):28-38, June 1993.
[29] N. Wirth.
Programming in Modula-2. Springer Verlag, Berlin, Germany, 1982.