
Making the transition from sequential to implicit parallel programming: Part 6

The answer to why, despite their strengths, functional programming languages are not widely used to program complex multiprocessor designs is simple: they are not adequate to express all programming tasks. In part, this has to do with their inability to express nondeterministic computations; in part, some programs are simpler when expressed with state; and, in part, it has to do with efficiency.

Some computing tasks are inherently nondeterministic. For example, when we benchmark programs, we expect that the timer readings taken before and after the computation will change, depending on the machine configuration, load, quality of compiled code, and so on.

Some computations, such as communication protocols, use explicit time-outs in a fundamental way. Storage allocators, such as the one that finds a free disk block when we write to a file, generally do not care which disk block is returned for use. None of these can be expressed in a functional language, because the values returned by these operations are fundamentally not functions of the inputs.

Although the nondeterminism of parallel access to shared state is usually an obstacle to parallelism, there are a number of situations where this nondeterminism may be exploited usefully, both to enhance parallelism and to improve the clarity of the program.

An example is the nondeterministic summation discussed earlier in the inner product of matrix multiplication. It is a specific case of "accumulation" programs, where the order of accumulation does not matter. Computing histograms is another example: the order in which samples are tallied into a histogram does not matter.
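
For concreteness, here is a minimal sketch in Haskell (the names histogram and nBins are ours, and the samples are assumed to be pre-binned integers): the standard accumArray function captures exactly this pattern, and because (+) is associative and commutative, the result is independent of the order in which the samples are tallied.

    import Data.Array

    -- Tally pre-binned samples (each in the range 0..nBins-1) into a
    -- histogram. Because (+) is associative and commutative, the order
    -- in which accumArray processes the samples cannot affect the
    -- result; the tallying could, in principle, proceed in parallel.
    histogram :: Int -> [Int] -> Array Int Int
    histogram nBins samples =
      accumArray (+) 0 (0, nBins - 1) [ (s, 1) | s <- samples ]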

Another useful class of problems is nondeterministic search: for example, a robot motion-planning program may evaluate several alternative strategies in parallel and choose the best one it can find within a given time limit. There is no a priori way to predict which alternative will be chosen.

Some programs are simpler (and, counterintuitively, more parallel!) when they use state explicitly. An example is graph traversal. Suppose we are exploring a maze and must find a path from the entrance to the exit, and suppose we have an army of lieutenants whom we can send down different paths in parallel.

In order to prevent duplication of work (and to avoid going round in circles), each lieutenant is provided with a stick of chalk, and he marks every corner that he visits. When he arrives at a marked corner he retreats, knowing that some other lieutenant (or even he himself, if he walked in a circle) has already been there.

Unfortunately, there is no way of expressing this intuitive algorithm in a pure functional language. Every corner has state (it is either marked or unmarked), and this cannot be expressed functionally. Instead, the functional program looks like this: we have only one lieutenant, who carries a log book in which he records every corner that he visits.

More precisely, at every visited corner he constructs a new log book, which differs from the old log book by one additional entry: the entry for that corner. At each corner, he consults his log book to check whether he has been there before.

Not only is this solution totally sequential (just one lieutenant), it is potentially expensive (constructing and consulting log books), and it is also much less intuitive than the solution that uses state.
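
A minimal Haskell sketch of the one-lieutenant version follows; the Corner type and the neighbors function are hypothetical stand-ins for a real maze representation, and the log book is a set of visited corners threaded through the traversal.

    import qualified Data.Set as Set

    type Corner = Int

    -- One lieutenant with a log book: a purely functional, sequential
    -- walk that threads the set of visited corners through the search.
    explore :: (Corner -> [Corner]) -> Corner -> Set.Set Corner -> Set.Set Corner
    explore neighbors corner logBook
      | corner `Set.member` logBook = logBook   -- been here before: retreat
      | otherwise =
          foldl (\seen next -> explore neighbors next seen)
                (Set.insert corner logBook)     -- a "new" log book, one entry larger
                (neighbors corner)

Note that Set.insert really does construct a new log book at every corner; persistent sets share most of their structure, but the traversal itself remains one lieutenant deep.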

There is simply no way to express any of these nondeterministic programs in a functional language. This is true by definition; in fact, functional languages are so named because programs behave like mathematical functions: outputs are uniquely determined by inputs.
Functional language inefficiencies
Functional languages are generally less efficient than imperative languages. Much of this inefficiency comes from two factors: storage management overhead and unnecessary copying. At the beginning of the previous part in this series (Part 5), we showed both Fortran and pH code to scale a vector by 5.

The Fortran code simply overwrote the existing array with the new contents, whereas the pH version created an entirely new vector; that is, it had to pay the extra cost of allocating a new vector.
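
As a reminder of the shape of that example, here is a sketch of the functional version in Haskell rather than pH (the name scale5 is ours, not the Part 5 code verbatim): the result is allocated from scratch, and the old array is left for the garbage collector.

    import Data.Array

    -- Scale a vector by 5, functionally: listArray allocates an
    -- entirely new array; the argument is never modified.
    scale5 :: Array Int Double -> Array Int Double
    scale5 a = listArray (bounds a) [ 5 * x | x <- elems a ]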

And, since computers have finite storage and functional languages continually allocate new storage for new objects, there has to be a corresponding deallocation process (usually called "garbage collection") that recycles storage for objects that are no longer necessary to the computation; this also adds to storage management overhead.

Java is an example of a newer imperative language whose designers appreciated the value of garbage collection (automatic storage management) and incorporated it as a standard part of the language.

Unnecessary copying occurs whenever we compute a new data structure from an old one in which many of the component values are unchanged. For example, suppose we want to scale only the seventh component of vector A by 5. In Fortran, we could say:

       A(7) = 5 * A(7)

and A contains the result. In a functional language we must, as always, define a new array for the result:

       b = array (1,n) [ (i, if i == 7
                             then 5 * a!i
                             else a!i) | i <- [1..n] ]

The new array b is mostly a copy of a, except at index 7. In Fortran, the operation performs a fixed (constant) amount of work (one read, one multiplication, one write), whereas the functional program performs work proportional to the size of the array (n reads, one multiplication, and n writes).

Synchronization overheads
Nonstrict languages have additional "synchronization" overheads. First, when a data structure is allocated, the implementation has to initialize all the components of the new structure to be "empty", because it cannot predict whether or not a consumer will attempt to read a component before it is filled (the filling computation may be arbitrarily complicated and, moreover, the filling computations for some components may depend on the values of other components).

Second, when a computation attempts to read a data structure component, it has to check whether the data is available yet and, if not, it has to be suspended and then resumed when the data does become available.
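
To make this discipline concrete, here is a small sketch of the "empty until filled" behavior using Haskell's MVar; this is our illustration of the idea, not the mechanism a nonstrict implementation actually uses for every component.

    import Control.Concurrent.MVar

    -- A data structure component that is allocated "empty" and filled once.
    type Slot a = MVar a

    newSlot :: IO (Slot a)
    newSlot = newEmptyMVar      -- every component starts out empty

    fill :: Slot a -> a -> IO ()
    fill = putMVar              -- the producer eventually writes the value

    fetch :: Slot a -> IO a
    fetch = readMVar            -- a consumer suspends here until the slot is filled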

To illustrate why it is difficult for a compiler to eliminate this overhead, consider the following rather innocent-looking function definition, which takes two arguments and returns two results:

    f (x,y) = (x+5, y+10)

It appears straightforward to generate efficient code for this: simply add 5 and 10 to the first and second arguments, respectively, and return them. However, consider the following two contexts in which f may be called:

    (c,d) = f (23, c)     -- (A)
    (c,d) = f (d, 23)     -- (B)

In (A), when f is called, we know x=23; f can compute and return x+5. This defines c=28, at which point f's second argument is known, and f can finish its work, computing and returning y+10 as its second result; this defines d=38. On the other hand, in (B), when f is called, we know y=23; f can compute and return y+10.

This defines d=33, at which point f's first argument is known, and f can finish its work, computing and returning x+5 as its first result; this defines c=38. In other words, a compiler cannot even schedule the order in which x+5 and y+10 are to be computed; instead, it has to produce code that allows this order to be determined dynamically. This adds further overhead to the execution.
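
In fact, both contexts run fine in Haskell as recursive pattern bindings, precisely because the evaluation order is resolved dynamically; here is a self-contained version (the names cA, dA, cB, dB are ours). Under strict evaluation, both bindings would deadlock.

    -- The innocent-looking function: two arguments in, two results out.
    f :: (Int, Int) -> (Int, Int)
    f (x, y) = (x + 5, y + 10)

    -- Context (A): the first result feeds back as the second argument.
    (cA, dA) = f (23, cA)    -- cA = 28, dA = 38

    -- Context (B): the second result feeds back as the first argument.
    (cB, dB) = f (dB, 23)    -- dB = 33, cB = 38

    main :: IO ()
    main = print ((cA, dA), (cB, dB))   -- prints ((28,38),(38,33))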

Efficiency is a slippery issue
The efficiency issue is quite slippery, because it is tied to parallelism. First, in situations where we do need both the old and the new values of the array for parallel access, or where we do need producer-consumer parallelism, and so on, the Fortran programmer would have to replicate manually what the functional programmer does naturally. Here, it is possible for the functional language compiler to produce a more efficient implementation than the storage management and synchronization code written by hand by the Fortran programmer.

Second, a smart functional language compiler may recognize that the array a is no longer needed after the above scaling computation, and so it may choose to reuse a's storage for b and perform a single update at index 7, just like the Fortran code. This not only eliminates the storage management overhead, it also eliminates the unnecessary copying.

Interestingly, the problem of determining when storage can safely be reused by the system also requires exact information about when two expressions refer to the same object. It is exactly the same aliasing problem that makes automatic parallelization of sequential languages difficult; this is the duality that we mentioned earlier.

The analysis is the same, but the solutions are duals of each other, in the sense that the automatic parallelizer, to be conservative, must reuse storage and avoid parallelism, whereas the functional language compiler, to be conservative, must not reuse storage and must avoid sequentiality.

This is why, in imperative languages, it is easy to write sequential programs and difficult to write parallel ones, whereas in functional languages, it is easy to write highly parallel programs and difficult to write sequential ones.

In fact, one might observe that sequentializing a nonstrict functional program is as hard a problem (some would say, as intractable) as parallelizing a sequential imperative program.
Prominent functional languages
There are three major families of functional languages: the Lisp family (including Common Lisp and Scheme), the ML family (including SML), and the family of lazy functional languages, which includes Haskell [26]. These families differ primarily in their choice of static versus dynamic type checking, in whether or not they are pure (that is, free of side-effecting operations on state), and in whether they have strict or nonstrict semantics.

Lisp is dynamically typed, whereas SML and Haskell are statically typed and have rich data type systems based on the Hindley-Milner type system. These choices are orthogonal to issues of parallelism; type system choices are usually made on other grounds, such as expressive power, program robustness, and software engineering. pH, which we will discuss in the next part in this series, has a static type system similar to Haskell's.

While it is possible to do pure functional programming in Scheme and SML, both are impure functional languages; that is, they both have side-effecting operations and a notion of state, motivated by efficiency and expressivity concerns such as those discussed in the previous section. Haskell, on the other hand, is a pure functional language with no side-effecting operations whatsoever.

Being a pure functional language, Haskell has an additional efficiency concern beyond those it shares with Lisp and SML: the overhead of allocating and copying data structures as we "update" them. Researchers in this community have worked hard to address it.

The key insight is that, when we "update" a functional data structure, if there are no other references to its old value, then the implementation can safely perform the update in situ and avoid copying the data structure.

There are several ways to achieve this. At run time, we could maintain a reference count for each data structure: an integer counter, associated with the structure, that records the number of references to it.

During an "update" operation, if the implementation sees that this count is 1, it can perform the update in place, on the old data structure. Alternatively (or in addition to dynamic reference counts), we can statically analyze the program to prove that the reference count during a particular update will be 1; if so, we can generate code that directly performs the update in place and avoids the reference count check entirely.

A further alternative is to devise a type system that guarantees that certain objects have unit reference counts; various such systems, called "linear types", "unique types", and so on, have been proposed and prototyped. In recent years, a system called monads has gained currency in the Haskell community as a way to address this issue.

Briefly, monads provide a way to package up a sequence of computations on a data structure in such a way that, by construction, there is always exactly one reference to the data structure.

In essence, the data structure is "single-threaded" through the computation. This allows all functional "updates" on that data structure to be implemented safely as in-place updates, avoiding copying. Of course, this does not at all address the issue of parallel updates to a data structure.
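
Haskell's ST monad is one concrete embodiment of this idea. Here is a minimal sketch of the earlier "scale index 7" example using the standard Data.Array.ST interface (nothing pH-specific): the mutable array is single-threaded through the computation, so the update happens in place.

    import Data.Array (Array)
    import Data.Array.ST (runSTArray, thaw, readArray, writeArray)

    -- The mutable array m is single-threaded through the ST computation,
    -- so the update at index 7 is performed in place; runSTArray then
    -- returns it as an ordinary immutable array.
    scaleSeventh :: Array Int Int -> Array Int Int
    scaleSeventh a = runSTArray $ do
      m <- thaw a               -- one copy into a mutable array
      x <- readArray m 7
      writeArray m 7 (5 * x)
      return m

Note that thaw still copies once; eliminating even that copy is what the reference-count analyses and the linear or unique type systems mentioned above aim to do.
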
The issue of strictness
The final dimension along which these functional languages differ is strictness: Lisp and SML are strict languages, whereas Haskell is nonstrict. This choice is not entirely independent of the choice of purity.

Raw (unsynchronized) side effects usually need a sequencing context to be meaningful, and strictness imposes more sequencing constraints than nonstrictness. SML's language definition specifies sequencing totally and precisely; it is strict by implication, because the sequencing specifies that arguments are evaluated before a function call is performed, and data structure components are evaluated before the data structure itself is constructed.

Common Lisp's language definition also specifies sequencing completely, and it is also strict by implication. Scheme, in the Lisp family, is also strict, even though it leaves some sequencing questions open (such as the order of evaluation of arguments).

Haskell is a nonstrict language, and most implementations use lazy evaluation to implement this. Lazy evaluation is a totally sequential evaluation mechanism that works backward from the final result expression to determine exactly what needs to be computed in order to realize the final result, and computes it.
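
A one-line illustration of this demand-driven behavior (our example): only what the final result needs is ever computed.

    -- 'take 5' demands only five elements, so only five elements of
    -- the infinite list [0 ..] are ever computed; a strict language
    -- would diverge trying to build the whole list first.
    main :: IO ()
    main = print (take 5 (map (* 2) [0 ..]))   -- prints [0,2,4,6,8]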

Until recently, functional languages have made little headway in the wider professional software development community, primarily because of certain perceptions about their inefficiency.

Even though Lisp is as old as Fortran, many features of functional languages have long been regarded as having an unacceptable run-time cost; these features include heap-allocated storage with automatic garbage collection and complete type safety (including safe pointers and array bounds-checking).

Somehow, arguments about increased programmer productivity, more robust programs, simpler and cleaner coding, and so on were not sufficient to persuade many professionals or organizations to adopt functional languages.

Dismantling the functional language barrier
The advent of Java has done much to dismantle this barrier. Java was not originally promoted as a universal application programming language; its main attractions were portable, secure Internet programming, with the support of byte-code interpretation and dynamic loading.

Nevertheless, its success has led to a broadening of its scope, to the point where many projects now choose Java as their implementation language whether or not they are related to portable Internet programming. In the process, many people have for the first time directly experienced the benefits of automatic garbage collection and complete type safety.

It is possible that this revolution will also make people more receptive to functional programming languages (indeed, many functional programming languages have implementations that are more efficient than current Java implementations).

An example of this change in thinking is the Swedish telephone equipment company L.M. Ericsson, which has begun to use a simple concurrent functional language called Erlang for new telephone switch code, after finding it difficult to produce robust software with C++ [7] [6].

Figure 1.11. Layers of parallel Haskell (pH)

Is parallel Haskell (pH) the answer?
We designed pH as a parallel dialect of Haskell. It is primarily a functional language, because we believe that functional languages are the best starting point for parallel programming languages.

Ultimately, we need massive parallelism to exploit all the varieties of parallelism in real machines, and functional languages allow us to express vast amounts of parallelism implicitly and effortlessly.

Moreover, this parallelism is not restricted to dense rectangular arrays and loops; it extends to arbitrary recursive and loop computations on arbitrary linked data structures.

However, recognizing the limitations of pure functional programming described above, pH also provides constructs for the explicit expression and manipulation of state. It is convenient to view pH as a language with three layers, as depicted in Figure 1.11 above. We will discuss each of these layers next, in Part 7: pH – an implicitly parallel, declarative language.

To read Part 1, go to How sequential languages obscure parallelism.
To read Part 2, go to How to achieve parallel execution?
To read Part 3, go to Explicit Parallelism: Shared-Memory Programming with Threads and Locks.
To read Part 4, go to Explicit Parallelism: Message-Passing programming.
To read Part 5, go to Implicit parallelism: Declarative programming languages.

Rishiyur S. Nikhil and Arvind are the coauthors of Implicit Parallel Programming in pH, published by Elsevier/Morgan Kaufmann.

Rishiyur is Chief Technical Officer at ESL specialist Bluespec Inc. Before its acquisition by Broadcom, he led the Bluespec technology team at Sandburst Corp. For the nine years before that he was at the Cambridge Research Laboratory (DEC/Compaq), including over a year as Acting Director. Earlier, he was an Associate Professor of Computer Science and Engineering at MIT. He holds several patents in functional programming, dataflow and multithreaded architectures, parallel processing, and compiling.

Arvind is the Johnson Professor of Computer Science and Engineering at the Massachusetts Institute of Technology. His work laid the foundations for Bluespec, centering on high-level specification and description of architectures and protocols using Term Rewriting Systems (TRSs), encompassing hardware synthesis as well as verification of implementations against TRS specifications. Previously, he contributed to the development of dynamic dataflow architectures, the implicitly parallel programming languages Id and pH, and the compilation of these languages for parallel machines. He was also president and founder of Sandburst Corp.

To read more on this topic on Embedded.com, go to More about multicores, multithreading and multiprocessing.

References:
[1] R. Merritt. "Microsoft: Parallel Programming Model Ten Years Off." Embedded.com, July 23, 2007.
[2] E. A. Lee. "The Problem with Threads." IEEE Computer, 39(5):33-42, May 2006.
[3] J. K. Ousterhout. "Why Threads Are a Bad Idea (for Most Purposes)." Invited Talk, USENIX Technical Conference, January 1996.
[4] S. Ahuja, N. Carriero, and D. Gelernter. "Linda and Friends." IEEE Computer, 19(8):26-34, August 1986.
[5] Z. Ariola and Arvind. "Properties of a First-order Functional Language with Sharing." Theoretical Computer Science, 146(1-2):69-108, July 1995.
[6] J. Armstrong. "The Development of Erlang." In Proc. Intl. Conf. on Functional Programming (ICFP), Amsterdam, The Netherlands, pages 196-203, 1997.
[7] J. Armstrong, R. Virding, and M. Williams. Concurrent Programming in Erlang. Prentice Hall, 1993. ISBN 0-13-285792-8.
[8] K. Arnold and J. Gosling. The Java Programming Language, second edition. Addison-Wesley, 1998. ISBN 0-201-31006-6.
[9] Arvind, A. Caro, J.-W. Maessen, and S. Aditya. "A Multithreaded Substrate and Compilation Model for the Implicitly Parallel Language pH." In Proc. Wkshp. on Languages and Compilers for Parallel Computing (LCPC), August 1996.
[10] Arvind, K. P. Gostelow, and W. Plouffe. The (Preliminary) Id Report. Technical Report 114, Dept. of Information and Computer Science, Univ. of California, Irvine CA, USA, 1978.
[11] Arvind and R. S. Nikhil. "Executing a Program on the MIT Tagged-Token Dataflow Architecture." IEEE Trans. on Computers, 39(3):300-318, March 1990.
[12] R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. "Cilk: An Efficient Multithreaded Runtime System." In Proc. 5th ACM Symp. on Principles and Practice of Parallel Programming (PPOPP), Santa Barbara CA, USA, pages 207-216, July 19-21, 1995.
[13] D. E. Culler, A. Dusseau, S. C. Goldstein, A. Krishnamurthy, S. Lumetta, T. von Eicken, and K. Yelick. "Parallel Programming in Split-C." In Proc. Supercomputing 93, Portland OR, USA, November 1993.
[14] A. Geist, A. Begeulin, J. Dongarra, W. Jiang, R. Manchek, and V. S. Sundaram. PVM: Parallel Virtual Machine. A User's Guide and Tutorial for Network Parallel Computing. MIT Press, Cambridge MA, USA, 1994.
[15] R. H. Halstead. "Multilisp: A Language for Concurrent Symbolic Computation." ACM Trans. on Programming Languages and Systems (TOPLAS), 7(4):501-539, October 1985.
[16] High Performance Fortran Forum. High Performance Fortran Language Specification, Version 1.0. Technical Report CRPC-TR92225, Center for Research on Parallel Computation, Rice University, Houston TX, USA, May 3, 1993.
[17] C. Hoare. "Monitors: An Operating System Structuring Concept." Communications of the ACM, 17(10):549-557, October 1974.
[18] R. Hughes. "Why Functional Programming Matters." The Computer Journal, 32(2):98-107, 1989.
[19] Threads standard POSIX 1003.1c-1995 (also ISO/IEC 9945-1:1996), 1996.
[20] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, May 1994.
[21] R. S. Nikhil. "Cid: A Parallel 'Shared-memory' C for Distributed Memory Machines." In Proc. 7th Ann. Wkshp. on Languages and Compilers for Parallel Computing (LCPC), Ithaca NY, USA (Springer Verlag LNCS 892), pages 376-390, August 8-10, 1994.
[22] R. S. Nikhil and Arvind. "Id: A Language with Implicit Parallelism." In A Comparative Study of Parallel Programming Languages: The Salishan Problems, John Feo (editor), pages 169-215. North Holland, 1992.
[23] R. S. Nikhil, K. Pingali, and Arvind. Id Nouveau. Technical Report CSG Memo 265, Computation Structures Group, MIT Lab. for Computer Science, Cambridge MA, USA, July 1986.
[24] OpenMP Architecture Review Board. OpenMP Specifications, 1997 (Fortran), 1998 (C/C++).
[25] S. L. Peyton Jones. The Implementation of Functional Programming Languages. Prentice Hall, 1987.
[26] S. L. Peyton Jones (editor), J. Hughes (editor), L. Augustsson, D. Barton, B. Boutel, W. Burton, J. Fasel, K. Hammond, R. Hinze, P. Hudak, T. Johnsson, M. Jones, J. Launchbury, E. Meijer, J. Peterson, A. Reid, C. Runciman, and P. Wadler. Haskell 98: A Non-strict, Purely Functional Language, February 1999.
[27] M. C. Rinard and P. C. Diniz. "Commutativity Analysis: A New Analysis Framework for Parallelizing Compilers." ACM Trans. on Programming Languages and Systems (TOPLAS), 19(6):942-991, November 1997.
[28] M. C. Rinard, D. J. Scales, and M. S. Lam. "Jade: A High-Level, Machine-Independent Language for Parallel Programming." IEEE Computer, 26(6):28-38, June 1993.
[29] N. Wirth. Programming in Modula-2. Springer Verlag, Berlin, Germany, 1982.
