CMP EMBEDDED.COM

Login | Register     Welcome Guest  
HOME DESIGN PRODUCTS COLUMNS E-LEARNING CONFERENCES CODE FORUMS/BLOGS NEWSLETTERS CONTACT FEATURES RSS RSS

Making the transition from sequential to implicit parallel programming: Part 5
Implicit Parallel Programming: Declarative Programming Languages



Embedded.com
Implicit Producer-Consumer Parallelism
Most of the parallelism in matrix multiplication comes from data parallelism-an independent inner product is computed at each of the n2 components of the result matrix.

Another kind of parallelism that is very important is producer-consumer parallelism, and this arises naturally in functional languages with nonstrict data structures. Suppose, instead of scaling vector a by 5, we have a more complicated mapping function f:

b = array (1,n)
                [ (i, f (A!i)) | i <- [1.. n]]

The computation of all elements of b may be done in parallel, but they may take widely differing times. In languages with strict data structures (most existing programming languages), a consumer of b must wait until all components have been defined (i.e., until the slowest component computation has completed) because b is simply not available to the consumer until it is fully defined.

In languages like pH with nonstrict data structures, on the other hand, a consumer may be given a reference to b as soon as it is allocated and may attempt to read its components immediately.

These reads automatically wait as long as the components are empty and automatically resume as they become filled. Thus, the consumer can overlap some of its work with the computation of b's components.

For a more dramatic example of producer-consumer parallelism, suppose we have one computation that produces a list of sample readings (perhaps obtaining them from some measuring instrument) and another computation that analyzes these readings looking for interesting events. In pH,

producer instrument =
        if no-more-readings(instrument)
        then []
        else
            let
                x = get_reading(instrument)
            in
                x : produce (instrument)

analyzer [] = []
analyzer (x:xs) = if is_interesting(x)
                            then x : analyzer(xs)
                            else analyzer(xs)

The notation "x:xs" is pronounced "x cons xs" and represents a list with x in front, followed by the remaining list, xs. The notation [] represents an empty list. The producer function gets a reading x from the instrument and produces a list containing x followed by more readings, until the instrument indicates that there are no more readings.

The analyzer function handles two cases, and this is indicated by matching the actual argument against one of the two patterns [] and (x:xs). In the former case, it returns an empty list of interesting readings.

In the latter case, it checks if the first reading x is interesting. If so, it returns it, followed by the result of analyzing the remaining readings xs. Otherwise, it discards x and simply returns the result of analyzing the remaining readings.

Now, the interesting producer-consumer parallelism occurs when we plug these functions together:

events = analyzer (producer (a_particular_instrument))

Suppose the instrument yielded a reading every ten seconds, there are a hundred readings, and it takes five seconds to analyze each reading. With conventional ("strict") data structures, the producer function would, after 1000 seconds or so, return a list of 100 elements.

At this point, the analyzer would begin its work and, about 500 seconds later, eventually return a list of interesting readings. Overall, the computation takes 1500 seconds or more.

With nonstrict data structures, however, as soon as the instrument has yielded its first reading, the producer can return the first element of the list, leaving the "tail" of the list temporarily empty, to be filled in later when it gets the next reading.

Having returned this first reading, the analyzer can examine it immediately. If it is interesting, the analyzer, in turn, can immediately return the first element of its result list, leaving its tail empty, to be filled in later.

There is an implicit synchronization necessary here. In particular, the analyzer, having examined the first reading, may attempt to proceed to the next reading, which may not be available yet.

The semantics of nonstrict data structures specify that the analyzer automatically blocks until the tail of its input list is no longer empty (i.e., has been filled in by the producer).

One can visualize this as a chain that is gradually produced by the producer and incrementally consumed by the analyzer as it becomes available. The work of the analyzer and producer can thus be overlapped, instead of being run seriatim. Thus, each 5-second activity of the analyzer can take place during the 10-second wait for the next instrument reading, and the overall computation can be completed in about 1000 seconds.

Although we have illustrated producer-consumer parallelism with a simple, linear data structure (a list), the same principle applies to arbitrary data structures in pH and nonstrict functional languages.

For example, the producer may organize the instrument readings in a sorted tree. The picture changes to one where the fringe, or frontier, of the tree grows incrementally as readings arrive. The analyzer, in parallel, pursues all branches up to the current fringe, consuming values at the fringe as it moves outward.

Similarly, the production of different elements of an array may take varied times, and a consumer of the array can simultaneously pick up those values that are ready at any given point in time.

Next in Part 6: So, why aren't we all using functional languages yet?
To read Part 1, go to How sequential languages obscure parallelism.
To read Part 2, go to How to achieve parallel execution?
To read Part 3, go to Explicit Parallelism: Shared-Memory Programming with Threads and Locks
To read Part 4, go to Explicit Parallelism: Message-Passing programming

Rishiyur S. Nikhil and Arvind are the coauthors of Implicit Parallel Programming in pH, published by Elsevier/Morgan Kaufmann.

Rishiyur is Chief Technical Officer at ESL specialist Bluespec Inc. Previous to its acquisition by Broadcom he led the Bluespec technology team at Sandburst Corp. For the previous 9 years he was at Cambridge Research Laboratory (DEC/Compaq), including over a year as Acting Director. Earlier, he was an Associate Professor of Computer Science and Engineering at MIT. He holds several patents in functional programming, dataflow and multithreaded architectures, parallel processing and compiling.

Arvind is the Johnson Professor of Computer Science and Engineering at the Massachusetts Institute of Technology. His work laid the foundations for Bluespec, centering on high-level specification and description of architectures and protocols using Term Rewriting Systems (TRSs), encompassing hardware synthesis as well as verification of implementations against TRS specifications. Previously, he contributed to the development of dynamic dataflow architectures, the implicitly parallel programming languages Id and pH, and compilation of these languages on parallel machines. He was also president and founder of Sandburst Corp.

To read more on this topic on Embedded.com go to More about multicores, multithreading and multiprocessing.

References:
[1] "Microsoft: Parallel Programming Model Ten Years off," by Rick Merritt, Embedded.com, July 23, 2007
[2] "The Problem with Threads" by Edward A. Lee, IEEE Computer, 39:5, May 2006, pp 33-42
[3] "Why threads are a bad idea (for most purposes)," J.K. Ousterhout, Invited Talk, USENIX Technical Conference, January, 1996
[4] S. Ahuja, N. Carriero, and D. Gelernter. "Linda and Friends," IEEE Computer, 19(8):26-34, August 1986.
[5] Z. Ariola and Arvind. "Properties of a First-order Functional Language with Sharing." Theoretical Computer Science, 146(1-2):69-108, July 1995.
[6] J. Armstrong. "The Development of Erlang." In Proc. Intl. Conf. on Functional Programming (ICFP), Amsterdam, The Netherlands, pages 196-203, 1997.
[7] J. Armstrong, R. Virding, and M. Williams. "Concurrent Programming in Erlang." Prentice Hall, 1993. ISBN: 0-13-285792-8.
[8] K. Arnold and J. Gosling. The Java Programming Language, second edition. Addison-Wesley, 1998. ISBN 0-201-31006-6.
[9] Arvind, A. Caro, J.-W. Maessen, and S. Aditya. A Multithreaded Substrate and Compilation Model for the Implicitly Parallel Language pH. In Proc. Wkshp. on Languages and Compilers for Parallel Computing (LCPC), August 1996.
[10] Arvind, K. P. Gostelow, and W. Plouffe. The (preliminary) Id Report. Technical Report 114, Dept. of Information and Computer Science, Univ. of California, Irvine CA, USA, 1978.
[11] Arvind and R. S. Nikhil. Executing a Program on the MIT Tagged-Token Dataflow Architecture. IEEE Trans. on Computers, 39(3):300-318, March 1990.
[12] R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: An Efficient Multithreaded Runtime System. In Proc. 5th. ACM Symp. on Principles and Practice of Parallel Programming (PPOPP), Santa Barbara CA, USA, pages 207-216, July 19-21 1995.
[13] D. E. Culler, A. Dusseau, S. C. Goldstein, A. Krishnamurthy, S. Lumetta, T. von Eicken, and K. Yelick. Parallel Programming in Split-C. In Proc. Supercomputing 93, Portland OR, USA, November 1993.
[14] A. Geist, A. Begeulin, J. Dongarra, W. Jiang, R. Manchek, and V. S. Sundaram. PVM Parallel Virtual Machine. A User's Guide and Tutorial for Network Parallel Computing. MIT Press, Cambridge MA, USA, 1994.
[15] R. H. Halstead. Multilisp: A Language for Concurrent Symbolic Computation. ACM Trans. on Programming Languages and Systems (TOPLAS), 7(4):501-539, October 1985.
[16] High Performance Fortran Forum. High Performance Fortran Language Specification, Version 1.0. Technical Report CRPC-TR92225, Center for Research on Parallel Computation, Rice University, Houston TX, USA, May 3 1993.
[17] C. Hoare. Monitors: an Operating System Structuring Concept. Communications of the ACM, 17(10):549-557, October 1974.
[18] R. Hughes. Why Functional Programming Matters. The Computer Journal, 32(2):98-107, 1989.
[19] Threads standard POSIX 1003.1c-1995 (also ISO/IEC 9945-1:1996), 1996.
[20] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, May 1994.
[21] R. S. Nikhil. "Cid: A Parallel 'Shared-memory' C for Distributed Memory Machines."  In Proc. 7th. Ann. Wkshp. on Languages and Compilers for Parallel Computing (LCPC), Ithaca NY, USA (Springer Verlag LNCS 892), pages 376-390, August 8-10 1994.
[22] R. S. Nikhil and Arvind. Id: a Language with Implicit Parallelism. In A Comparative Study of Parallel Programming Languages: The Salishan Problems, John Feo (editor), pages 169-215. North Holland, 1992.
[23] R. S. Nikhil, K. Pingali, and Arvind. Id Nouveau. Technical Report CSG Memo 265, Computation Structures Group, MIT Lab. for Computer Science, Cambridge MA 02139, USA, July 1986.
[24] OpenMP Architecture Review Board. OpenMP Specifications, 1997 (Fortran), 1998 (C/C++)
[25] S. L. Peyton Jones. The Implementation of Functional Programming Languages. Prentice Hall, 1987.
[26] S. L. Peyton Jones (editor), J. Hughes (editor), L. Augustsson, D. Barton, B. Boutel, W. Burton, J. Fasel, K. Hammond, R. Hinze, P. Hudak, T. Johnsson, M. Jones, J. Launchbury, E. Meijer, J. Peterson, A. Reid, C. Runciman, and P. Wadler. Haskell 98: A Non-strict, Purely Functional Language, February 1999.
[27] M. C. Rinard and P. C. Diniz. Commutativity Analysis: A New Analysis Framework for Parallelizing Compilers. ACM Trans. on Programming Languages and Systems (TOPLAS), 19(6):942-991, November 1997.
[28] M. C. Rinard, D. J. Scales, and M. S. Lam. Jade: A High-Level, MachineIndependent Language for Parallel Programming. IEEE Computer, 26(6):28-38, June 1993.
[29] N. Wirth. Programming in Modula-2. Springer Verlag, Berlin, Germany, 1982.


1 | 2 | 3

Rate this article: Low High
Current rating
  • .
Embedded.com Career Center
Looking for a new job?
SEARCH JOBS

Browse all jobs

SPONSOR
RECENT JOB POSTINGS





 :