Parallel programming has the reputation of being an exotic field,
pursued by experts using extremely large and expensive machines.
Unfortunately, due in part to its history, parallel programming
languages and tools still mostly focus on "big iron" and older
languages such as C and Fortran. Performance improvement via
parallelism should be of interest to anyone whose code runs too slowly.
Reaching those users requires a shift in focus.
Today, many new computers are multicore and most users have access
to multiple computers. Many developers work with newer dynamic
languages like Python and R.
To meet the needs of these users, we've developed a Python-based
coordination system called "NetWorkSpaces" (NWS) that is easy to learn,
accessible via almost all development environments (including R, Java,
Octave, Python, Perl, and Ruby), and deployable on ad hoc collections
of spare CPUs. But while its simplicity makes it a good choice for
pedagogical examples, it's not a toy system. We've used NWS to run
parallel programs on hundreds of processors, producing many CPU years
of useful computation.
NetWorkSpaces
NetWorkSpaces (www.lindaspaces.com/products/NWS_overview.html) was
developed at Scientific Computing Associates and is available at
SourceForge (nws-py.sourceforge.net/).
You must install both the server (on one machine) and a client (on
all machines involved in the computation). The server is implemented
using Python and Twisted, which are required. Even though NWS is
implemented in Python, we have NetWorkSpace client APIs for a variety
of languages. While we describe the Python client here, the ideas
transfer to other language clients.
NWS is based on the concept of a set of bindings, which in
programming languages are sometimes known as "environments,"
"namespaces," "workspaces," and the like. Generally in programming
languages, a binding maps a name to a value. Because this is a concept
familiar to programmers, it is a good foundation for building a
coordination system.
A given language has rules about allowable names, allowable values
for a given name, and the context in which the binding is valid (the
binding's scope). The language also provides operations for
establishing a binding and for retrieving the value of a bound name.
Often these operations are implied by the lexical structure of code,
as is the intended binding set. So, for example, in x = y, the y
implies a look-up of the value bound to y, while the x is the target
of the assignment. Scoping rules determine which x and y are meant.
NWS provides a particular encapsulation of binding semantics. Using
this encapsulation, we explicitly specify the look-up (fetch),
the association of the name x with the retrieved value (store),
and the intended binding set (indicated by the ws object).
Thus, a simple assignment looks like this in NWS:
ws.store('x', ws.fetch('y'))
So far, we've succeeded in making a fairly routine construct more
verbose. The key point is that the NWS encapsulation is amenable to a
network-based implementation, which lets different processes exchange
data and synchronize via NWS bindings.
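To make the encapsulation concrete, here is a minimal in-process sketch.
The LocalWorkSpace class is hypothetical and is not part of NWS: it only
mimics the shape of the store and fetch calls with a dictionary, and it
makes no attempt to reproduce the networking, synchronization, or exact
semantics of a real NetWorkSpace object.

```python
class LocalWorkSpace:
    """Hypothetical stand-in for a workspace: one named set of bindings."""

    def __init__(self, name):
        self.name = name
        self._bindings = {}  # maps variable name (a string) to a value

    def store(self, var, value):
        # Explicitly associate the name with the value in this binding set.
        self._bindings[var] = value

    def fetch(self, var):
        # Explicitly look up the value bound to the name.
        return self._bindings[var]


ws = LocalWorkSpace('test')
ws.store('y', 42)

# The NWS-style spelling of "x = y": the binding set (ws), the look-up
# (fetch), and the association (store) are all written out.
ws.store('x', ws.fetch('y'))
result = ws.fetch('x')
```

Because every operation goes through a method call on ws, the dictionary
in this sketch could in principle be replaced by requests to a server
without changing the calling code.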
In many languages, including Python, we could have used syntax similar
to that of normal bindings:
ws.x = ws.y
However, the semantics of these NWS variables differ in important ways
from those of normal variables. In our opinion, it's a mistake to create
an illusion of similarity when, in fact, there are important
differences.
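For readers curious how such attribute syntax could be layered on top of
explicit store and fetch calls, the following is a rough sketch using
Python's attribute hooks. AttrWorkSpace is a hypothetical class built on
a plain dictionary; it is not something the NWS client provides.

```python
class AttrWorkSpace:
    """Hypothetical wrapper that exposes store/fetch as attribute access."""

    def __init__(self):
        # Bypass our own __setattr__ when creating the backing dictionary.
        object.__setattr__(self, '_bindings', {})

    def store(self, var, value):
        self._bindings[var] = value

    def fetch(self, var):
        return self._bindings[var]

    def __setattr__(self, var, value):
        # "ws.x = value" is rerouted to an explicit store.
        self.store(var, value)

    def __getattr__(self, var):
        # Called only when normal attribute look-up fails, so "ws.y"
        # is rerouted to an explicit fetch.
        try:
            return self.fetch(var)
        except KeyError:
            raise AttributeError(var)


ws = AttrWorkSpace()
ws.y = 'some value'
ws.x = ws.y  # reads like a normal assignment, but calls fetch then store
```

The last line looks like an ordinary assignment even though two method
calls happen underneath, which is exactly the kind of hidden difference
the explicit form avoids.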
NWS is designed to be a coordination facility that is language neutral.
The advantages this neutrality offers include:
- NWS coordination patterns and idioms can be recycled from one
language environment to the next.
- NWS can be used to coordinate heterogeneous ensembles of code
written in different languages.
To facilitate interlanguage coordination, NWS variable names are ASCII
strings and don't need to conform to the variable naming rules of any
given language. The values can be any native type in the client
language for which that language has a workable serialization.
Most values in most of the languages mentioned can be automatically
serialized (the serialization is done behind the scenes by NWS, and is
not of direct concern to programmers). For example, Python NWS can
automatically handle composite data structures:
>>> from nws.client import NetWorkSpace
>>> ws = NetWorkSpace('test')
>>> l = ['a', 'b', 'c']
>>> t = (1, 2, 3)
>>> d = {'list': l, 'tuple': t}
>>> ws.store('dict example', d)
>>> ws.fetch('dict example')
{'list': ['a', 'b', 'c'], 'tuple': (1, 2, 3)}
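The text does not say which serializer the Python client relies on. As a
rough illustration of what "behind the scenes" serialization involves,
the sketch below assumes something like the standard pickle module: the
value is turned into bytes that could travel over a network and is then
rebuilt as an equal but separate object.

```python
import pickle

l = ['a', 'b', 'c']
t = (1, 2, 3)
d = {'list': l, 'tuple': t}

# Serialize the composite structure to bytes (assumed transport format).
payload = pickle.dumps(d)

# Rebuild it, as a receiving process would after reading the bytes.
restored = pickle.loads(payload)

# The copy has the same contents and types, but is a distinct object.
same_contents = (restored == d)
same_object = (restored is d)
```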
Finally, ASCII strings used as values are treated in a special way
(they are not subject to the client language serialization protocol)
that makes it possible for them to be exchanged across client
languages. In Example 1 below, for instance, you can use NWS to move
data from Python to R encoded as an ASCII string.
Python
>>> from nws.client import NetWorkSpace
>>> ws=NetWorkSpace('tickets')
>>> ws.store('ticket', 'ticket string')
R
> library(nws)
> ws<-netWorkSpace('tickets')
> nwsFetch(ws, 'ticket')
[1] "ticket string"
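Since only ASCII strings cross the language boundary this way, structured
data has to be encoded into a string by the sender and decoded by the
receiver. The sketch below shows one possible convention, a
comma-separated list of numbers. The helper names encode_numbers and
decode_numbers are hypothetical, and the format is an assumption made
for illustration, not something NWS prescribes.

```python
def encode_numbers(values):
    """Encode a sequence of numbers as a comma-separated ASCII string."""
    return ','.join(repr(float(v)) for v in values)


def decode_numbers(text):
    """Decode a comma-separated ASCII string back into a list of floats."""
    return [float(field) for field in text.split(',')]


ticket = encode_numbers([1.5, 2, 3.25])

# Confirm the string is pure ASCII, then round-trip it.
ascii_bytes = ticket.encode('ascii')
decoded = decode_numbers(ticket)
```

A string built this way could be passed to ws.store as in Example 1. On
the R side, the fetched string could then be split with strsplit and
converted with as.numeric.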
Robert and Nicholas are members of
the Department of Computer Science and W.M. Keck Foundation
Biotechnology Resource Laboratory at Yale University. Nicholas and
Stephen are researchers at Scientific Computing Associates.