Alternatives to C/C++ for system programming in a distributed multicore world -

Alternatives to C/C++ for system programming in a distributed multicore world

The distributed, multicore train is stopping for no programmer, and especially the systems programmer will need to be ready to hop on to the distributed parallel programming paradigm to keep their systems running as efficiently as possible on the latest hardware environments.

There are three new systems programming languages that have appeared in the last few years which are attempting to provide a safe, productive, and efficient parallel programming capability. Go is a new language from Google [1], Rust is a new language from Mozilla [2], and ParaSail is a new language from AdaCore [3][4].

The designers of Go, Rust, and ParaSail all are facing a common challenge — how to help programmers address the new distributed and multicore architectures, without having the complexity of programming going past that which is manageable by the professional, yet still human, programmer. All programming languages evolve, and as a rule, they tend to get more complex, not less so.

If every time a new hardware architecture becomes important, the programming language is enhanced to provide better support for that architecture, the language can become totally unwieldy, even for the best programmers.

When the architectures changes radically, as with the new massively distributed and/or multicore/manycore architectures, this may mean that the language no longer hangs together at all, and instead has become a federation of sublanguages, much like a house that has been added onto repeatedly with a different style for each addition.

Because of the complexity curse associated with language evolution, when there is a significant shift in the hardware landscape, there is a strong temptation to start over in programming language design. After many years of a relatively stable programming language world, we now see a new burst of activity on the language design front, inspired in large part by the sense that our existing mainstream languages are either not going to be supportive enough, or that they are becoming simply too complex in trying to support both the old and new hardware paradigms through a series of language extensions.

Go from Google
Go, Rust, and ParaSail all emerged over the past few years, each with its own approach to managing complexity while supporting parallelism. Go from Google is the brain child of Rob Pike and his colleagues at Google. Rob was at Bell Labs in the early Unix and C days, and in many ways Go inherits the C tradition of simplicity and power. Unlike C, storage management has been taken over by the Go run-time through a general purpose garbage collection approach, but like C, care is still needed in other areas to ensure overall program safety.

From the multicore perspective, Go uses goroutines for structuring large computations into a set of smaller potentially parallel computations. Goroutines are easy to create – essentially any stand-alone call on a function or a method can be turned into a goroutine by simply prefixing it with the word “go.” Once a goroutine is spawned, it executes independently of the rest of the code. A goroutine is allowed to outlive the function that spawns it, thanks to the support of the garbage collector; local variables of the spawning function will live as long as necessary if they are visible to the spawned goroutine.

For goroutines to be useful, they need to communicate their re- sults back to the spawning routine. This is generally done using strongly-typed channels in Go. A channel can be passed as a parameter as part of spawning a goroutine, and then as the goroutine performs its computation it can send one or more results into the channel.

Meanwhile, at some appropriate point after spawning the goroutine, the spawner can attempt to receive one or more values from the channel. A channel can be unbuffered, providing a synchronous communication between sender and receiver, or it can provide a buffer of a specified size, effectively creating a message queue.
Communication between go routines can also go directly through shared global variables. However, some sort of synchronization through channels or explicit locks is required to ensure that the shared variables are updated and read in an appropriate sequence.

Here is an example Go program that counts the number of words in a string, presuming they are separated by one or more separator characters, dividing multi-character strings in half and passing them off to goroutines for recursive word counts:

Unique features of Go.
Go has some unusual features. Whether a declaration is exported is determined by whether or not its name begins with an upper- case letter (as defined by Unicode); if the declaration is a pack- age-level declaration or is the declaration of a field or a method, then if the name starts with an upper-case letter, the declaration is visible outside the current package.

Every Go source file begins with a specification of the package it is defining (possibly only in part). One source file may import declarations from another by specifying the path to the file that contains the declarations, but within the importing code the ex- ported declarations of the imported code are referenced using the imported file’s package name, which need not match that of the imported file’s filename. Of course projects would typically establish standard naming conventions which would align source file names and package names somehow.

Go provides a reflection capability, which is used, for example, to convert an object of an arbitrary type into a human-readable rep- resentation. The “%v” format in Go’s version of printf does this, allowing arbitrarily complex structs to be written out with some- thing as simple as:

   fmt.Printf(“%v”, my_complex_object)
Printf is implemented in Go itself, using the “reflect” package.

There are no uninitialized variables in Go. If a variable is not explicitly initialized when declared, it is initialized by default to the zero of its type, where each type has an appropriately-defined zero, typically either the zero numeric value or the nil (aka “null”) pointer value, or some composite object with all components having their appropriate zero value.

What Go leaves out. Because complexity was a major concern during all three of these language designs, some of the most important design decisions were about what to leave out of the language. Here we mention some of the features that Go does not have.

Go does not permit direct cyclic dependencies between packages. However, the Go interface capability permits the construction of recursive control or data structures that cross packages, because an interface declared in one package can be implemented by a type declared in another package without either package being directly dependent on the other.

Like C, Go has no generic template facility. There are some built- in type constructors, such as array, map, and chan, which are effectively parameterized type constructors, but there is no way for the user to create their own such type constructor. Unlike C, there is no macro facility which might be used to create something like a parameterized type. Nevertheless, Go’s flexible interface and reflection capabilities allow the creation of complex data structures that depend only on the presence of a user-provided method such as Hash and the DeepEqual function of the reflect package.

Go does not allow user-defined operators. Various operators are built in for the built-in types, such as int and float32. Interestingly enough, Go does include built-in complex types (complex64 and complex128) with appropriate operators.

Go does not have exceptions. However, functions and methods may return multiple results, and often errors are represented by having a second return value called error that is non-nil on error. Unlike in C, you cannot ignore such an extra parameter; unless you explicitly assign it to an object of name “_”.  When things go really wrong in Go, a run-time panic ensues, and presumably during development, you are tossed into a debugger.

Rust from Mozilla Research
The modern web browser represents one of the most complex and critical pieces of software of the internet era.  The browser is also a place where performance is critical, and there are many opportunities for using parallelism as a web page is “rendered.”  The Rust language arose originally as a personal project by one of the engineers at Mozilla Research (Graydon Hoare), and has grown now into a Mozilla-sponsored research effort. Rust has been designed to help address the complexity of building components of a modern browser-centric software infrastructure, in the context of the new distributed multicore hardware environment.

Like Go, Rust has chosen to simplify storage management by building garbage collection into the language. Unlike Go, Rust has chosen to restrict garbage collection to per-task heaps, and adopt a unique ownership policy for data that can be exchanged between tasks. 

What this means is that data that can be shared between tasks is visible to only one of them at a time, and only through a single pointer at a time (hence an owning pointer). This eliminates the possibility of data races between tasks, and eliminates the need for a garbage collector for this global data exchange heap. When an owning pointer is discarded, the storage designated by the pointer may be immediately reclaimed – so no garbage accumulates in the global exchange heap.

Here below is a Rust version of the Word Count program, recursing on multi-character strings with subtasks encapsulated as futures computing the subtotals of each string slice:

Rust does not have special syntax for spawning a “task” (Rust’s equivalent of a “goroutine”) nor declaring the equivalent of a “channel,” but instead relies on its generic template facility and a run-time library of threading and synchronization capabilities. In the above example, we illustrate the use of futures which are essentially a combination of a task and an unbuffered channel used to capture the result of a computation.

There are several other mechanisms for spawning and coordinating tasks, but they all depend on the basic tasking model, as mentioned above, where each task has its own garbage-collected heap for task-local computation (manipulated by what Rust calls managed pointers), plus access via owning pointers to data that can be shared between tasks (by sending an owning pointer in a message).

Rust’s memory management performance. One of the major advantages of the Rust approach to memory management is that garbage collection is local to a single task. By contrast, in Go each garbage collector thread has to operate on data that is potentially visible to all goroutines, requiring a garbage collection algorithm that synchronizes properly with all of the active goroutines, as well as with any other concurrent garbage collector threads (presuming garbage collection itself needs to take advantage of parallel processing to keep up with multi-threaded garbage generation).

In Rust, a conventional single-threaded garbage collector algorithm is adequate, because any given garbage collector is working on a single per-task heap.  Furthermore, storage visible via owning pointers needs no garbage collection at all, as releasing an owning pointer means that the associated storage can also be released immediately.

Costs of safety and performance. One of the downsides of added safety and performance can be added complexity. As we see, Rust has added safety by allowing access to sharable data only via pointers that give exclusive access to one task at a time, and added performance because garbage collection is single-threaded. However, as a result, Rust needs several kinds of pointers.

In fact, there are four kinds of pointers in Rust, managed pointers (identified by ‘@’as a prefix on a type) for per-task garbage-collected storage, owning pointers (identified by ‘~’) for data that is sharable between tasks, borrowed pointers (identified by ‘&’) that can temporarily refer to either per-task or sharable data, and raw pointers (identified by ‘*’), analogous to typical C pointers, with no guarantees of safety.ParaSail from AdaCore
The ParaSail language from AdaCore takessupport for parallelism one step further than Go or Rust, by treatingall expression evaluation in the language as implicitly parallel, whilealso embracing full type and data-race safety. Rather than addingcomplexity to accomplish this, the explicit goal for ParaSail was toachieve safety and pervasive parallelism by simplifying the language,eliminating impediments to parallelism by eliminating many of thefeatures that make safe, implicit parallelism harder.

Some of the features left out of ParaSail include the following:

  • No pointers
  • No global variables
  • No run-time exception handling
  • No explicit threads, no explicit locking nor signaling
  • No explicit heap, no garbage collection needed
  • No parameter aliasing

Sowhat is left? ParaSail provides a familiar class- and-interface-basedobject-oriented programming model, with mutable objects and assignmentstatements. But ParaSail also supports a highly functional style ofprogramming, aided by the lack of global variables, where the onlyvariable data visible to a function is via parameters explicitlyspecified as var parameters. This means that the side-effects of afunction are fully captured by its parameter profile, which togetherwith the lack of parameter aliasing allows the compiler to verify easilywhether two computations can safely be performed in parallel.

Bydesign, every expression in ParaSail can be evaluated in parallel. Iftwo parts of the same expression might conflict, the expression is not alegal ParaSail expression. The net effect is that the compiler canchoose where and when to insert parallelism strictly based on what makesthe most sense from a performance point of view. Here, for example, isthe Word Count example in ParaSail, where parallelism is implicit in therecursive calls on Word Count, without any explicit action by theprogrammer:

Althoughthere is no explicit use of a parallel construct, the sum of the tworecursive calls on Word_Count can be evaluated in parallel, with thecompiler automatically creating a picothread for each recursive call,waiting for their completion, and then summing the results, without theprogrammer having to add explicit directives.

Implicit andexplicit parallelism. Explicit parallelism may be specified if desiredin ParaSail, or the programmer can simply rely on the compiler to insertit where it makes the most sense. The general philosophy is that thesemantics are parallel by default, and the programmer needs to work abit harder if they want to force sequential semantics. For example,statements in ParaSail can be separated as usual with “;” (which isimplicit at the end of the line when appropriate), or by “||” if theprogrammer wants to request explicit parallelism, or by “then” if theprogrammer wants to disallow implicit parallelism. By default thecompiler will evaluate two statements in parallel if there are no datadependencies between them.

As another example of ParaSail’simplicit and explicit parallelism, the iterations of “for I in 1..10loop” are by default executed in any order, including parallel if thereare no data dependencies between the loop iterations, while “for I in1..10 forward loop” or “for I in 1..10 reverse loop” may be specified toprevent parallel evaluation, and “for I in 1..10 concurrent loop” maybe used to specify that parallel evaluation is desired, and it is anerror if there are any data dependences between the iterations. In allthese cases, the compiler will ensure that any parallel evaluation issafe and data-race free, and will complain if there are potential raceconditions when parallel evaluation semantics are specified.

Thevirtuous cycle: simplicity breeds simplicity:   There is somewhat of avirtuous cycle that occurs when a programming language is simplified, inthat one simplification can lead to another. By eliminating pointersand a global heap from ParaSail, the language can provide fullyautomatic storage management without the need for a garbage collector.

Objectsin ParSail have value semantics meaning that assignment copies thevalue of the right-hand side into the left-hand side, with no sharing ofdata. A built-in move operation is provided for moving the value of theright-hand side into the left-hand side, along with a swap for swappingvalues, thereby reducing the cost of copying while still preservingvalue semantics.

Every type in ParaSail has an extra value,called null, which is used to represent an empty or otherwiseuninitialized value. An object or component may have a null value of itstype T only if it is declared to be “optional T”. Optional componentsmay be used to implement trees and linked lists, without the use ofexplicit pointers, and without the potential sharing issues associatedwith pointers, even if behind the scenes the compiler uses pointers toimplement optional objects or components.

The availability ofoptional components effectively allows an object to grow and shrink,meaning that dynamic structures like hash tables, and higher levelnotions such as maps and sets, can be implemented in ParaSail withoutany explicit pointers, with the advantages of purely value semantics.

Thelack of explicit pointers means that all objects in ParaSaileffectively live on the stack, even though they may still grow andshrink. Each scope has its own region, effectively a local heap thatexpands and contracts to hold the objects associated with the scope. Nogarbage collector is needed because when an object goes out of scope, oran object or one of its component is set back to null, the associatedstorage may be immediately reclaimed, much like storage designated byowning pointers in Rust. By simplifying the type model, the storagemanagement in ParaSail is dramatically simpler and of higherperformance, with no complex parallel garbage collection algorithmrequired.

Implementation Status and Conclusions
Theprogramming language design world has been rejuvenated by the newchallenges of distributed and multicore architectures. Three newprogramming languages designed for building industrial-strength systemshave emerged, Go, Rust, and  ParaSail. Each of these languages tries tomake parallel programming simpler and safer, while still providing thelevel of power and performance needed for the most critical systemsdevelopment tasks.

From an implementation status point of view,Go is the most stable of these three new languages, with two compilersavailable, and a number of systems built with Go now being deployed.Rust and ParaSail are still under development, but both are availablefor trial use, with Rust having early on achieved the compiler bootstrap milestone, where the Rust compiler is itself implemented in Rust.

Allthree languages provide opportunities for the professional programmerto expand their horizons, and see what sort of new paradigms and idiomswill become more important as we leave behind the era of simplesequential processing, and enter the era of massively parallel, widelydistributed computing.

S. Tucker Taft is VP andDirector of Language Research at AdaCore (  He joinedAdaCore in 2011 as part of a merger with SofCheck, which he had foundedin 2002 to develop advanced static analysis technology. Prior to that hewas a Chief Scientist at Intermetrics, Inc. and its follow-ons for 22years, where in 1990-1995 he led the design of Ada 95. He is recipientof an  A.B. Summa Cum Laude degree from Harvard University, where he hasmore recently taught compiler construction and programming languagedesign.

This article was presented as a part of a class(ESC-218) conducted by S. Tucker Taft at the Embedded System Conferenceon: “Systems Programming in the Distributed, Multicore World.”

1. The Go Programming Language , Google Corporation
2. The Rust Programming Language , Mozilla Research
3.  ParaSail: Less is More with Multicore , S. Tucker Taft,  (retrieved 4/1/2013)
4. Designing ParaSail: A New Programming Language , S. Tucker Taft

5 thoughts on “Alternatives to C/C++ for system programming in a distributed multicore world

  1. I would also suggest that people look at Erlang.

    Erlang was designed from the outset to support concurrency to the point that processes can be treated as 1st class objects. It has a long embedded pedigree being originally designed to program telecom swit

    Log in to Reply

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.