Reliable software has long been the dream of many researchers, practitioners, and users. In the last decade or so, several research and engineering breakthroughs have greatly improved the reliability of sequential programs (or the sequential aspect of parallel programs). Successful examples include Coverity’s source code analyzer, Microsoft’s Static Driver Verifier, the Valgrind memory checker, and certified operating systems and compilers.
However, the same level of success has not yet propagated to parallel programs. These programs are notoriously difficult to write, test, analyze, debug, and verify, much harder than their sequential counterparts. Experts consider reliable parallelism “something of a black art” and one of the grand challenges in computing.
Unsurprisingly, widespread parallel programs are plagued with insidious concurrency bugs, such as data races (concurrent accesses to the same memory location with at least one write) and deadlocks (threads circularly waiting for resources). The worst of these bugs have killed people in the Therac-25 incidents and caused the 2003 Northeast blackout. These bugs may also be exploited by attackers to violate the confidentiality, integrity, and availability of critical systems.
In recent years, two technology trends have made the challenge of reliable parallelism more urgent. The first is the rise of multicore hardware. The speed of a single processor core is limited by fundamental physical constraints, forcing processors into multicore designs. Thus, developers must resort to parallel code for best performance on multicore processors.
The second is our accelerating computational demand. Many physical-world computations, such as search and social networking, are now computerized and hosted in the cloud. These computations are massive, run on the “big data,” and serve millions of users and devices. To cope with these massive computations, almost all services employ parallel programs to boost performance.
If reliable software is an overarching challenge of computer science, reliable parallelism is surely the keystone. This keystone challenge has drawn decades of research efforts, producing numerous ideas and systems, ranging from new hardware, new programming languages, new programming models, to tools that detect, debug, avoid, and fix concurrency bugs.
As usual, new hardware, languages, or models take years, if not forever, to gain adoption. Tools are helpful, but they tend to attack derived problems, not the root cause. Over the past four years, we have been attacking fundamental, open problems in making shared-memory multithreaded programs reliable.
In our work, we target these programs because they are the most widespread type of parallel programs. Many legacy programs, such as the Apache web server and the MySQL database, are multithreaded. New multithreaded programs are being written every day. This prevalence of multithreading is unsurprising given its wide support from many platforms (e.g., Linux, Windows, and MacOS) and languages (e.g., Java and C++11).
For the same reasons, we believe multithreading will remain prevalent in the foreseeable future.
Unlike sequential programs, multithreaded programs are nondeterministic: repeated executions of the same multithreaded program on the same input may yield different (e.g., correct vs. buggy) behaviors, depending on how the threads interleave.
Conventional wisdom has long blamed this nondeterminism for the challenges in reliable multithreading. For instance, testing becomes less effective: a program may run correctly on an input in the testing lab because the interleavings tested happen to be correct, but executions on the same exact input may still fail in the field when the program hits a buggy, untested interleaving. To eliminate nondeterminism, several groups of brilliant researchers have dedicated significant effort to building deterministic multithreading systems.
Unfortunately, as explained in this paper, nondeterminism is responsible for only a small piece of the puzzle. Its cure, determinism, is neither sufficient nor necessary for reliability. Worse, determinism sometimes harms reliability rather than improves it. We believe the challenges in reliable multithreading have a rather quantitative root cause: multithreaded programs have too many possible thread interleavings, or schedules.
For complete reliability, all schedules must be correct. Unfortunately, ensuring this requires a great deal of resources and effort simply because the set of schedules is enormous. For instance, testing all possible schedules demands astronomical CPU time. (See §2 for detailed discussion.)
In our work we have attacked this root cause by asking: are all the exponentially many schedules necessary? Our study reveals that many real-world programs can use a small set of schedules to efficiently process a wide range of inputs. Leveraging this insight, we envision a new approach we call stable multithreading (SMT) that exponentially reduces the set of schedules for reliability.
We have built three systems: TERN and PEREGRINE, which are compiler and runtime systems that address key challenges in implementing SMT systems, and a third, a program analysis framework that runs atop SMT to demonstrate key benefits of SMT.
These systems are designed to be compatible with existing hardware, operating systems, thread libraries, and programming languages to simplify adoption. They are also mostly transparent to developers to save manual labor. We leave it for future work to create new SMT programming models and methods that help active development.
Our initial results with these systems are promising. Evaluation on a diverse set of widespread multithreaded programs, including Apache and MySQL, shows that TERN and PEREGRINE dramatically reduce schedules. For instance, they reduce the number of schedules needed by the parallel compression utility PBZip2 down to two per thread count, regardless of the file contents. Their overhead is moderate: less than 15% for most programs.
Our program analysis framework enables the construction of many program analyses with precision and coverage unmatched by their counterparts. For instance, a race detector we built found previously unknown bugs in extensively checked code with almost no false bug reports.
The complete paper is available from the open, online author archives at Columbia University.