Achieving memory safety without compromise

March 13, 2018

By acfoltzer

As embedded software developers, the tools we rely on must provide us with low-level control of the functionality and performance of the systems we build. There must be an ability to manipulate hardware registers to write device drivers, and we must be certain that no runtime system will interrupt our tasks and lead to missed deadlines. In this article, we will explore new advances in programming languages that offer this control without the tradeoffs in safety that come with conventional tools.

C and C++ remain the primary choices of programming language in the embedded world because they provide this level of control. The existing software that we extend and integrate into new embedded systems is largely written in C or C++, and so we often end up choosing these languages even for components that do not strictly need the level of control they offer.

This choice, however, comes at a price. Buffer overflows are consistently the most common root cause of vulnerabilities in The MITRE Corporation’s Common Vulnerabilities and Exposures (CVE) database. Worse than merely throwing an index-out-of-bounds exception, these overflows (and related memory safety violations such as use-after-free or use of uninitialized memory) allow attackers to hijack the control flow of programs, and serve as beachheads for subsequent attacks.

Static analysis tools, including linters and the compiler’s own warnings, will catch many simple cases of memory safety errors, but the inherent flexibility of C and C++ forces such tools in one of two directions. The tool might be conservative, making sure that any potential defect is flagged, but this leaves the developer to wade through and manually dismiss huge numbers of false positives. Or the tool might apply heuristics to reduce false positives, but will miss actual defects as a result.

Memory-safe languages take this entire class of vulnerabilities off the table, but have traditionally asked a price that is too high for embedded development. While Java, C#, Python, and JavaScript don’t let you write to memory past the end of an array, they also don’t let you write to the bare memory address mapped to your peripheral’s status register without a foreign-function interface in the way. And even for components that do not require raw memory accesses, the overhead and unpredictability of the language runtime are incompatible with highly memory-constrained systems or tasks with real-time deadlines.

Advances in Programming Languages Alter Tradeoffs

Fortunately, recent advances in programming languages are changing the shape of these tradeoffs, making it possible to have both memory safety and low-level control. Ivory and Rust are two programming languages that put these advances into practice. Ivory is a domain-specific language for safe real-time embedded systems, developed by Galois for the DARPA HACMS program. Rust is a general-purpose language sponsored by Mozilla, the creators of Firefox, in order to bring safety to systems programming across application areas. Rust is particularly appealing to C and C++ developers due to a familiar curly-brace syntax, while Ivory is a bit more exotic, generating safe C code from a language embedded in Haskell.
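To make "memory safety and low-level control" a little more concrete, here is a minimal Rust sketch of register access. It is an illustration under stated assumptions, not code from any Galois project: the `Register` type and its methods are hypothetical names, and an ordinary variable stands in for a real memory-mapped address so the example can run on a desktop. Only `read_volatile` and `write_volatile` come from the Rust standard library.

```rust
use std::ptr::{read_volatile, write_volatile}; // `core::ptr` offers the same calls without `std`

/// Hypothetical handle to a single 32-bit memory-mapped register.
pub struct Register {
    addr: *mut u32,
}

impl Register {
    /// Wraps a raw address.
    ///
    /// # Safety
    /// `addr` must be aligned and valid for reads and writes of a `u32`
    /// for as long as the returned handle is in use.
    pub unsafe fn new(addr: *mut u32) -> Self {
        Register { addr }
    }

    /// Volatile read: the compiler may not cache or elide the access.
    pub fn read(&self) -> u32 {
        // SAFETY: validity of `addr` is guaranteed by the contract of `new`.
        unsafe { read_volatile(self.addr) }
    }

    /// Volatile write of the whole register.
    pub fn write(&mut self, value: u32) {
        // SAFETY: validity of `addr` is guaranteed by the contract of `new`.
        unsafe { write_volatile(self.addr, value) }
    }

    /// Read-modify-write that sets the bits in `mask`.
    pub fn set_bits(&mut self, mask: u32) {
        let current = self.read();
        self.write(current | mask);
    }
}
```

The one unavoidable act of trust, asserting that the address is valid, is confined to the `unsafe` constructor. Everything built on top of the handle is ordinary checked code.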

While they differ in specifics, these languages both use static analysis to guarantee that programs are memory-safe. Unlike C and C++, where the complexity of the language limits the usefulness of static analysis, Ivory and Rust make different tradeoffs that ensure the analysis is both computationally tractable and useful to the developer. Ivory, aimed specifically at real-time systems, only allows allocation on the stack, and furthermore ensures that loops and array indices are bounded statically. While these restrictions are acceptable and even helpful for real-time systems, Rust is less restrictive, and therefore applicable to more types of applications.
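As a rough illustration of what the guarantee looks like from the developer's side, the Rust sketch below handles an index and a copy length that might come from untrusted input. The function names and the 8-byte buffer are hypothetical. Note that Rust checks slice indices at run time when it cannot rule out an overflow at compile time, which is a different mechanism from Ivory's static bounds; the shared outcome is that an out-of-range access never touches adjacent memory.

```rust
/// Reads one byte at a possibly attacker-controlled index.
/// `get` returns `None` instead of reading past the end of the slice.
fn read_field(packet: &[u8], index: usize) -> Option<u8> {
    packet.get(index).copied()
}

/// Copies `src` into a fixed-size buffer, rejecting oversized input
/// rather than overflowing. Returns the number of bytes copied.
fn copy_into(dst: &mut [u8; 8], src: &[u8]) -> Result<usize, &'static str> {
    if src.len() > dst.len() {
        return Err("input longer than buffer");
    }
    // The range is known to fit, so this slice operation cannot panic.
    dst[..src.len()].copy_from_slice(src);
    Ok(src.len())
}
```

Plain indexing such as `packet[index]` is also checked, but it aborts the operation with a panic on failure, so returning `Option` or `Result` is usually the better fit when the caller needs to recover.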

Both Ivory and Rust are useful and practical languages that can be deployed today. During the HACMS program, Galois used Ivory to develop an entire flight control computer for a small unmanned aerial vehicle, from the sensor fusion and communications security all the way down to the device drivers and board support package. Our partners at Boeing, trained in C++ but new to Haskell and domain-specific languages, used Ivory to rewrite a significant portion of the Unmanned Little Bird helicopter’s control software. We are using Rust today at Galois to build cooperative control algorithms for autonomous systems targeting embedded Linux and desktop Windows systems. As with Boeing’s Ivory work, we are converting an existing, sophisticated C++ control architecture into Rust, one module at a time. The lack of runtime systems and the native C ABI compatibility of both Ivory and Rust make this style of incremental program hardening very practical, and allow us to prioritize resources on the parts of programs that are most vulnerable to attack: external data parsers and network interfaces.
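The module-at-a-time approach might look something like the following sketch, in which a small parser is written in safe Rust and exposed through a C-compatible entry point. The message format (one length byte followed by the payload) and both function names are invented for illustration and are not taken from the systems described above.

```rust
use std::slice;

/// Hypothetical C-callable entry point. Returns the payload length of a
/// length-prefixed message, or -1 if the message is malformed.
/// A real export would also carry the `no_mangle` attribute so that the
/// linker symbol keeps this name.
///
/// # Safety
/// `data` must be null or point to `len` readable bytes.
pub unsafe extern "C" fn msg_payload_len(data: *const u8, len: usize) -> i32 {
    if data.is_null() {
        return -1;
    }
    // SAFETY: the caller promises that `data` points to `len` readable bytes.
    let bytes = unsafe { slice::from_raw_parts(data, len) };
    parse_payload_len(bytes).map_or(-1, i32::from)
}

/// Safe Rust core: the first byte declares the payload length, and the
/// remaining bytes must match it exactly.
fn parse_payload_len(bytes: &[u8]) -> Option<u8> {
    let (&declared, payload) = bytes.split_first()?;
    if payload.len() == usize::from(declared) {
        Some(declared)
    } else {
        None
    }
}
```

The existing C or C++ code would call the entry point through an ordinary prototype, while all of the parsing logic sits behind the boundary where the compiler can check it.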

Unlike Ivory, Rust programs can freely allocate to both the stack and the heap, but the developer does not manually manage that memory, and does not link in a runtime system or garbage collector. Instead, objects are simply created, much as they would be in a managed language, and the compiler automatically inserts the proper allocation and destruction routines. The compiler achieves this by statically analyzing memory uses for ownership, borrowing, and lifetimes. A full explanation of these concepts is beyond the scope of this article, but you can get an intuition by analogy to C++11’s `unique_ptr` and `shared_ptr`. Ownership corresponds to `unique_ptr`: a Rust object can only be owned through one variable binding at a time. However, multiple immutable references to a Rust object can exist at once, loosely similar to several `shared_ptr`s pointing at the same object. The notion of lifetime then corresponds to the way the C++ object is only destroyed once all its smart pointers go out of scope. By enforcing these memory use properties pervasively and at compile time, the compiler lets us focus our attention on the domain-specific business of our application, rather than on checking for null pointers and memory leaks.
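A small sketch may help with the intuition. The `SampleLog` type and the two functions are hypothetical, and the `drops` counter exists only so the moment of destruction can be observed; it is not something real code would need.

```rust
use std::cell::Cell;

/// Hypothetical sensor log that owns a heap-allocated buffer of samples.
struct SampleLog<'a> {
    samples: Vec<i32>,    // heap allocation, released automatically on drop
    drops: &'a Cell<u32>, // borrowed counter, used only to observe destruction
}

impl<'a> Drop for SampleLog<'a> {
    /// Runs at the point where the compiler inserts the destruction code.
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Borrows the log immutably: the caller keeps ownership, and any number
/// of such borrows may exist at the same time.
fn total(log: &SampleLog<'_>) -> i32 {
    log.samples.iter().sum()
}

/// Takes ownership: the log and its buffer are destroyed when this
/// function returns, and the caller can no longer use the value.
fn finish(log: SampleLog<'_>) -> usize {
    log.samples.len()
}
```

Calling `total` through two references leaves the counter at zero, while calling `finish` raises it to one. Any later attempt to use the log, or a reference to it, is rejected at compile time instead of becoming a use-after-free.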

Rust is a relatively new language, hitting stability at version 1.0 in May of 2015. Support for embedded Linux is robust across ARM, x86, MIPS, PowerPC, and more. The compiler and library ecosystem is still maturing for bare metal and RTOS targets, although a robust community exists for both, including Tock, a safe, concurrent RTOS written in Rust for very low-memory and low-power applications.

While no language will displace C and C++ overnight, it is clear that years of research in programming languages are now yielding very practical results for embedded developers. When you next find yourself writing safety-critical code, consider whether you can put these results to work for you, in order to concentrate on the hard problems of your domain without compromising performance.

Adam Foltzer is Senior Research Engineer at Galois, a cybersecurity R&D organization that develops innovative technologies for the U.S. Dept. of Defense (DoD) and commercial enterprises.

