How to make C++ more real-time friendly

Scot Salmon, National Instruments

April 06, 2014

Giving real-time C++ a Boost
I’ve already alluded more than once to Boost, a popular source of free, peer-reviewed C++ libraries. Boost libraries are generally licensed for use in both open- and closed-source projects and are designed to work well with the STL.

For many purposes, Boost classes are de facto standards – in fact, several Boost libraries have already been adopted into the C++ standard, and many more are being considered for future adoption. For many C++ developers, Boost is as much a part of the C++ environment as core language features like the STL itself, so it’s important for real-time developers to also consider which Boost classes might be appropriate for real-time applications.

Like the STL, Boost is a wide-ranging collection of tools, and reviewing them all is far beyond the scope of this paper.  There are libraries that work at the preprocessor level, providing useful functionality ranging from static assertions to entire meta-programming languages; there are libraries that provide OS abstractions such as multithreading, date/time, and file system support (which may require some porting effort to adapt to real-time systems); there are heavyweight “toolbox” libraries that implement parsers and the like; and there are STL-like data structures and algorithms.  The main area of interest for this paper is the last category.

The shared_ptr class, in Boost’s most commonly used library, Smart Pointers, offers a simple way to safely manage the lifetime of dynamically allocated objects.  Unlike C++’s native auto_ptr type, a shared_ptr can be stored in a container.  This functionality is so useful that Boost recommends using the syntax “shared_ptr<T> p(new T);” whenever using new (instead of “T* p = new T;”).

There are notable concerns with using shared_ptr in real-time systems, however: it has those dreaded hidden memory allocations, and it may (and on RTOSes, does) use a mutex to ensure atomicity of its reference count.

Fortunately, you can use simple techniques to prevent these problems.  These techniques will become familiar to you as you get more comfortable using C++ in real-time systems, because they also work in various other situations when using the STL and other language features. 

The first technique is the use of swap.  Boost and STL containers provide a swap method that exchanges the contents of two containers; it is always no-throw and typically (though not always) very efficient.  You can use shared_ptr::swap in your real-time code and implement a non-real-time “garbage can” (or recycling bin, if you prefer): a separate shared_ptr object that takes the swapped content and deletes it at the next non-time-critical opportunity.
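As a minimal sketch of this technique (using std::shared_ptr, which behaves like boost::shared_ptr here; the GarbageCan class name is invented for illustration):

```cpp
#include <cassert>
#include <memory>

// Hypothetical "garbage can": a non-real-time owner that accepts swapped-out
// objects and releases them later, outside the time-critical path.
template <typename T>
class GarbageCan {
public:
    // Called from real-time code.  Precondition: the can was emptied since
    // the last discard, so this no-throw swap frees nothing here.
    void discard(std::shared_ptr<T>& victim) {
        m_trash.swap(victim);   // victim is now empty; contents held by m_trash
    }
    // Called from a low-priority task: the actual delete happens here.
    void empty() {
        m_trash.reset();
    }
private:
    std::shared_ptr<T> m_trash;
};
```

The real-time path pays only for the pointer swap; the deallocation cost is deferred to whenever empty() runs.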

The second technique is using references, including references into containers, to avoid unnecessary copies.  You’ve probably already become familiar with passing by reference even in C, where it’s always done using a pointer, but C++’s reference syntax (i.e., T&) offers additional power and convenience; in this case, by allowing you to safely work on elements in a container directly, without incurring a shared_ptr copy (and thus avoiding the mutex).  Similarly, it is often appropriate to pass a shared_ptr by const&, for the same reason.
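To illustrate (the Sample type and function names are invented), note how both functions below touch shared_ptr-managed data without ever copying a shared_ptr, and so without ever touching its reference count:

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Sample { double value; };

// Pass by const reference: no shared_ptr copy, so no reference-count
// update (and, on an RTOS, no mutex) on every call.
double readSample(const std::shared_ptr<Sample>& s) {
    return s->value;
}

// Work on container elements through references, again avoiding copies.
double sumSamples(const std::vector<std::shared_ptr<Sample> >& samples) {
    double total = 0.0;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        const std::shared_ptr<Sample>& s = samples[i];  // a reference, not a copy
        total += readSample(s);
    }
    return total;
}
```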

Another option from the Boost shared pointer library, intrusive_ptr, can help address the shared_ptr concerns, at the cost of some developer convenience.  In this case, the developer using the class specifies the reference counting mechanism, which allows you to implement a strategy appropriate for real-time systems.
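A sketch of what that looks like (RtCounted is an invented name, and the lock-free std::atomic count shown here is one possible real-time-friendly strategy, not the only one): boost::intrusive_ptr calls free functions named intrusive_ptr_add_ref and intrusive_ptr_release, which the developer supplies.

```cpp
#include <atomic>
#include <cassert>

// The count is embedded in the object itself: no hidden mutex, and no
// separate control-block allocation as with shared_ptr.
class RtCounted {
public:
    RtCounted() : m_refs(0) {}
    virtual ~RtCounted() {}

    int refCount() const { return m_refs.load(std::memory_order_relaxed); }

    // The hooks boost::intrusive_ptr expects, found via argument lookup.
    friend void intrusive_ptr_add_ref(RtCounted* p) {
        p->m_refs.fetch_add(1, std::memory_order_relaxed);
    }
    friend void intrusive_ptr_release(RtCounted* p) {
        if (p->m_refs.fetch_sub(1, std::memory_order_acq_rel) == 1)
            delete p;   // in a hard real-time path, this could instead hand
                        // the object to a non-real-time "garbage can" task
    }
private:
    std::atomic<int> m_refs;
};
```

With these hooks in place, a boost::intrusive_ptr<RtCounted> manages the object using your counting strategy rather than shared_ptr’s.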

Other STL-like classes offered by Boost provide containers that can be very useful in real-time applications:

  • Boost.Array: a very simple C array wrapper, providing the STL-like methods and memory management of a vector without the overhead of supporting dynamic resizing.
  • Boost Pool: includes an STL Standard Allocator (more on these shortly) that uses a pooled allocation mechanism, very useful for optimizing the case of many small allocations.  This type of custom allocator could solve a variety of ills related to dynamic resizing STL containers, but beware: the pooled allocator itself will do hidden memory allocations to resize the pool if needed, and preventing this requires overriding the lower-level allocator used by the pool to cause dynamic allocations to fail. Since the pool is global, causing dynamic allocation to fail would limit it to being used by one container (the first one that uses it), rendering it far less useful.
  • Boost.Pointer Container: provides an STL-like container to manage heap-allocated (i.e., pointer) objects. Avoids the overhead of object copies or shared_ptr while still managing the lifetime of the contained objects in an exception-safe way.

STL allocators for memory management
Earlier in the paper, I alluded to the possibility of using the STL’s custom allocator feature to address the memory management concerns associated with some of the containers.  

Developers who have done some basic digging into the details of the STL are no doubt aware of an optional template parameter to most STL types, which specifies a class to use for memory allocation when constructing objects of the templatized type.  For example, if std::vector<int> is a vector of integers with the default allocator, std::vector<int, MyAllocator> is a vector of integers which uses MyAllocator to manage its memory.

By description, this feature sounds extremely appealing for real-time applications. For example, it could be used to cause dynamic resizing of a map to fail long before the system runs out of memory, or store a vector in shared memory, or retrieve the memory needed for list elements from a pre-allocated pool.  And indeed, a custom allocator can do this (in fact, a pooled allocator is one of the most commonly attempted uses of the feature; recall the Boost Pool Library described above).
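To make the first of those uses concrete, here is a deliberately simplified sketch (CappedAllocator is an invented name; it follows the C++11 minimal-allocator shape, with std::allocator_traits filling in the rest) of an allocator that makes container growth fail once a budget is exhausted:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Refuses to hand out more than a fixed budget of elements, so a container's
// dynamic resizing fails (with bad_alloc) long before the system is out of
// memory.  The caller owns the budget counter.
template <typename T>
struct CappedAllocator {
    typedef T value_type;
    std::size_t* budget;   // remaining budget, in elements

    explicit CappedAllocator(std::size_t* b) : budget(b) {}
    template <typename U>
    CappedAllocator(const CappedAllocator<U>& o) : budget(o.budget) {}

    T* allocate(std::size_t n) {
        if (n > *budget) throw std::bad_alloc();   // enforce the cap
        *budget -= n;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) {
        *budget += n;
        ::operator delete(p);
    }
};

template <typename T, typename U>
bool operator==(const CappedAllocator<T>& a, const CappedAllocator<U>& b) { return a.budget == b.budget; }
template <typename T, typename U>
bool operator!=(const CappedAllocator<T>& a, const CappedAllocator<U>& b) { return !(a == b); }
```

A std::vector<int, CappedAllocator<int> > built with this allocator behaves normally until a resize would exceed the budget, at which point the allocation throws and the container is left unchanged.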

However, making a general-purpose allocator that handles all the behaviors for allocating and freeing memory needed by STL containers requires care, skill, and extreme patience.  The details are beyond the scope of this paper, but you can get the general idea – at least, the general idea of the complexity – from web pages such as those at CodeGuru.

Real-time meets exceptions
Although this paper uses the similarities between kernel mode development and real-time development to highlight various points about the use of C++ in real-time systems, one area where real-time systems veer off the path of desktop kernels entirely is exception support.  

C++ is generally disallowed in the Linux kernel, and even in Windows, where C++ is mostly well-supported, only a non-C++ variant of exceptions (structured exception handling) is allowed in kernel mode. Real-time operating systems, on the other hand, do generally support C++ exceptions. On a desktop OS, most development is done in user mode, where exceptions are well-supported, so the vendors have little motivation to support exceptions in the kernel, especially considering the additional risks to the stability of their OS.

However, on an RTOS, all developers are in the same “kernel” boat, so the vendors must support exceptions, as there is no separate mode that supports C++ development for those who demand it.

The risks that the desktop kernel mode decision-makers have opted to avoid are legitimate and do apply to real-time systems.  Exceptions can put large objects on the stack, a critical and often tightly limited resource. Misuse of exceptions can cause performance problems and provide a perhaps overly convenient way to crash the system if not caught.  And, to echo a refrain seen elsewhere in this paper, they can cause code bloat.

On the other hand, exceptions provide a powerful mechanism to ensure that errors are handled consistently.  Much of the “bloat” that might be observed with exceptions can be traded off against the knowledge that it provides error-checking code that you probably should have had all along, but was easy to overlook without exceptions in the picture.  

Pervasive use of exception-based code and related idioms makes it easier to write high-quality transactional code that can offer a strong contract with users or other components, ensuring that operations either succeed, or fail in such a way that your component is back in the state it was before the transaction was attempted – a very useful guarantee in embedded systems that must run for extended periods with little opportunity for user interaction to handle problems.
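One common shape for that transactional style is copy-and-swap: do all the work that might throw on a copy, then commit with a no-throw swap. A small sketch (the ChannelConfig class is invented for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Offers the strong guarantee: setScaleFactors either succeeds completely
// or throws and leaves the object exactly as it was.
class ChannelConfig {
public:
    void setScaleFactors(const std::vector<double>& factors) {
        if (factors.empty())
            throw std::invalid_argument("no scale factors");
        std::vector<double> staged(factors);  // may throw; m_factors untouched
        m_factors.swap(staged);               // no-throw commit
    }
    std::size_t count() const { return m_factors.size(); }
private:
    std::vector<double> m_factors;
};
```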

It makes sense to be cautious about using a relatively resource-hungry language feature like exceptions in resource-limited real-time systems. Some real-time platforms, such as VxWorks, are particularly susceptible to problems related to exception handling, and exceptions should be avoided in your “inner loop” and other performance-critical code where their overhead is most noticeable.  The case that is especially problematic on VxWorks is the (normally recommended) combination of exceptions with stack objects used for RAII resource management. Consider:

  #include <exception>

  class Automatic {
  public:
      virtual ~Automatic() {}
  };

  void justAuto()
  {
      Automatic automatic;
  }

  void justThrow()
  {
      throw std::exception();
  }

  void butBothMakesLargeCode()
  {
      Automatic automatic;
      throw std::exception();
  }

This snippet shows three very simple functions.  If I told you that the function that just has an automatic object in it compiles to about 20 lines of assembly, and the function that just throws an exception compiles to about 20 lines of assembly (both true), you would probably expect that the function which does both compiles to about 40 lines of assembly.  And you would be right, on some platforms.  

But compilers are free to implement exception handling in a variety of ways – in VxWorks’ case, choosing a behavior of GCC (dynamic setjmp/longjmp code, as opposed to static unwinding tables) that adds setup and teardown code.  This adds overhead even when no exception is thrown, ballooning those 40 lines of assembly to 70. As you might imagine, this significantly affects the performance of small functions that use exceptions and automatic objects, making exceptions inappropriate for such functions on this popular real-time platform.

Another exception-related feature to be wary of is the throw specifier.  This is an optional keyword provided with a function declaration that describes what exceptions that function may throw.

It looks like a useful tool that might allow you to take advantage of constraints on which functions can throw which exceptions and simplify your code, but the implementation is necessarily limited to the point where most commentators recommend against the use of the feature entirely.  

The root problem is that C++ has to support existing code that does not use any throw specifier, and which might throw any exception, or none at all – the lack of a specifier provides no information and must be interpreted by the compiler as “can throw anything”. 

This bubbles up into any code that calls into that code, and so on, such that enforcing throw specifiers at compile time is impossible.  You may see where this is going: throw specifiers are in fact enforced at run time, with additional (hidden) code added by the compiler to turn exceptions that violate throw specifiers into calls to the std::unexpected handler routine – a routine which is given no context for what triggered it and is therefore nearly useless.

The bottom line: you get larger code size and still no useful way to prevent or handle unexpected exceptions. Throw specifiers should not be used.  The possible (though unlikely) exception is throw(), which means “throws nothing” and can be used by the compiler to make some slight optimizations when calling such functions, but even this is probably not generally useful.  

A better option, considering that the throw specifier is largely for documentation, is to use a macro such as NOTHROW, defined to nothing, for this purpose.  This also makes it easy to experiment with enabling the throw specifier to see if your compiler actually gives you a worthwhile benefit from it (if nothing else, you might try enabling it in debug builds to let certain compilers check your virtual function hierarchies for consistency in this documentation).
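A sketch of that macro approach (NOTHROW and the CHECK_THROW_SPECS switch are invented names; pick whatever suits your build system):

```cpp
#include <cassert>

// Documentation by default; define CHECK_THROW_SPECS to experiment with a
// real throw() specifier and see whether your compiler benefits from it.
#ifdef CHECK_THROW_SPECS
#  define NOTHROW throw()
#else
#  define NOTHROW
#endif

class Buffer {
public:
    Buffer() : m_size(0) {}
    int size() const NOTHROW { return m_size; }  // documented as non-throwing
private:
    int m_size;
};
```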

Real time meets templates
During the discussion of Boost and the STL, we mentioned that both of these libraries make extensive use of the C++ template feature and alluded to more general problems with code size caused by templates.

First of all, the good side: templates are a useful tool for real-time systems despite their drawbacks, because they provide a native mechanism to implement compile-time polymorphism, allowing code which would otherwise have to make a run-time decision to avoid that  performance overhead.  

This, along with the extensive use of templates in critical libraries like the STL, means that we need this weapon in our arsenal when using C++ on real-time platforms.  The other good news is that there has been a lot of improvement over the last few years in how C++ compilers handle templates.
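To make the compile-time polymorphism point concrete, here is a minimal sketch (class names invented for illustration) using the curiously recurring template pattern, where the “virtual” call is resolved at compile time with no vtable access:

```cpp
#include <cassert>

// Statically dispatched base: process() calls into the derived class
// through a cast the compiler resolves (and can inline) at compile time.
template <typename Derived>
class FilterBase {
public:
    double process(double sample) {
        return static_cast<Derived*>(this)->processImpl(sample);
    }
};

class GainFilter : public FilterBase<GainFilter> {
public:
    explicit GainFilter(double gain) : m_gain(gain) {}
    double processImpl(double sample) { return sample * m_gain; }
private:
    double m_gain;
};
```

The tradeoff is exactly the one this section discusses: each distinct Derived type is a separate instantiation, with the attendant code-size cost.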

A simple example of what a template user should be aware of is to imagine code that instantiates the “same” class several times – where the “same” class is actually a template class with different template parameters. For example, consider a function which uses STL vectors.  Let’s set the “base cost” of a vector as the cost of one instantiation, i.e., a single std::vector<int>.  

One compiler (g++ 3.4.4) creates a 9,464-byte code block in this case.  The incremental cost of more vectors of integer is small – the same code block when using three such vectors is only 416 bytes larger.  But if the vector is instantiated with a different type, so that your three vectors are, say, std::vector<int>, std::vector<float>, and std::vector<char>, the size delta is 4,000 bytes.

This code size change is completely invisible in your code that uses the vector class, so it’s something you need to explicitly think about when using template classes generally.

If you are writing a template class, a common technique to reduce redundant code is to have your template type wrap a non-template class that does the bulk of the implementation (and thus has the bulk of the code).  

Some of your template class’s methods are likely to be identical between different instantiations, and a smart linker will collapse them; the few pieces of instance-specific code should be significantly smaller than a separate instance of the entire functionality, while still providing all the instance-specific behavior (such as type safety) that you require from your template interface.  A simple example is shown below:

  struct tGetterBase
  {
      tGetterBase(void *val) : m_val(val) {}
      void *get() const { return m_val; }
      void *m_val;
  };

  template <typename T>
  struct tGetter : private tGetterBase
  {
      tGetter(T *val) : tGetterBase(val) {}
      T* get() const
      {
          return static_cast<T*>(tGetterBase::get());
      }
  };
Most STL implementations use this technique on pointer types and container nodes.

The GNU toolchain includes several tools I find very useful for tracking down issues related to code size, especially when using complex libraries like the STL, or large static libraries, where the causes of increased code size may be essentially invisible in the client code. In particular, I commend to you the following: 

  • ld --cref (cross-reference table); 
  • nm -C --size-sort (C++ symbol demangling and sorting by function size); and
  • size (break down binary size by segment)

The cross-reference table allows you to trace the dependencies of your functions and determine, for example, why a given object is pulled in from a static library.  This is important for doing code size analysis since GCC includes the entire object in that case, even if only one function is used.

The ability to sort functions by size allows you to quickly deduce which functions are contributing excessively to your code size.  This is particularly useful in C++ code, especially in combination with automatic C++ demangling, because so often, the largest functions are the result of a hidden use of templates and not readily apparent when reviewing the code itself.  

The size utility is a higher-level quick view of your binary’s segments, and can help determine whether the actual code, or other data in your binary, is responsible for code size problems.

Real time meets inheritance
One of the main features that draws developers to C++ is the ease of use it provides for object-oriented behavior like inheritance and polymorphism.   These popular design techniques are quite painful to implement in C, so applications where C++ might otherwise be avoided may use C++ simply because of its native support for class hierarchies.

Real-time developers (all C++ developers, actually, but especially those on resource-constrained, timing-sensitive platforms) should be aware, however, that behavior like virtual function calls implies hidden performance penalties beyond what you might have heard in an introductory C++ class.  

Specifically, overuse of inheritance – and thus, unnecessary virtual functions – causes your code’s memory accesses to bounce back and forth between the location in memory where your object was instantiated and the likely quite different location where the virtual function lookup table for each class is stored.

The problem is that C++ stores the virtual method lookup table, or vtable, per class, not per object. As your code flow encounters an object that uses inheritance, it must access the vtable to look up the virtual function, then access the object again for its per-instance data.

For your real-time system, this can be disastrous, as your cache may not be able to keep the instructions (from one part of memory), the data (from another part of memory), and the vtable (from yet a different part of memory) in cache together – clearly making this much more expensive than the “just another pointer dereference!” most people weigh when considering a vtable lookup.

The solution is simply to use virtual interfaces only where you really need an interface, not just as a “low-cost” convenience. On embedded platforms, that cost is probably higher than expected.

Pointer to implementation problems
Another popular C++ interface idiom with similar drawbacks is the use of the “pointer to implementation”, or “pImpl”, data object in a class.  The problem with this idiom is similar to the above: your code is accessing one object (the containing object) in one part of memory, then has to jump to a completely different area of memory, spilling useful data out of the cache to make room for an object that was dynamically allocated without regard to cache locality. 
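For readers less familiar with the idiom, this is its classic shape (the Device class is invented for illustration; everything below the comment would normally live in the .cpp file):

```cpp
#include <cassert>

// The public class holds only a pointer, and every data access chases it
// into separately allocated memory -- the cache-locality problem at issue.
class Device {
public:
    Device();
    ~Device();
    int status() const;
private:
    struct Impl;        // defined out of line, hidden from clients
    Impl* m_impl;       // each access jumps to wherever this was allocated
};

// Normally in the .cpp file:
struct Device::Impl { int status; };
Device::Device() : m_impl(new Impl()) { m_impl->status = 0; }
Device::~Device() { delete m_impl; }
int Device::status() const { return m_impl->status; }  // extra indirection
```

(A production version would also suppress copying; that detail is omitted to keep the cache-behavior point visible.)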

There are (somewhat complex) variations of the idiom that can help with this if you really need it, such as one proposed by Herb Sutter.  It is typically possible to avoid this idiom in performance-sensitive code, however, since its use case is at your interface to external clients, which usually isn’t your time-critical “inner loop” code.

Similarly, the “pointer to data” concept – where a C++ developer hides an internal data class by storing it only by pointer (as opposed to containing it with normal aggregation) – also has poor cache locality, requiring the code to find some data (such as the pointer itself) in one part of memory where the containing object exists, and other closely related data (the dynamically-allocated object) entirely elsewhere.

Scot Salmon is a senior software engineer in the LabVIEW Real-Time group at National Instruments, with over 10 years of experience developing board support packages, drivers, and applications for VxWorks, Linux, and Windows. This article was part of a class (ESC-572) he presented at the Embedded Systems Conference.
