Effective C++ Memory Allocation - Embedded.com

Effective C++ Memory Allocation

Despite the advantages of C++, its standard memory allocation can give embedded systems fits. Using several features of the language, the author presents a framework for resource allocation that is temporally deterministic, supports callbacks and memory pools, and can help prevent deadlock.

Increasing numbers of programmers are choosing to write embedded systems using C++, for many reasons. C++ is an excellent language to use for object-oriented design. The advantages of object-oriented design are well known, and are just as important in embedded systems programming as in more conventional programs. You can write C++ code that is just as efficient as C code, especially if you understand how the compiler translates C++ constructs into assembly language. Existing C code can be easily reused and linked with C++ code. For these and other reasons, the use of C++ will continue to increase in embedded systems.

Although C++ has many advantages, its standard memory allocation can be problematic for embedded systems; it was designed for conventional applications, such as the word processor I'm using to write this article. Within limits, a non-embedded system can assume that relatively large amounts of memory will be available, especially if running on an operating system with virtual memory. Wasting memory due to fragmentation is generally not an issue in conventional applications. Conventional applications must run fairly efficiently but generally do not have strictly deterministic execution requirements.

Embedded systems have very particular memory allocation needs. If an embedded system can use less memory, that often translates into lower product cost because less physical memory exists in the system. Some embedded systems must execute in deterministic time, while others may not be strictly deterministic but still must be relatively fast. Embedded systems can make specific assumptions about hardware, and it may be worthwhile to tune for them; for example, on some platforms an aligned memory access is significantly faster than an unaligned access. In some applications, certain areas of memory must be used for certain purposes; for instance, an area of memory is battery-backed, or can be accessed for DMA, or is faster and therefore preferable for frequently executed code. Limiting an object's maximum allocation to a fixed amount of memory may be desirable. Finally, because a finite amount of memory exists, there must be a means to restart a thread when previously unavailable memory becomes available.

Following are some techniques for using memory allocation effectively in C++. We developed these techniques for an embedded RAID controller running on a protected-mode x86 processor; however, they should be applicable to any embedded system that supports C++, and they may be used intact or piecemeal to overcome the deficiencies of standard allocation.

Overloading operators new and delete
The feature of C++ that makes redefining memory allocation possible is overloading of the new and delete operators. While programmers often think of operator overloading as a way to provide class-specific functionality for arithmetic operators like + and -, it can also be used to replace default memory allocation behavior. The syntax for overloading memory allocation is similar to that for arithmetic operations:

void * operator new( size_t size );
void operator delete( void * );

It’s interesting to note that these are void pointers. Operator new is called before any constructor is called, so it’s just returning a pointer to raw memory; likewise, operator delete is called after the object’s destructor has been called.

The operators follow the normal rules of C++. Like the default new, the overloaded new is allowed to return zero if no memory is available, and operator delete must accept a zero pointer. The compiler will generate code that checks the return of operator new and skips the constructor call if it returns zero. As with any member function, scope follows the normal inheritance rules, starting with the global operators new and delete. In this article, I discuss redefining new and delete on a class basis, not the global new and delete. Also, if the programmer allocates an array of objects, operator new is not called; instead, operator new[] is called, and operator delete[] frees the memory. The syntax can be found in any C++ text; it's just important to know this detail. I won't discuss the array operators in this article, but the principles are the same as for the single-element operator new and delete.

Some run-time libraries for embedded systems do supply operators new and delete. The programmer may wish to disable them or modify their behavior, perhaps to guarantee that all allocation is done with operator overloading, or to allow the system to run without allocating memory for a heap. The global operators new and delete can be disabled by overloading them and calling assert. Another useful behavior is to make operator new an allocate-only operator new, where threads can allocate memory at initialization time. The allocate-only implementation can be as simple as incrementing a pointer through an array of memory allocated as a global object.
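A minimal sketch of such an allocate-only global operator new follows. The pool size, alignment choice, and names are assumptions for illustration; a real system would size the pool from its memory budget. The operator delete traps any attempt to free, while still accepting a zero pointer as the language requires.

```cpp
#include <cassert>
#include <cstddef>

// Statically allocated pool consumed by incrementing a pointer.
static const std::size_t POOL_SIZE = 4096;
static unsigned char pool[POOL_SIZE];
static std::size_t poolOffset = 0;

void * operator new( std::size_t size )
{
    // Round each request up so the next allocation stays aligned.
    const std::size_t ALIGN = sizeof(double);
    size = ( size + ALIGN - 1 ) & ~( ALIGN - 1 );
    assert( poolOffset + size <= POOL_SIZE );   // pool exhausted
    void *p = &pool[poolOffset];
    poolOffset += size;
    return p;
}

void operator delete( void *p )
{
    // Allocate-only: a non-zero delete is a coding error in this scheme.
    assert( p == 0 && "delete called in an allocate-only system" );
}
```

Because all memory comes from a global array, the system runs without a conventional heap, and any allocation after the initialization phase is visible as growth in poolOffset.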

Placement new
In one application, we had one range of battery-backed nonvolatile memory, which would preserve data across power failure. Because this memory was too slow for execution, we executed code from another range of memory. To use the battery-backed memory for data that needed to be preserved across power failures, we had to explicitly place objects there. Using C, we would have probably cast the address of the memory to a structure pointer to access the memory, perhaps:

#define NV_MEM_START 0x400000

struct NV_DATA {
   int data;
   unsigned int moreData;
};

struct NV_DATA *nv;
nv = (struct NV_DATA *) NV_MEM_START;

C++ allows us to avoid the casting and to use normal syntax to access the special range of memory. To locate an object at a specified memory, we can use placement new, which is just a special case of using function overloading with operator new. Function overloading is a standard C++ feature, which allows functions of the same name to have different implementations, selected by matching parameter lists. The declaration for placement new is:

void * operator new( size_t size, void *);

Placement new is defined as:

void * operator new( size_t size, void *location )
{
   return location;
}

To declare an object at a given location, we used:

void * const NV_MEM_START = (void *) 0x400000;

class NV_CLASS {
   int data;
   unsigned int moreData;
public:
   void * operator new( size_t size, void *location );
};

NV_CLASS *a = new ( NV_MEM_START ) NV_CLASS;

Objects obtained from placement new may be constructed once at initialization time and never deleted, or they may need to be deleted and reused. In both cases, specifying an operator delete is useful. In the first case, where an object should never be deleted, an assert could catch attempts at deletion, or operator delete could be declared private so the error is caught at compile time. In the second case, where the object will be destructed and reused, operator delete can simply return. Even so, it is important that this programmer-defined operator delete exists, even though it does nothing: if it isn't there, the global operator delete will be called, which will try to return to the system memory that it never allocated.

Keep in mind a couple of caveats about using placement new. If an object can be allocated with both placement new and the default new, the programmer must be careful about deleting the object; although there can be many overloaded versions of operator new, there can be only one ordinary operator delete. If the object should only be allocated via placement new, attempts to use standard new can be caught at compile time by creating a private operator new with default arguments.
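One variant of this compile-time guard is sketched below (the article's version uses default arguments to the same effect). Ordinary operator new is declared private and left undefined, so "new NV_CLASS" is rejected, while placement new remains public. The class layout follows the earlier NV_CLASS example; the accessor is added only so the sketch can be exercised.

```cpp
#include <cstddef>

class NV_CLASS {
    int data;
    // Private and undefined: "new NV_CLASS" fails to compile here.
    void * operator new( std::size_t size );
public:
    NV_CLASS() : data(0) {}
    int Data() const { return data; }
    // Placement new stays public so the object can be located
    // at an explicit address.
    void * operator new( std::size_t, void *location ) { return location; }
    // Does nothing: the class never owns heap memory.
    void operator delete( void * ) {}
};

// NV_CLASS *bad = new NV_CLASS;          // compile-time error
// NV_CLASS *ok  = new ( NV_MEM_START ) NV_CLASS;  // allowed
```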

Another use for placement new is memory-mapped I/O. In its simplest form, placement new for memory-mapped I/O simply creates an object at the base of the device's memory range, never to be deleted. If a system has several identical devices, an individual object can be created at each device's location. It's important to understand both the compiler's packing of structure members and byte-ordering issues. It is probably simplest to force the compiler to pack classes without padding and explicitly insert spacing where the hardware has unused bytes. Byte ordering is an issue when multi-byte registers use a different byte ordering than the CPU, although in many cases the CPU and hardware will use the same byte ordering.
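A sketch of this idea follows; the register layout, names, and base address are hypothetical. volatile keeps the compiler from caching or eliminating register accesses, and the pad member makes the hardware's unused bytes explicit so the offsets line up without relying on compiler padding.

```cpp
#include <cstddef>

class UART_REGS {
public:
    volatile unsigned char dataReg;    // offset 0
    volatile unsigned char statusReg;  // offset 1
    unsigned char pad[2];              // unused hardware bytes
    volatile unsigned int baudDiv;     // offset 4, assuming 4-byte int

    void * operator new( std::size_t, void *location ) { return location; }
    void operator delete( void * ) {}  // device objects are never freed
};

// On real hardware the object would be placed at the device's base
// address, for example:
//     UART_REGS *uart = new ( (void *) 0xFFF00000 ) UART_REGS;
```

After placement, the device is accessed with ordinary member syntax (uart->baudDiv = 9600;) instead of casts and offset arithmetic.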

Calling programmer-defined memory allocation
We had other cases in which we needed to control memory allocation explicitly for specific objects. We knew how we wanted memory allocation to work. We knew how many objects of a given type we needed and how the allocator should behave. We used overloading of the operator new to call a user-defined allocator. The syntax for this approach is:

void * operator new( size_t size )
{
   // call a user-defined function
   // to return memory
   return memMgr.GetMemory();
}

In this case, we also need an operator delete, to return memory to the memory manager:

void operator delete( void *location )
{
   // return memory to the
   // user-defined memory manager
   memMgr.ReleaseMemory( location );
}

I give an example later in this article of what the memory allocator might look like; for now, the important part is the syntax of calling a memory manager.

Problems with inheritance
The operator overloading I’ve described allows memory to be stored in regions, so that there is one pool of fixed-size pieces of memory for all of a certain class. However, there is a problem: operators new and delete are inherited, just as any other member function would be. The problem with inheritance is that derived classes may be larger than the base class. One solution is to use a different memory pool for each derived class. Another option is to use one pool for a base class and all derived classes, and allocate memory for the largest derived class. Either way, it is useful to have an assert in operator new to verify that the requested size is less than or equal to the allocated size. Even in cases in which there are currently no derived objects, an assert can protect against later derivations of the object that don’t update the memory manager. The compiler will pass in the amount of memory it expects into operator new; when a derived object grows larger, the compiler will pass in the larger size. Note the following example:

class STUFF {
public:
   void * operator new( size_t size )
   {
      // memMgr.GetMemory will return
      // an object of sizeof(STUFF) bytes
      assert( size <= sizeof(STUFF) );
      return memMgr.GetMemory();
   }
   // other stuff in the class
};

The memory manager
The programmer must determine how the memory manager should behave in the embedded system. For example, must the memory manager run in constant time? How much empty memory can fragmentation consume? Should memory be used from one pool, or should specific amounts of memory be allocated to specific objects? Should the pools reside in different physical memory areas?

In our project, we needed a relatively fast memory manager, and although running in constant time wasn’t important, our implementation does run in constant time. It was useful for us to explicitly control where memory came from, and to ensure that we would have adequate amounts of memory available for certain objects. Some of the memory had to come from pools of memory accessible via DMA, and some did not.

The basic implementation was to create a resource manager with statically allocated pools of memory. The sizes of these pools are specified in the source file, and each pool is used for a specific class (and its derived classes). Each element embeds a resource information object as public data in its class. For example:

class RESOURCE_INFO {
   LIST_OBJECT ll;    // to place the object
                      // on a linked list
   unsigned userId;   // if reserved to a
                      // specific user
};

At initialization, the resource manager builds linked lists of the objects for which it’s controlling memory allocation. At allocation, the resource manager attempts to take one off the head of the linked list; at deletion, the resource manager places the object at the end of the linked list.
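The allocate-from-the-head, release-to-the-tail scheme can be sketched as below for one class's pool. The names and sizes are illustrative, and LIST_OBJECT is replaced here by a simple intrusive next pointer; both GetMemory and ReleaseMemory run in constant time, matching the behavior described above.

```cpp
#include <cstddef>

class MEM_MGR {
    struct BLOCK {
        BLOCK *next;              // free-list link, overlaid on payload
    };
    enum { BLOCK_SIZE = 64, BLOCK_COUNT = 8 };
    unsigned char pool[BLOCK_SIZE * BLOCK_COUNT];  // static pool
    BLOCK *head;                  // allocate from the head...
    BLOCK *tail;                  // ...release to the tail
public:
    MEM_MGR() : head(0), tail(0) {
        // Build the initial free list from the static pool.
        for ( int i = 0; i < BLOCK_COUNT; ++i )
            ReleaseMemory( &pool[i * BLOCK_SIZE] );
    }
    void * GetMemory() {          // constant time; 0 when exhausted
        if ( head == 0 ) return 0;
        BLOCK *b = head;
        head = head->next;
        if ( head == 0 ) tail = 0;
        return b;
    }
    void ReleaseMemory( void *p ) {   // constant time
        BLOCK *b = static_cast<BLOCK *>( p );
        b->next = 0;
        if ( tail ) tail->next = b; else head = b;
        tail = b;
    }
};
```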

This implementation is very basic; for debugging, we added counters that show the number of blocks available and consumed. These counters are very useful; for instance, in a hang situation where one pool has no memory free, perhaps the callback (see below) isn't working. It's also useful to keep statistics on the maximum number of each object in use, to run the system lightly, moderately, and heavily loaded, and to compare performance against the number of objects allocated. While I don't show code for this, it's a good way to understand system resource usage.

Handling callbacks
One issue of memory allocation to consider is what happens when no memory is available. Sometimes if memory isn’t available, a function may continue with the resources that it has currently obtained. Other times, a function can’t do its job without obtaining the resources it has requested. If the function cannot complete without the resource, it must be restarted when the resource becomes available.

In our RAID controller, we had enough memory available so that a normal I/O request could complete without waiting for resources. However, the worst-case workload required another order of magnitude of physical memory, which wasn’t practical to put on board. To handle this unlikely worst case, we coded most calls to operator new to accept not being able to get resources and added the callback for when resources were freed.

The method we used for callbacks was to pass a pointer to an object as an optional parameter to operator new. The default value was zero. If this default zero was used and memory allocation failed, the memory manager would have no record of the request. If the requestor wanted to be called back when memory became available, it passed in a pointer to an object derived from the ACTOR class. The ACTOR class contains a linked-list element used to queue the request and a virtual function to call when the memory request should be retried. For example:

class ACTOR {
public:
   // linked list element to
   // store this request
   LIST_OBJECT linkElement;

   virtual void WakeupResource( void );
};

When the memory manager did not have memory available for the particular resource, it would link the ACTOR on a queue of ACTORs waiting for that resource to become available. When the resource did become available, the resource manager would call the WakeupResource member function for any waiting ACTORs. The WakeupResource member function would then retry the memory allocation.
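The queue-and-wakeup flow can be sketched as follows. The single-block "pool" is a deliberate simplification so the policy stands out; the ACTOR and WakeupResource names follow the article, while RES_MGR, RETRY_ACTOR, and the intrusive next pointer are illustrative stand-ins.

```cpp
class ACTOR {
public:
    ACTOR *next;                        // stand-in for LIST_OBJECT
    ACTOR() : next(0) {}
    virtual ~ACTOR() {}
    virtual void WakeupResource() = 0;  // retry the allocation here
};

class RES_MGR {
    void *freeBlock;                    // one-block pool, for brevity
    ACTOR *waiters;                     // queue of blocked requesters
public:
    RES_MGR( void *block ) : freeBlock(block), waiters(0) {}
    // Returns memory, or queues the actor (if any) and returns 0.
    void * GetMemory( ACTOR *actor ) {
        if ( freeBlock ) {
            void *p = freeBlock;
            freeBlock = 0;
            return p;
        }
        if ( actor ) { actor->next = waiters; waiters = actor; }
        return 0;
    }
    void ReleaseMemory( void *p ) {
        freeBlock = p;
        if ( waiters ) {                // wake one waiting requester
            ACTOR *a = waiters;
            waiters = a->next;
            a->WakeupResource();
        }
    }
};

// Example requester: on wakeup it simply retries the allocation.
class RETRY_ACTOR : public ACTOR {
    RES_MGR &mgr;
public:
    void *got;
    RETRY_ACTOR( RES_MGR &m ) : mgr(m), got(0) {}
    void WakeupResource() { got = mgr.GetMemory(0); }
};
```

In a real system, WakeupResource would typically reschedule the blocked thread rather than allocate directly from the callback.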

Reserving resources
In general, we coded our resource manager to allow allocation of a resource by any threads that requested it. However, it can be useful to reserve a certain number of resources that can be allocated only by one thread. One reason to do so is deadlock avoidance. In one piece of our code, if all resources were in use, and a certain thread could obtain one more of a certain resource, it would be guaranteed to free up additional resources so that the system could continue.

In another example, for performance reasons it was desirable to guarantee that a certain thread would never wait for a resource. We could calculate the worst-case number of objects necessary for processing and reserve them to that thread. In another application, it was difficult to save state for a callback to resume processing, but it was possible to predict the worst-case number of resources necessary for processing and reserve those before execution.

For these reasons, we extended the resource manager to allow resources to be reserved to a resource ID. At initialization, a thread could obtain a resource ID. If desired, the thread could reserve a certain number of resources of a certain type for use only by that resource ID. When a thread requested a resource, the resource manager would first check whether any resources were reserved for that ID; if so, it would allocate from that reserved pool. If the reserved pool were empty, it would attempt to allocate from the general pool. If the general pool were also empty, the allocation would fail and zero would be returned.
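The reservation policy can be sketched with a counting-only model that tracks block counts rather than real memory, which isolates the allocation decision; all names and limits are illustrative, and releasing blocks back to the pools is omitted for brevity.

```cpp
class RESERVING_MGR {
    enum { MAX_IDS = 4 };
    int generalFree;              // blocks in the shared pool
    int reservedFree[MAX_IDS];    // blocks reserved per resource ID
public:
    RESERVING_MGR( int total ) : generalFree(total) {
        for ( int i = 0; i < MAX_IDS; ++i )
            reservedFree[i] = 0;
    }
    // Move 'count' blocks from the general pool into id's reserve.
    bool Reserve( unsigned id, int count ) {
        if ( count > generalFree ) return false;
        generalFree -= count;
        reservedFree[id] += count;
        return true;
    }
    // Allocate one block: reserved pool first, then the general pool.
    bool Allocate( unsigned id ) {
        if ( reservedFree[id] > 0 ) { --reservedFree[id]; return true; }
        if ( generalFree > 0 )      { --generalFree;      return true; }
        return false;             // operator new would return zero here
    }
};
```

Because a thread's reserve is drawn down before the general pool, the reserved blocks remain available to that thread even when the rest of the system has exhausted memory, which is exactly the deadlock-avoidance property described above.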

A toolkit of techniques
C++ has many powerful features that make it uniquely suited for embedded systems. One powerful feature of C++ is extending default functionality. Through extending default functionality, memory management can be extended and customized to suit the exact needs of the particular embedded system. Some examples of extending default memory management are placing objects in particular memory locations and allowing the user to control exactly how memory is allocated, how many of a certain resource is allocated, and how callbacks and reservations are handled. With this toolkit of memory allocation techniques, C++ becomes more usable for embedded systems.

Aaron Dailey works at Chaparral Technologies in Longmont, CO, on their external RAID controllers. He has been working with embedded systems in one form or another for 10 years, and using C++ for the last three. He can be reached via e-mail at adailey@chaparraltec.com.

