Effective C++ Memory Allocation
Increasing numbers of programmers are choosing to write embedded systems in C++, for many reasons. C++ is an excellent language for object-oriented design, whose advantages are well known and just as important in embedded systems programming as in more conventional programs. You can write C++ code that is as efficient as C code, especially if you understand how the compiler translates C++ constructs into assembly language, and existing C code can easily be reused and linked with C++ code. For these and other reasons, the use of C++ in embedded systems will continue to increase.

Although C++ has many advantages, its standard memory allocation can be problematic for embedded systems. It is designed for conventional applications, such as the word processor I'm using to write this article. Within limits, a non-embedded system can assume that relatively large amounts of memory are available, especially when running on an operating system with virtual memory. Memory wasted to fragmentation is generally not an issue in conventional applications, and while such applications must run fairly efficiently, they generally do not require strictly deterministic execution.

Embedded systems have very particular memory allocation needs. If an embedded system can use less memory, that often translates into lower product cost, because less physical memory goes into the system. Some embedded systems must execute in deterministic time, while others need not be strictly deterministic but must still be relatively fast. Embedded systems can also make specific assumptions about their hardware, and it may be worthwhile to tune for them; on some platforms, for example, an aligned memory access is significantly faster than an unaligned one.
In some applications, certain areas of memory must be used for certain purposes; for instance, an area of memory is battery-backed, can be accessed for DMA, or is faster and therefore preferable for frequently executed code. Limiting an object's maximum allocation to a fixed amount of memory may also be desirable. Finally, because a finite amount of memory exists, there must be a means to restart a thread when previously unavailable memory becomes available. Following are some techniques for using memory allocation effectively in C++. We developed these techniques for an embedded RAID controller running on a protected-mode x86 processor; however, they should be applicable to any embedded system that supports C++. The techniques I present here may be used intact or piecemeal to overcome the deficiencies of C++.

Overloading operators new and delete

The operators to overload are declared as:

    void *operator new( size_t size );
    void operator delete( void * );

It's interesting to note that these deal in void pointers. Operator new is called before any constructor, so it simply returns a pointer to raw memory; likewise, operator delete is called after the object's destructor has run. The operators follow the normal rules of C++. Like the default new, the overloaded new is allowed to return zero if no memory is available, and operator delete must accept a zero pointer. The compiler will generate code that checks the return of operator new and does not call the constructor if it returns zero. As with any function, scope follows the normal inheritance rules, starting with the global operators new and delete. In this article, I discuss redefining new and delete on a per-class basis, not the global new and delete. Also, if the programmer allocates an array of objects, operator new is not called; instead, operator new[] is called, and operator delete[] is defined to free the memory. The syntax can be found in any C++ text; it's just important to know this detail.
I won't discuss the array operators in this article, but the principles are the same as for the single-element operator new and delete. Some run-time libraries for embedded systems do supply operators new and delete. The programmer may wish to disable them or modify their behavior, perhaps to guarantee that all allocation is done through operator overloading, or to allow the system to run without allocating memory for a heap. The global operators new and delete can be disabled by overloading them and calling assert. Another useful behavior is to make operator new allocate-only, so that threads can allocate memory at initialization time; the implementation can be as simple as incrementing a pointer through an array of memory allocated as a global object.

Placement new

In C, accessing a special range of memory typically means defining a structure and casting a constant address to a pointer:

    #define NV_MEM_START 0x400000

    struct NV_DATA {
        int data;
        unsigned int moreData;
    };

    struct NV_DATA *nv;
    nv = (NV_DATA *) NV_MEM_START;

C++ allows us to avoid the casting and to use normal syntax to access the special range of memory. To locate an object at a specified address, we can use placement new, which is just a special case of function overloading with operator new. Function overloading is a standard C++ feature that allows functions of the same name to have different implementations, selected by matching parameter lists. The declaration for placement new is:

    void *operator new( size_t size, void *location );

Placement new is defined as:

    void *operator new( size_t size, void *location )
    {
        return location;
    }

To declare an object at a given location, we used:

    void *NV_MEM_START = (void *) 0x400000;

    class NV_CLASS {
        int data;
        unsigned int moreData;
    public:
        void *operator new( size_t size, void *location );
    };

    NV_CLASS *a = new ( NV_MEM_START ) NV_CLASS;

Objects obtained from placement new may be constructed once at initialization time and never deleted, or they may need to be deleted and reused. In both cases, specifying an operator delete is useful.
In the first case, where an object should never be deleted, an assert could be used to catch attempts at deletion, or operator delete could be made private so the error is caught at compile time. In the second case, in which the object may need to be reused, you really don't need to do anything, and operator delete can simply return. If an object obtained using placement new is going to be destructed and reused, it's important that a programmer-defined operator delete exist, even if it simply returns; otherwise, the global operator delete will be called, and it will try to return memory to the system that it never allocated. Keep in mind a couple of caveats about using placement new. If an object can be allocated with both placement new and the default new, the programmer must be careful about deleting the object: although function overloading allows many different operators new, there can be only one operator delete. If the object should only be allocated via placement new, attempts to use the standard new can be caught at compile time by declaring the default operator new private. Another use for placement new is memory-mapped I/O. In its simplest form, placement new for memory-mapped I/O creates an object at the base of the device's memory range, and that object is never deleted. If a system has several identical devices, an individual object can be created at each device's location. It's important to understand both the compiler's packing of structure members and byte-ordering issues. It is probably simplest to force the compiler to pack classes without padding and to insert explicit padding where the hardware has unused bytes. Byte ordering is an issue when multi-byte registers use a different byte order than the CPU, although in many cases the CPU and hardware will agree.
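As a concrete illustration, here is a minimal sketch of a memory-mapped device object. The UART_REGS layout, the register names, and the use of a static buffer in place of a real device base address are all invented for the example; a real driver would use the hardware's documented register map and physical address.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical device with registers packed exactly as the hardware
// lays them out; explicit padding covers the hardware's unused bytes.
class UART_REGS {
public:
    volatile std::uint8_t  data;     // offset 0
    volatile std::uint8_t  status;   // offset 1
    std::uint8_t           pad[2];   // offsets 2-3: unused by hardware
    volatile std::uint32_t baudDiv;  // offset 4

    // placement new: construct the object at the device's base address
    void *operator new(std::size_t, void *base) { return base; }
    // device objects are reused, never freed, so delete just returns
    void operator delete(void *) {}
private:
    // never defined: using the default new is a compile-time error
    void *operator new(std::size_t);
};

// A real system would use the device's physical base address; a static
// buffer stands in for the register window here.
alignas(4) static unsigned char deviceWindow[8];
```

One object is then created per device, as in `UART_REGS *uart = new (deviceWindow) UART_REGS;`. This sketch assumes the compiler inserts no padding of its own between these naturally aligned members; a compiler-specific pack pragma makes that guarantee explicit.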
Calling programmer-defined memory allocation

Operator new can be overloaded to call a programmer-defined memory manager:

    void *operator new( size_t size )
    {
        // call a user-defined
        // function to return memory
        return memMgr.GetMemory();
    }

In this case, we also need an operator delete to return memory to the memory manager:

    void operator delete( void *location )
    {
        // return memory to the user-
        // defined memory manager
        memMgr.ReleaseMemory( location );
    }

I give an example later in this article of what the memory allocator might look like; for now, the important part is the syntax of calling a memory manager.

Problems with inheritance

A per-class operator new is inherited by derived classes, which are usually larger than the base class. If the allocator hands out fixed-size blocks sized for the base class, an allocation for a derived class would silently receive too little memory; an assert on the requested size catches this:

    class STUFF {
    public:
        void *operator new( size_t size )
        {
            // memMgr.GetMemory will
            // return an object of size
            // STUFF
            assert( size <= sizeof(STUFF) );
            return memMgr.GetMemory();
        }
        // other stuff in the class
    };

The memory manager

In our project, we needed a relatively fast memory manager, and although running in constant time wasn't a requirement, our implementation does run in constant time. It was useful for us to explicitly control where memory came from and to ensure that adequate amounts of memory would be available for certain objects. Some of the memory had to come from pools accessible via DMA, and some did not. The basic implementation was to create a resource manager with statically allocated pools of memory. The sizes of these pools are specified in the source file, and each pool is used for a specific class (and its derived classes). Each element has an embedded resource-information object, which is public data in the class. For example:

    class RESOURCE_INFO {
        LIST_OBJECT ll;    // to place the object on
                           // a linked list
        unsigned userId;   // if reserved to a
                           // specific user
    };

At initialization, the resource manager builds linked lists of the objects for which it's controlling memory allocation. At allocation, the resource manager takes an object off the head of the linked list; at deletion, it places the object at the end of the linked list.
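Pulling these pieces together, here is a minimal, self-contained sketch of such a constant-time, fixed-block pool, fleshing out the STUFF example above. LIST_OBJECT's layout, the pool size, and the static array used as backing store are my assumptions, not the article's actual code.

```cpp
#include <cassert>
#include <cstddef>

struct LIST_OBJECT { LIST_OBJECT *next; };  // assumed intrusive link

// Constant-time pool: take from the head, append at the tail.
class RESOURCE_POOL {
    LIST_OBJECT *head;
    LIST_OBJECT *tail;
public:
    int freeCount;  // debug counter: blocks currently available
    RESOURCE_POOL() : head(0), tail(0), freeCount(0) {}

    void Add(LIST_OBJECT *e) {       // O(1) append at the tail
        e->next = 0;
        if (tail) tail->next = e; else head = e;
        tail = e;
        ++freeCount;
    }
    LIST_OBJECT *Take() {            // O(1) pop from the head
        LIST_OBJECT *e = head;
        if (!e) return 0;
        head = e->next;
        if (!head) tail = 0;
        --freeCount;
        return e;
    }
};

class STUFF {
public:
    LIST_OBJECT ll;   // embedded resource info, first member
    int payload;
    static RESOURCE_POOL pool;

    // noexcept so a modern compiler checks for a zero return before
    // calling the constructor (the article predates this requirement)
    void *operator new(std::size_t size) noexcept {
        assert(size <= sizeof(STUFF));  // catch oversized derived classes
        return pool.Take();             // zero when the pool is empty
    }
    void operator delete(void *p) {
        if (p) pool.Add(static_cast<LIST_OBJECT *>(p));
    }
};
RESOURCE_POOL STUFF::pool;

// Statically allocated backing store, carved up at initialization.
static STUFF poolStorage[4];
```

At initialization, each element of `poolStorage` is added to `STUFF::pool`; thereafter `new STUFF` and `delete` run in constant time and never touch a general-purpose heap.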
This implementation is very basic; for debugging, we added counters that show the number of blocks available and consumed. These counters are very useful; for instance, in a hang situation where one pool has no memory free, perhaps the callback (see below) isn't working. It's also useful to keep statistics on the maximum number of each object in use; to run the system lightly loaded, with an average load, and heavily loaded; and to compare performance against the number of objects allocated. While I don't show code for this, it's a good way to understand system resource usage.

Handling callbacks

In our RAID controller, we had enough memory available that a normal I/O request could complete without waiting for resources. However, the worst-case workload required another order of magnitude of physical memory, which wasn't practical to put on board. To handle this unlikely worst case, we coded most calls to operator new to tolerate failing to get resources, and we added a callback for when resources were freed. The method we used for callbacks was to pass a pointer to an object as an optional parameter to operator new, with a default value of zero. If the default zero was used and memory allocation failed, the memory manager kept no record of the request. If the requestor wanted to be called back after a failed allocation, it passed in a pointer to an object derived from the ACTOR class. An ACTOR holds a linked-list object used for storage and a virtual function to call when the memory request should be retried. For example:

    class ACTOR {
    public:
        // linked list element to
        // store this request
        LIST_OBJECT linkElement;
        virtual void WakeupResource( void );
    };

When the memory manager did not have memory available for a particular resource, it would link the ACTOR on a queue of ACTORs waiting for that resource to become available.
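The queueing scheme just described might be sketched as follows. POOL_MGR, the singly linked free list, and the `woken` flag are illustrative stand-ins; the article's actual LIST_OBJECT and memory manager are not shown.

```cpp
#include <cstddef>

// Illustrative ACTOR: the virtual WakeupResource would normally retry
// the failed allocation; here it just records that it was called.
struct ACTOR {
    ACTOR *next;   // stands in for the LIST_OBJECT link element
    bool woken;
    ACTOR() : next(0), woken(false) {}
    virtual ~ACTOR() {}
    virtual void WakeupResource() { woken = true; }
};

class POOL_MGR {
    void *freeList;   // free blocks, linked through their first word
    ACTOR *waiters;   // ACTORs waiting for memory to be freed
public:
    POOL_MGR() : freeList(0), waiters(0) {}

    void AddBlock(void *p) { *(void **)p = freeList; freeList = p; }

    // Optional ACTOR: on failure, a non-zero actor is queued for a
    // callback; with the default zero, the request is simply dropped.
    void *GetMemory(ACTOR *actor = 0) {
        if (freeList) {
            void *p = freeList;
            freeList = *(void **)p;
            return p;
        }
        if (actor) { actor->next = waiters; waiters = actor; }
        return 0;
    }

    void ReleaseMemory(void *p) {
        AddBlock(p);
        while (waiters) {         // wake waiters; they retry the request
            ACTOR *a = waiters;
            waiters = a->next;
            a->WakeupResource();
        }
    }
};
```

A class-specific operator new would forward to `GetMemory`, passing along any ACTOR the caller supplied.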
When the resource did become available, the resource manager would call the WakeupResource member function of any waiting ACTORs. The WakeupResource member function would then retry the memory allocation.

Reserving resources

In one example, for performance reasons it was desirable to guarantee that a certain thread would never wait for a resource. We could calculate the worst-case number of objects necessary for processing and reserve them for that thread. In another application, it was difficult to save state for a callback to resume processing, but it was possible to predict the worst-case number of resources necessary and reserve them before execution. For these reasons, we extended the resource manager to allow resources to be reserved to a resource ID. At initialization, a thread could obtain a resource ID and, if desired, reserve a certain number of resources of a certain type for use only by that ID. When the thread requested a resource, the resource manager would first check whether any resources were reserved for that ID; if so, it would allocate from that reserved pool. If the reserved pool was empty, it would attempt to allocate from a general pool. If that too was empty, the allocation would fail and zero would be returned.

A toolkit of techniques

Aaron Dailey works at Chaparral Technologies in Longmont, CO, on their external RAID controllers. He has been working with embedded systems in one form or another for 10 years, and using C++ for the last three. He can be reached via e-mail at adailey@chaparraltec.com.