A topic that I find particularly interesting, which is raised by many embedded software developers whom I meet, is dynamic memory allocation – grabbing chunks of memory as and when you need them. This seemingly simple and routine operation opens up a huge number of problems. These are not confined to embedded development – many desktop applications exhibit memory leaks that impact performance and can make system reboots common. However, I am concerned about the embedded development context.
There are a number of reasons why malloc() is not generally recommended for embedded applications:
- The function is commonly not re-entrant (thread friendly), so using it with a real-time operating system may be challenging.
- Its performance is not deterministic (predictable), so the time taken to allocate a memory block may be very variable, which is a challenge in a real-time application.
- Memory allocation may fail.
Although these are all valid points, they may not all be as significant as they seem.
Reentrancy is only an issue if a function is called from multiple threads. It would be quite feasible to write a reentrant malloc() function, but it is also possible to use a standard version in a way that renders reentrancy unnecessary. Simply localize all memory allocation activities to a single task. You could even create a task the sole function of which is dynamic memory allocation; other tasks would simply send a message requesting allocation or deallocation of memory chunks.
Determinism is not always required. Not applications are real time and those that are do not necessarily require determinism for all parts of their operation.
Allocation failure can be a problem, but it can be managed. The malloc() function returns a null pointer if it cannot allocate the requested memory. It is essential to check for this response and take appropriate action. If the failure is due to memory exhaustion, there is most likely a design flaw – not enough memory was allocated to the heap. However, a common reason for allocation failure is heap fragmentation; enough free memory is available, but it is not in a contiguous area. This fragmentation comes about because of memory being allocated and deallocated in a random fashion resulting in allocated and free areas of memory. There are two ways to eliminate fragmentation:
First, if the application allows, it is only a matter of ensuring that allocation and deallocation is done in sequence with code that follows this kind of pattern:
a = malloc(1000); b = malloc(100); c = malloc(5000); ... free(c); free(b); free(a);
Of course, this may commonly be impossible. So, another option is needed.
It turns out that many applications do not need all the flexibility that malloc() affords them. The required memory chunks are of a fixed size (or a small number of different sizes). Writing a memory allocator for fixed-size blocks is very straightforward; this eliminates and fragmentation and can readily be made deterministic, if required. It is no surprise that most RTOSes have service calls to allocate memory blocks in this way.
Regardless of its unpredictability, there is another problem with malloc() – it tends to be rather slow. This is unsurprising, as the capabilities of the function are quite complex. The intrinsic simplicity of a block-based allocator addresses this issue very effectively.
But what if the application really does need random-sized chunks of memory at unpredictable times?
A way to achieve this flexibility, whilst avoiding fragmentation and non-determinism is to build an allocator that selects blocks from a number of “pools”, depending on the requested memory chunk size. A good approach to the selection of block sizes for the pools (if you have no prior knowledge of chunk sizes that will be demanded) is to use a geometric series, like 16, 32, 64, 128 bytes. Allocation would then work like this:
Clearly some allocations would be very efficient: 16 bytes from the 16-byte pool. Some would be very good; 31 bytes from the 32-byte pool. Others would be OK; 9 bytes from the 16-byte pool. Yet others would be inefficient; 65 bytes from the 128-byte pool. Overall, these inefficiencies are a small price to pay for the benefits of speed, determinism and elimination of fragmentation.
|Colin Walls has over forty years’ experience in the electronics industry, largely dedicated to embedded software. A frequent presenter at conferences and seminars and author of numerous technical articles and two books on embedded software, Colin is an embedded software technologist with Mentor, a Siemens business, and is based in the UK. His regular blog is located at: https://blogs.sw.siemens.com/embedded-software/. He may be reached by email at email@example.com.|
- Firmware Security – Preventing memory corruption and injection attacks
- RTOS diagnostics and error checking
- A peek inside Amazon FreeRTOS: Communication and memory
- Achieving memory safety without compromise
- Memory plays vital role in security
- You think your software works? Prove it!
For more Embedded, subscribe to Embedded’s weekly email newsletter.