Use malloc()? Why not? -

Use malloc()? Why not?

A topic that I find particularly interesting, which is raised by many embedded software developers whom I meet, is dynamic memory allocation – grabbing chunks of memory as and when you need them. This seemingly simple and routine operation opens up a huge number of problems. These are not confined to embedded development – many desktop applications exhibit memory leaks that impact performance and can make system reboots common. However, I am concerned about the embedded development context.

There are a number of reasons why malloc() is not generally recommended for embedded applications:

  • The function is commonly not re-entrant (thread friendly), so using it with a real-time operating system may be challenging.
  • Its performance is not deterministic (predictable), so the time taken to allocate a memory block may be very variable, which is a challenge in a real-time application.
  • Memory allocation may fail.

Although these are all valid points, they may not all be as significant as they seem.

Reentrancy is only an issue if a function is called from multiple threads. It would be quite feasible to write a reentrant malloc() function, but it is also possible to use a standard version in a way that renders reentrancy unnecessary. Simply localize all memory allocation activities to a single task. You could even create a task the sole function of which is dynamic memory allocation; other tasks would simply send a message requesting allocation or deallocation of memory chunks.

Determinism is not always required. Not applications are real time and those that are do not necessarily require determinism for all parts of their operation.

Allocation failure can be a problem, but it can be managed. The malloc() function returns a null pointer if it cannot allocate the requested memory. It is essential to check for this response and take appropriate action. If the failure is due to memory exhaustion, there is most likely a design flaw – not enough memory was allocated to the heap. However, a common reason for allocation failure is heap fragmentation; enough free memory is available, but it is not in a contiguous area. This fragmentation comes about because of memory being allocated and deallocated in a random fashion resulting in allocated and free areas of memory. There are two ways to eliminate fragmentation:

First, if the application allows, it is only a matter of ensuring that allocation and deallocation is done in sequence with code that follows this kind of pattern:

a = malloc(1000);
b = malloc(100);
c = malloc(5000);

Of course, this may commonly be impossible. So, another option is needed.

It turns out that many applications do not need all the flexibility that malloc() affords them. The required memory chunks are of a fixed size (or a small number of different sizes). Writing a memory allocator for fixed-size blocks is very straightforward; this eliminates and fragmentation and can readily be made deterministic, if required. It is no surprise that most RTOSes have service calls to allocate memory blocks in this way.

Regardless of its unpredictability, there is another problem with malloc() – it tends to be rather slow. This is unsurprising, as the capabilities of the function are quite complex. The intrinsic simplicity of a block-based allocator addresses this issue very effectively.

But what if the application really does need random-sized chunks of memory at unpredictable times?

A way to achieve this flexibility, whilst avoiding fragmentation and non-determinism is to build an allocator that selects blocks from a number of “pools”, depending on the requested memory chunk size. A good approach to the selection of block sizes for the pools (if you have no prior knowledge of chunk sizes that will be demanded) is to use a geometric series, like 16, 32, 64, 128 bytes. Allocation would then work like this:

Clearly some allocations would be very efficient: 16 bytes from the 16-byte pool. Some would be very good; 31 bytes from the 32-byte pool. Others would be OK; 9 bytes from the 16-byte pool. Yet others would be inefficient; 65 bytes from the 128-byte pool. Overall, these inefficiencies are a small price to pay for the benefits of speed, determinism and elimination of fragmentation.

Colin Walls has over forty years’ experience in the electronics industry, largely dedicated to embedded software. A frequent presenter at conferences and seminars and author of numerous technical articles and two books on embedded software, Colin is an embedded software technologist with Mentor, a Siemens business, and is based in the UK. His regular blog is located at: He may be reached by email at

Related Contents:

For more Embedded, subscribe to Embedded’s weekly email newsletter.

1 thought on “Use malloc()? Why not?

  1. Interesting post, however your pool scheme does not solve the case where allocation sizes tend to grow regularly up to a given size, as smaller chunks are not coalesced once freed. A better alternative, especially if sticking to power of two block siszes would be to use a Buddy allocator, as it tends to merge small blocks when they are freed.

    By the way, the buddy algorithm, combined, for small sizes, with a pool algorithm proposing non-power of two sizes (always multiple of the cache line size) is what is used very effectively by the Linux kernel to handle unpredictible allocations in a fast and reliable way.

    Log in to Reply

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.