Mastering stack and heap for system reliability: Part 3 - Avoiding heap errors -

Mastering stack and heap for system reliability: Part 3 – Avoiding heap errors

Editor's note: In the last of a three part series, the authors deal with broad guidelines for proper heap design, methods to employ to avoid errors and provides some miscellaneous tips for general heap implementation success.

The heap stores dynamic elements generated by the code during runtime and allows different elements of the program to share data and memory. Underestimating heap usage can lead to out-of-memory errors in malloc(). This situation can be detected easily by checking the return status of malloc(), but by then it will be too late. This is a serious situation in most embedded systems because there is typically no acceptable way to recover, which means that the only available response is to restart the application. Overestimating heap usage is necessary due to the dynamic nature of the heap; on the other hand, too large an estimate will waste memory resources. Two other failures that can occur when using the heap are overwritten heap data (variables and pointers), and corruption of the heap’s internal structure.

Before we continue, we'll recap the dynamic memory API:

void* malloc(size_t size)

  • allocates size bytes
  • returns a pointer to the allocated memory block
  • does not clear the memory block
  • returns NULL if no memory is left

free(void* p)

  • frees the memory space pointed to by p
  • requires p to have been returned by a previous call to malloc(), calloc(), or realloc()
  • calling free(p) more than once for the same p must be avoided

void* calloc(size_t nelem, size_t elsize)

  • is similar to malloc()
  • clears the memory block

void* realloc(void* p, size_t size)

  • is similar to malloc()
  • grows/shrinks a previously allocated block
  • the returned block might have a new address


  • new operator similar to malloc()
  • new[]
  • delete operator similar to free()
  • delete[]

There are a number of dynamic memory allocator implementations available. The most commonly used today is Dlmalloc (Doug Lea’s memory allocator). Dlmalloc is found in Linux and also in many development tools for embedded systems. Dlmalloc is in the public domain and is freely available.

The internal structure for the heap is interspersed with data allocated by the application (Figure 1 ). If the application writes outside of the allocated data it might easily corrupt the internal structure of the heap.

Figure 1: Internal structure of the heap shows that writing outside of the allocated user data will corrupt the heap.

Calculating the amount of memory required for the heap is a non-trivial task. Most designers choose a trial and error approach just because the alternatives are too tedious. A typical approach would be to find the smallest heap for which the application still works and then increase that size by another 50%. If they are lucky, this works. If they are not, problems arise.

Mistakes to guard against
To reduce the risk of launching products with heap errors, programmers and code reviewers should be aware of several potential problem areas.

  • Initialization mistakes – Uninitialized global data is always initialized to zero. That well-known fact makes it easy to assume the same of the heap. Malloc(), realloc() and C++ new do not initialize the allocated data, although a special variant of malloc() called calloc() does initialize allocated data to zero. In C++ new , the appropriate constructor will be called, so make sure it initializes all elements.
  • Failure to distinguish between scalars and arrays – C++ has different operators for scalars and arrays: new and delete for scalars, new[] and delete[] for arrays.
  • Writing to already freed memory – This will either corrupt the internal memory allocator data structures or be overwritten later by a write to a subsequently legitimately allocated area. In any case, these errors are really hard to catch.
  • Failing to check return values – All of malloc(), realloc(), and calloc() return a NULL pointer to indicate an out-of-memory condition. Desktop systems will generate a memory fault when de-referencing the null pointer, which makes out-of-memory conditions easy to detect during development. Embedded systems may have flash at address zero and may survive with more subtle errors. If your MCU has a memory protection unit, you could configure it to generate a memory protection fault on write accesses to flash and other execute-only areas.
  • Freeing the same memory multiple times – This is likely to corrupt the internal memory allocator data structures, and is another error that can be challenging to detect.
  • Writing outside of the allocated area – This is likely to corrupt the internal memory allocator data structures, and is very difficult to discover.

The last three errors can be detected fairly easily if you put wrappers around the standard malloc(), free() and related functions. The wrappers must allocate a few extra bytes of memory to accommodate the extra information needed for the consistency checks.

One commonly used method is the “Magic Number” where a specially-chosen value is written at the top of the heap and is checked whenever memory is freed. If this value is not correct, then the heap has been corrupted (Figure 2 ).

Figure 2: Wrapper data layout shows a field size (bottom) used to calculate the magic number (top) that is used to detect corruption.

The size field below the data is used by the free() wrapper to find the ‘magic number’ (see sample code ). The wrapper example uses 8 bytes of overhead per allocation, which should be acceptable for most applications. The example also shows how to override the C++ new and delete operators globally.

#define MAGIC_NUMBER 0xefdcba98
uint32_t myMallocMaxMem;

void* MyMalloc(size_t bytes)
   uint8_t *p, *p_end;
   static uint8_t* mLow = (uint8_t*)0xffffffff; /* lowest address 
   returned by malloc() */
   static uint8_t* mHigh; /* highest address + data returned by
   malloc() */
   bytes = (bytes + 3) & ~3; /* ensure alignment for magic number */
   p = (uint8_t*)malloc(bytes + 8); /* add 2×32-bit for size and
   magic number */
   if (p == NULL)
      abort(); /* out of memory */
   *((uint32_t*)p) = bytes; /* remember size */
   *((uint32_t*)(p + 4 + bytes)) = MAGIC_NUMBER; /* write magic
   number after user allocation */
   /* crude method of estimating maximum used size since application
   start */
   if (p < mLow) mLow = p;
   p_end = p + bytes + 8;
   if (p_end > mHigh) mHigh = p_end;
   myMallocMaxMem = mHigh – mLow;
   return p + 4; /* allocated area starts after size */

void MyFree(void* vp)
   uint8_t* p = (uint8_t*)vp – 4;
   int bytes = *((uint32_t*)p);
   /* check that magic number is not corrupted */
   if (*((uint32_t*)(p + 4 + bytes)) != MAGIC_NUMBER)
      abort(); /* error: data overflow or freeing already freed
      memory */
   *((uint32_t*)(p + 4 + bytes)) = 0; /* remove magic number to be
   able to detect freeing already freed memory */

#ifdef __cplusplus
// global override of operator new, delete, new[] and delete[]
void* operator new (size_t bytes) { return MyMalloc(bytes); }
void operator delete (void *p) { MyFree(p); }

It is important to note that the example above will catch all such errors only if all allocated memory is also freed at some point. This may not be the case for some applications. In that case, the wrapper must maintain a list of all memory allocations and periodically validate all allocations recorded in the allocation list. The overhead of implementing this might not be as large as it sounds, though, since most embedded systems make relatively small use of dynamic memory, keeping the allocation list within a reasonable limit.

Setting the heap size
Now that we've reviewed challenges and pitfalls, let's consider techniques we can use to determine the minimum heap size needed by the application. The dynamic behavior and fragmentation that may occur make the task non-trivial. The approach recommended here is to run the application under a system test case that has been designed to force dynamic memory to be used as much as possible. It is important to repeatedly exercise transitions from low memory usage in order to see the effects of possible fragmentation. When the test case is completed, the maximum usage level of the heap should be compared to the actual heap size. A margin of 25 to 100% should be applied, depending on the nature of the application.

For systems that mimic desktop systems by emulating sbrk() , the maximum heap usage will be given by malloc_max_footprint() . For embedded systems that do not emulate sbrk() , it is common to give the entire heap to the memory allocator in one chunk. In that case, malloc_max_footprint() becomes worthless; it will simply return the size of the entire heap. One solution, for example, in the wrapper function described earlier, would be to call mallinfo() after every call to malloc() and observe the total allocated space (mallinfo->uordblks ). It is important to note that Mallinfo() is computationally intensive, however, and will impact performance. A better method involves recording the maximum distances between the allocated areas. This can be easily performed and is shown in the wrapper example; the maximum value is recorded in the variable myMallocMaxMem . This method works, provided that the heap is one contiguous memory area.

Tool support
When it comes to detecting heap-related problems on desktop systems, designers can choose from well-known analysis tools such as Purify, Insure++, and Valgrind. Tools of this class are not available for small embedded systems due to performance reasons. however. Designers either have to rely on the methods described above or they must replace the dynamic memory allocator with one of the variants designed for consistency analysis and debugging. One such debug implementation of a memory allocator is Dmalloc .

Properly setting up the stack in the heap is an essential part of developing a safe, reliable embedded system. Even if calculating the stack and heap requirements is a complex and difficult task, a number of useful tools and methods exist to make the process easier. The time and money spent on good calculation during the development phase is well rewarded by not having to find problems related to overwritten stack and heap areas in a deployed system. Ultimately the exercise provides bigger benefits in the form of having designers develop and commercialize successful embedded products.

Part 1: Calculating stack size
Part 2: Properly allocating stacks


1. Nigel Jones, blog posts at, 2007 and 2009.
2. John Regehr ,“Say no to stack overflow,” EE Times Design, 2004.
3. Carnegie Mellon University, “Secure Coding in C and C++, Module 4, Dynamic Memory Management,” 2010.

Anders Lundgren has been with IAR Systems since 1997. He currently works as product manager for the IAR Embedded Workbench for ARM. During the first years with IAR Systems he worked with compiler development and as project manager for compiler and debugger projects. Prior to joining IAR Systems, Lundgren worked with space science instruments at the European Space Agency and spent one year at the space science laboratory at the University of California, Berkeley. He received a M.Sc. in Computer Science from the University of Uppsala, Sweden in 1986.

Lotta Frimanson has been with IAR Systems since 1999. She currently works as product manager for IAR Embedded Workbench for ARM and MSP430, and is also responsible for the IAR RTOS partner program. Prior to joining IAR Systems, Frimanson worked with embedded systems development both at the bioscience company Biacore and at the consultant company Styrex. She received a M.Sc. in Engineering Physics from the University of Uppsala, Sweden in 1989.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.