The stack is an area of system RAM dedicated to storing temporary variables and program data during the execution of code blocks. This is static memory that operates on a last-in, first-out basis.
A number of factors combine to make it difficult to calculate maximum stack usage. Many applications are complex and event driven, consisting of hundreds of functions and many interrupts. These interrupt functions can be triggered at any time and, if they are allowed to be nested, it becomes even more difficult to calculate the required stack size. There might be indirect calls using function pointers where the destination of the call could be a number of different functions. Recursion and un-annotated assembly routines will also cause problems for anyone who wants to calculate the maximum stack usage. Ergo, there is not a natural execution flow that can be easily followed.
Many microcontrollers implement multiple stacks, for example a system stack and a user stack. Multiple stacks are also a reality if you use an embedded RTOS like µC/OS, ThreadX, and others, in which case each task has its own stack area. Runtime libraries and third-party software are other factors that complicate the calculation because the source code for libraries and the RTOS may not be available.
It is also important to remember that changes to the code and the scheduling of the application can have a large impact on the stack usage. Different compilers and different optimization levels also generate different code that uses different amounts of stack. All of these factors make it essential to continuously track the maximum stack requirement.
Setting stack size
If you're designing an application, the stack size requirement is one factor that you must consider during the design phase, which means that you need a method for determining the amount of stack that you require. Allocating too much stack will waste RAM, but sometimes even if you allocate the entire remaining RAM for stack usage, you still cannot be sure it will be enough.
One approach is to test the system under conditions that produce worst-case stack behavior. During these tests you will need a method for finding out how much stack has been used. This can be done in two ways: from printouts of the current stack usage or by making sure that you can find traces of stack usage in memory after your test run has been completed. Worst-case conditions are very hard to provoke in complex systems, however. Moreover, one fundamental problem with testing an event-driven system with many interrupts is that it is likely that some execution paths will never be tested.
Another approach is to calculate the theoretical maximum stack requirement. It’s easy to understand that calculating stack usage for a complete system manually is exceedingly difficult. Quickly and accurately performing this calculation requires a tool that can analyze the complete system. The tool must operate on either the binary image or the source code. A binary tool works at the machine instruction level to find all possible movements of the program counter through your code in order to find the worst-case execution path. A source code static analysis tool reads all of the compilation units of the application. In both cases, the tool must be able to determine direct function calls and indirect function calls through pointers in the compilation unit, and therefore compute a conservative stack usage profile across the entire system for all call graphs. The source code tool has to take into account the demands the compiler places on the stack, such as alignments and compiler temporaries; this can be done by the tool examining the object/executable code.
Writing this kind of tool yourself is a difficult task. Commercial alternatives exist, in the form of either stand-alone static stack calculation tools like PC-Lint, or those provided by a solution vendor, like the StackX stack calculation tool that is available from Express Logic. The compiler and linker also have the information needed to calculate the maximum stack requirement. This functionality is available, for example, in the IAR Embedded Workbench for ARM.
Estimating stack depth
One way of calculating the stack depth is to use the current value of the stack pointer. This can be approximated by simply taking the address of a function's argument or local variable. If this is done at the beginning of the main function and in each of the functions that you think are using the most stack, you can calculate the amount of stack your application requires.
Here is an example where we assume that the stack is growing from high to low addresses:
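#include <stdio.h>
char *highStack, *lowStack;
void deepest_function(void) {  /* placeholder name for the deepest call path */
  int a;
  lowStack = (char *)&a;       /* address near the deepest stack point */
}
int main(int argc, char *argv[]) {
  highStack = (char *)&argc;   /* stack address on entry to main */
  deepest_function();
  printf("Current stack usage: %d\n", (int)(highStack - lowStack));
}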
This method can yield good results in small and deterministic systems. It is not perfect, however. For many systems it can be difficult to determine the nested function with the deepest stack usage and to provoke the worst-case situation. In addition, the results obtained with this method do not take into account stack usage by interrupt functions.
A variant of this approach involves periodically sampling the stack pointer using a high frequency timer interrupt. The interrupt frequency should be set as high as possible without impacting the real-time performance of the application. Typical frequencies would be in the range of 10 to 250 kHz. The advantage of this variant is that there is no need to manually find the function with the deepest stack usage. It is also possible to find stack usage by interrupt functions if the sampling interrupt is allowed to preempt other interrupts. Care should be taken, however, as interrupt functions tend to be short in duration and might be missed by the sampling interrupt.
/* Body of the sampling timer interrupt handler */
char a;
char *currentStack = (char *)&a;
if (currentStack < lowStack) lowStack = currentStack;
The stack guard zone
A stack guard zone is a memory area allocated just below the stack, where the stack leaves traces if it overflows (see Figure 1). This method is standard on desktop systems, where the operating system can easily be set up to detect memory protection errors for a stack overflow situation. On a small embedded system without an MMU, a guard zone can still be inserted and will be quite useful. For a guard zone to be effective, it must be reasonably large; otherwise a function with a large stack frame can step over the zone without writing to it.
Figure 1: Located just below the stack and memory, the guard zone captures traces of stack overflow.
The consistency checking of the guard zone can be made in software by regularly checking that the guard zone fill pattern is intact.
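A software check of this kind might look like the following minimal sketch. It is an illustration under stated assumptions rather than a production implementation: the guard zone is modelled as an ordinary array, and the names `guard_zone`, `GUARD_SIZE`, `GUARD_FILL`, `guard_init`, and `guard_intact` are hypothetical. In a real system the zone would be placed just below the stack by the linker configuration.

```c
#include <stddef.h>
#include <string.h>

#define GUARD_SIZE 64u    /* assumed guard zone size in bytes */
#define GUARD_FILL 0xA5u  /* assumed fill pattern */

/* Stand-in for a region the linker would place just below the stack. */
unsigned char guard_zone[GUARD_SIZE];

/* Fill the guard zone once, before the application starts. */
void guard_init(void)
{
    memset(guard_zone, GUARD_FILL, sizeof guard_zone);
}

/* Return 1 if every byte still holds the fill pattern, 0 otherwise. */
int guard_intact(void)
{
    for (size_t i = 0; i < sizeof guard_zone; i++) {
        if (guard_zone[i] != GUARD_FILL) {
            return 0;
        }
    }
    return 1;
}
```

Calling `guard_intact()` from a periodic task or from the idle loop keeps the cost low. A return value of 0 means that something, most likely the stack, has written into the zone.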
A better method can be implemented if the MCU is equipped with a memory protection unit. In that case, the memory protection unit can be set up to trigger on writes to the guard zone. If an access occurs, an exception will be triggered, and the exception handler can record what happened for later analysis.
Filling the stack area with a dedicated pattern
One technique to detect stack overflow is to fill the entire amount of memory allocated to the stack area with a dedicated fill value, for example 0xCD, before the application starts executing. Whenever the execution stops, the stack memory can be searched upwards from the end of the stack until a value that is not 0xCD is found, which is assumed to be how far the stack has been used. If the dedicated value cannot be found, the stack has consumed all stack space and most likely has overflowed.
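The fill-and-scan procedure can be sketched as follows, assuming a stack that grows from high to low addresses so that the end of the stack is its lowest address. The function names `stack_paint` and `stack_used_bytes` are hypothetical. On a real target the painting would normally be done in startup code, before the stack is in use.

```c
#include <stddef.h>
#include <string.h>

#define STACK_FILL 0xCDu  /* fill value used in the text */

/* Paint the whole stack region with the fill value before the application runs. */
void stack_paint(unsigned char *stack_low, size_t size)
{
    memset(stack_low, STACK_FILL, size);
}

/*
 * Scan upwards from the lowest address (the end of a descending stack)
 * and return the number of bytes that appear to have been used.
 * A result equal to size means no fill bytes remain at the end of the
 * stack, so the stack has probably overflowed.
 */
size_t stack_used_bytes(const unsigned char *stack_low, size_t size)
{
    size_t untouched = 0;
    while (untouched < size && stack_low[untouched] == STACK_FILL) {
        untouched++;
    }
    return size - untouched;
}
```

The result can be slightly too low if the deepest bytes written by the application happen to equal the fill value.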
Although this is a reasonably reliable way to track stack usage, there is no guarantee that it will detect a stack overflow. For example, a stack can incorrectly grow outside its bounds, and even modify memory outside the stack area, without actually modifying any of the bytes near the stack range. Likewise, your application might modify memory within the stack area by mistake.
This method of monitoring stack usage is commonly used by debuggers. This means that the debugger can display a graphical representation of the stack usage like that shown in Figure 1. The debugger does not normally detect a stack overflow when it happens; it can only detect the signs it leaves behind.
Linker-calculated maximum stack requirement
We will now take a closer look at the way build tools like compilers and linkers can calculate maximum stack requirements, using the IAR Embedded Workbench as an example (Figure 2). The linker can accurately calculate the maximum stack usage for each call graph root (each function that is not called from another function, like the start of the application). The result is only accurate, of course, if there is accurate stack usage information for each function in the application, so the linker may require some input from the developer. Information such as the depth of recursive functions and the nesting of interrupts, among other things, is difficult to determine at compile time.
Figure 2: The IAR Embedded Workbench leverages the dedicated pattern technique to track stack calls for calculating maximum stack usage.
In general, the compiler will generate this information for each C function, but in some situations you must provide stack-related information to the system. For example, if there are indirect calls (calls using function pointers) in your application, you must supply a list of possible functions that can be called from each calling function. You can do this by using pragma directives in the source file, or by using a separate stack usage control file when linking.
#pragma calls = fun1, fun2, fun3
If you use a stack usage control file, you can also supply stack usage information for functions in modules that do not have stack usage information.
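As an illustration of the pragma approach, the sketch below places the directive in front of an indirect call. It assumes that the directive applies to the statement that follows it, as IAR's documentation describes. The function `dispatch` and the counter `calls_made` are hypothetical. The pragma is guarded with the `__IAR_SYSTEMS_ICC__` predefined macro so that other compilers skip it.

```c
int calls_made = 0;  /* hypothetical counter, only here to make the calls observable */

void fun1(void) { calls_made += 1; }
void fun2(void) { calls_made += 10; }
void fun3(void) { calls_made += 100; }

/* Dispatch through a function pointer; the target is not visible at the call site. */
void dispatch(void (*handler)(void))
{
#ifdef __IAR_SYSTEMS_ICC__
    /* List the functions that the next statement may call indirectly. */
#pragma calls = fun1, fun2, fun3
#endif
    handler();
}
```

With the list in place, the analysis can use the deepest of the three candidates as the cost of the indirect call instead of reporting the call site as unknown.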
The linker will also generate warnings if some necessary information is missing, for example under the following circumstances:
- There is at least one function without stack usage information
- There is at least one indirect call site in the application for which a list of possible called functions has not been supplied
- There are no known indirect calls, but there is at least one uncalled function that is not known to be a call graph root
- The application contains recursion (a cycle in the call graph)
- There are calls to a function declared as a call graph root
When stack usage analysis is enabled, a stack usage chapter will be added to the linker map file, which provides a call chain for each call graph root and subsequently the maximum stack depth for each root.
The total maximum stack usage for the complete system is calculated by adding together the result for each call graph root. In this analysis, the maximum possible stack usage will be:
It is important to remember that this type of stack usage analysis produces a worst-case result. The application might not actually ever end up in the maximum call chain, by design or by coincidence.
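The per-root calculation and the summation over all roots can be sketched as below. This is a simplified model, not the linker's actual algorithm: the call graph is a hand-written table, and `struct func`, `max_depth`, and `total_stack` are hypothetical names. A cycle in the graph is reported as -1, mirroring the fact that recursion cannot be bounded without extra information.

```c
#include <stddef.h>

#define MAX_CALLEES 4

/* One node of a hypothetical call graph: its own stack frame and its callees. */
struct func {
    const char *name;
    unsigned frame_bytes;      /* stack used by the function itself */
    int callees[MAX_CALLEES];  /* indices into the table, -1 terminates */
};

/*
 * Worst-case stack depth of the call chain starting at function f.
 * on_path marks functions on the current chain; meeting one again means
 * recursion, which this analysis cannot bound, so -1 is returned.
 */
long max_depth(const struct func *table, int f, unsigned char *on_path)
{
    long deepest_callee = 0;

    if (on_path[f]) {
        return -1;             /* cycle in the call graph */
    }
    on_path[f] = 1;
    for (int i = 0; i < MAX_CALLEES && table[f].callees[i] >= 0; i++) {
        long d = max_depth(table, table[f].callees[i], on_path);
        if (d < 0) {
            on_path[f] = 0;
            return -1;
        }
        if (d > deepest_callee) {
            deepest_callee = d;
        }
    }
    on_path[f] = 0;
    return (long)table[f].frame_bytes + deepest_callee;
}

/* Sum the worst case of every call graph root, for example main and each interrupt. */
long total_stack(const struct func *table, const int *roots, size_t n_roots,
                 unsigned char *on_path)
{
    long total = 0;
    for (size_t i = 0; i < n_roots; i++) {
        long d = max_depth(table, roots[i], on_path);
        if (d < 0) {
            return -1;
        }
        total += d;
    }
    return total;
}
```

For a table in which `main` uses 16 bytes and calls a 96-byte logging function, and a timer interrupt uses 32 bytes and calls a 24-byte helper, the two roots come out at 112 and 56 bytes, for a total of 168 bytes. The `on_path` array must hold one zero-initialized entry per function in the table.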
Anders Lundgren has been with IAR Systems since 1997. He currently works as product manager for the IAR Embedded Workbench for ARM. During his first years with IAR Systems he worked with compiler development and as project manager for compiler and debugger projects. Prior to joining IAR Systems, Lundgren worked with space science instruments at the European Space Agency and spent one year at the space science laboratory at the University of California, Berkeley. He received an M.Sc. in Computer Science from the University of Uppsala, Sweden in 1986.
Lotta Frimanson has been with IAR Systems since 1999. She currently works as product manager for IAR Embedded Workbench for ARM and MSP430, and is also responsible for the IAR RTOS partner program. Prior to joining IAR Systems, Frimanson worked with embedded systems development both at the bioscience company Biacore and at the consultant company Styrex. She received an M.Sc. in Engineering Physics from the University of Uppsala, Sweden in 1989.