Intel's 4004, the first microprocessor, had an on-chip hardware stack a mere three levels deep. When a year later the 8008 came out, it also had a hardware stack, boosted to a depth of seven words. Since neither had push and pop instructions the stack contained only return addresses from CALL instructions.
No, interrupts did not necessarily eat stack space. An interrupt cycle initiated a nearly-normal fetch. It was up to external hardware to jam an instruction onto the bus. Though most designers jammed an RST, a single-byte CALL, more than a few used alternatives like JMP.
With these tiny stacks developers constantly worried about deep call trees, for there were no tools to trace program execution. But programs were small. After getting burned once one learned to sketch simple execution path diagrams to track call depth.
Even today we have only the most limited ways to troubleshoot a blown stack. Now of course every processor has constantly-used PUSH instructions. Million line programs with explosive numbers of automatic variables and intricate inheritance schemes chew through stack space while masking the all of the important usage details. Add nested interrupt processing asynchronously and unpredictably using even more, and we're faced with a huge unknown: how big should the stack or stacks be?
Most developers use the scientific method to compute anticipated stack requirements. We take a wild guess.
I've read more than a few academic papers outlining alternative approaches that typically analyze the compiler's assembly listings. But without complete knowledge of the runtime library's requirements, and that of other bits of add-in software like communications packages, these ideas collapse.
Today wise developers monitor stack consumption using a variety of methods, like filling a newly-allocated stack with a pattern (I like 0xDEAD) and keeping an eye on how much of that gets overwritten. Some RTOSes include provisions to log high-water marks.
But these are all ultimately reactive. We take a stack-size guess and then start debugging. When something goes wrong, change the source code to increase the stack's size. It violates my sense of elegance, paralleling as it does the futile notion of testing quality into a product. It's better to insert correctness from the outset.
AdaCore has a new tool that almost does this. It takes extant code (unfortunately, for Ada only) and using proprietary information generated by their compiler generates a somewhat complex assessment of required stack space. The tool can't know everything, but does offer a first or second level approximation that has to be very useful.
The tool produces metrics that are a step closer to provably correct software, and a big jump away from the notion of "hey, it seems to work. Ship it."
What do you think? Have you used a tool like this? How do you manage stack size?
Jack G. Ganssle is a lecturer and consultant on embedded
development issues. He conducts seminars on embedded systems and helps
companies with their embedded challenges. Contact him at email@example.com. His website is www.ganssle.com.
In general; I have found stack/heap corruption to be the more common problem than stack size. The problem manifests as follows:
1. Developer defines an array on the local variable namespace, (stack)
2. Developer inadvertently does a memcpy, or copy operation, or fill operation that overruns the local variable.
3. Subroutine, as tested works fine
4. However ... upon returning from subroutine, system crashes because the saved return value, (on stack of course) is corrupted!
5. Development team now spends many hours trying to 'trace' this problem only to find problem nearly untraceable because of the ensuing stack corruption
- Ken Wada
I was so paranoid on one project I took a few days out to write a tool in C++Builder that analysed the .LST files the compiler generated. This produced a call tree, starting from main(), and totted up all the PUSH and JSR instructions to get the deepest path. Then do the same thing on the ISRs, and its possible to get a pretty accurate idea.
The next project I did used a different compiler and I just couldn't face rewriting the tool for it so I've not done it again!
- Paul Hills
While there may be no practical difference (since the gnatstack tool is not currently publicly distributed), the statement that the tool uses 'using proprietary information generated by their compiler" is not really correct given that the AdaCore compiler is distributed under the terms of the GPL. The output of the compiler probably can't be considered proprietary.
- Jeffrey Creem
I've found this three-staged approach to be the most effective:
1. Static Analysis - Use function call tree generators like GNU's cflow or the one that comes with your tool chain to find out about the calling hierarchy and focus on the 'critical path'; that is, the deepest call chain. Add the most stack consuming run-time lib routine and do not forget to add the stack consumption of any of the hand-written assembly language functions that you may be calling on the critical path; do not forget the stack memory consumed by your most demanding interrupt/exception handler.
2. Dynamic Analysis - Setup a huge stack and fill it with a known pattern, e. g. '0xdead', as suggested by Jack, and perform stress testing. Check how much of the stack has been used.
Take the maximum of these two numbers, add an 'appropriate'
safety-cushion; for most applications, 20% will suffice.
In addition, you need:
3. Run-time checks - For platforms with Memory Management Units (MMU's) that's easy: just configure a small 'No man's Land' area around your stack; any access will cause an exception that allows you to deal with this situation gracefully. If no MMU is available, have a timer ISR (or some regularly executing code like a message dispatcher) check, if someone's stepped onto the 'No man's Land'.
- Ralf Holly
In one project, after getting burned by stack overflow, we took a similar approach to Paul Hills's. We wrote a script to parse the assembly code outputs from the compiler to follow the function call tree and track the stack pointer manipulation. We'd then add a margin to the measured worst-case stack size, and compare against the configured stack size. We hooked this into our build tool, so the build would ultimately fail if the stack were too small. One obstacle was the use of function pointers. We resorted to a hand-coded table indicating which functions a function pointer might be set to. (It would be nicer to have the script parse the source code to determine possible values for the function pointer, but the quick and dirty way worked fine for our purposes.)
- Doug Dahlby
We were in a situtation to calculate the amount of stack our firmware would use. What we used to do is to write OxFF(as Jack suggested) in a function which will call even before the main() called. Then we loaded the software with maximum function calls that would be possible.Result we could get the amount of stack used .But we are nowhere sure about the exact amount of stack which will be used when the firmware is running.(we were doing the above on emulator) What you dudes thinking?
- Karthik Somsunder
Just an FYI: Hardware-based stacks with no PUSH/POP capability still exist in the 12- and 14-bit cores in the Microchip PIC Family. And it's amazing what you can still do with them!
- John Patrick
Stack management, being one of the thornier problems in embedded systems development, is something that my company has focused on for many years. The MULTI IDE has a built-in stack analysis function, similar to that described in this article, which attempts to compute the worst case stack size for your applications - and this computation is language-independent. Of course, static analysis is only going to work some of the time, so you need some run-time tools as well.
There are a number of such techniques. For example, the Green Hills compiler can instrument low-overhead run-time checks for stack overflows into the code.
When using our INTEGRITY real-time operating system, stack segments are "colored" with a special pattern so that our debugger and resource analysis tools can report to the developer stack high water mark metrics while interacting with the system.
We also take advantage of INTEGRITY's protected virtual memory capabilities to add guard pages below the stacks of threads so that an overflow generates an exception instead of subtle corruptions that are so difficult to track down. Developers can even add stack fault recovery functionality to a running system to improve system availability.
So, in summary: yes, this is a big problem, but there are a number of great tools out there to help developers deal with it.
- David Kleidermacher
With memory being so cheap today, it is becomming more and more common to find embedded systems with many megabytes of RAM (sometimes hundreds of megabytes) to work with, when the application uses only a fraction of available RAM. In these cases, we should consider a solution to the stack size problem that is quite possibly the easiest solution of all: allocate incredibly large stacks. If you think your app might need 10K, but aren't quite certain, why not allocate 30K or 40K? So what if you only use a small portion of it? If memory is cheap and available, why not use it?
- Jim Williams