Five more top causes of nasty embedded software bugs
What do memory leaks, deadlocks, and priority inversions have in common? They're all Hall of Famers in the pantheon of nasty firmware bugs.
Finding and killing latent bugs in embedded software is a difficult business. Heroic efforts and expensive tools are often required to trace backward from an observed crash, hang, or other unplanned run-time behavior to the root cause. In the worst-case scenario, the root cause damages the code or data in such a way that the system still appears to work fine, or mostly fine, at least for a while.
In an earlier column ("Five top causes of nasty embedded software bugs," April 2010, p.10, online at www.embedded.com/columns/barrcode/224200699), I covered what I consider to be the top five causes of nasty embedded software bugs. This installment completes the top 10 by presenting five more nasty firmware bugs as well as tips to find, fix, and prevent them.
Bug 6: Memory leak
Eventually, systems that leak even small amounts of memory will run out of free space and subsequently fail in nasty ways. Often legitimate memory areas get overwritten and the failure isn't registered until much later. This happens when, for example, a NULL pointer is returned by a failed call to malloc() and the caller blindly proceeds to overwrite the interrupt vector table or some other valuable code or data starting from physical address 0x00000000.
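To make the failed-malloc() scenario concrete, here is a minimal sketch in C of a caller that refuses to write through a NULL pointer. The helper names buffer_create() and message_store() are hypothetical, chosen only for illustration, and the error-return convention (0 on success, -1 on failure) is an assumption rather than anything the column prescribes.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate a zeroed buffer, or report failure.
 * Returns NULL on failure so the caller must handle it explicitly. */
static unsigned char *buffer_create(size_t len)
{
    unsigned char *buf = malloc(len);
    if (buf == NULL) {
        return NULL;   /* Heap exhausted: never write through this pointer. */
    }
    memset(buf, 0, len);
    return buf;
}

/* Hypothetical caller: copies a message into a new heap buffer.
 * Returns 0 on success, -1 if no memory was available. */
static int message_store(const unsigned char *msg, size_t len,
                         unsigned char **out)
{
    unsigned char *buf = buffer_create(len);
    if (buf == NULL) {
        *out = NULL;
        return -1;     /* Fail here instead of overwriting address 0. */
    }
    memcpy(buf, msg, len);
    *out = buf;        /* The caller now owns buf and must free() it. */
    return 0;
}
```

Checking the return value does not cure the leak that exhausted the heap, but it turns a silent overwrite into an error that is reported at the point of failure.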
Memory leaks are mostly a problem in systems that use dynamic memory allocation.1 And memory leaks are memory leaks whether we're talking about an embedded system or a PC program. However, embedded systems typically run for long periods, and the failures of some safety-critical systems can be deadly or spectacular, which makes this one bug you definitely don't want in your firmware.
Memory leaks are a problem of ownership management. Objects allocated from the heap always have a creator, such as a task that calls malloc() and then either passes the resulting pointer on to another task via a message queue or inserts the new buffer into a meta heap object such as a linked list. But does each allocated object have a designated destroyer? Which other task is responsible, and how does it know that every other task is finished with the buffer?
Best practice: There is a simple way to avoid memory leaks: clearly define the ownership pattern or lifetime of each type of heap-allocated object. Figure 1 shows one common ownership pattern involving buffers that are allocated by a producer task (P), sent through a message queue, and later destroyed by a consumer task (C). To the maximum extent possible, this and other safe design patterns should be followed in real-time systems that use the heap.2
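The producer/consumer ownership pattern can be sketched in C roughly as follows. This is an illustrative, single-threaded approximation, not a real RTOS implementation: msg_queue_t, queue_send(), queue_receive(), producer_post(), consumer_poll(), and the g_live_buffers counter are all hypothetical names, and the ring buffer merely stands in for whatever message queue your RTOS provides. A real system would also need the queue operations to be safe across tasks.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define QUEUE_DEPTH 4u
#define MSG_SIZE    16u

/* Hypothetical stand-in for an RTOS message queue: a ring of pointers. */
typedef struct {
    char  *slots[QUEUE_DEPTH];
    size_t head;   /* next slot to read  */
    size_t tail;   /* next slot to write */
    size_t count;  /* buffers currently in flight */
} msg_queue_t;

static long g_live_buffers = 0;  /* allocated but not yet freed */

static int queue_send(msg_queue_t *q, char *buf)
{
    if (q->count == QUEUE_DEPTH) return -1;  /* queue full */
    q->slots[q->tail] = buf;
    q->tail = (q->tail + 1u) % QUEUE_DEPTH;
    q->count++;
    return 0;
}

static char *queue_receive(msg_queue_t *q)
{
    if (q->count == 0u) return NULL;  /* queue empty */
    char *buf = q->slots[q->head];
    q->head = (q->head + 1u) % QUEUE_DEPTH;
    q->count--;
    return buf;
}

/* Producer (P): creates the buffer and owns it only until the send succeeds. */
static int producer_post(msg_queue_t *q, const char *text)
{
    char *buf = malloc(MSG_SIZE);
    if (buf == NULL) return -1;
    g_live_buffers++;
    strncpy(buf, text, MSG_SIZE - 1u);
    buf[MSG_SIZE - 1u] = '\0';
    if (queue_send(q, buf) != 0) {
        free(buf);        /* send failed: ownership was never transferred */
        g_live_buffers--;
        return -1;
    }
    return 0;             /* ownership now belongs to the consumer */
}

/* Consumer (C): the designated destroyer of every buffer it receives.
 * Returns 1 if a message was handled, 0 if the queue was empty. */
static int consumer_poll(msg_queue_t *q)
{
    char *buf = queue_receive(q);
    if (buf == NULL) return 0;
    /* ... process the message here ... */
    free(buf);
    g_live_buffers--;
    return 1;
}
```

The point of the sketch is that every path has exactly one owner: the producer frees the buffer only if the handoff fails, and otherwise the consumer always does. A counter such as g_live_buffers is one cheap way to confirm during testing that allocations and frees stay balanced.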