Five more top causes of nasty embedded software bugs
Bug 7: Deadlock
A deadlock is a circular dependency between two or more tasks. For example, if Task 1 has already acquired mutex A and is blocked waiting for mutex B, while Task 2 has previously acquired mutex B and is blocked waiting for mutex A, neither task will ever awake. Circular dependencies can occur at several levels in the architecture of a multithreaded system (for example, each task waiting for an event only the other will send), but here I am concerned with the common problem of resource deadlocks involving mutexes.
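To make the circular wait concrete, here is a minimal sketch using POSIX threads; the task and mutex names are invented for illustration. Each task grabs its first mutex, then blocks waiting for the one the other already holds, and neither can ever proceed:

/* Classic two-mutex deadlock: each task holds one lock and waits for the other. */
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t mutex_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mutex_b = PTHREAD_MUTEX_INITIALIZER;

static void *task1(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mutex_a);   /* Task 1 holds A ...                     */
    usleep(1000);                   /* sleep makes the bad interleaving likely */
    pthread_mutex_lock(&mutex_b);   /* ... and blocks waiting for B           */
    pthread_mutex_unlock(&mutex_b);
    pthread_mutex_unlock(&mutex_a);
    return NULL;
}

static void *task2(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mutex_b);   /* Task 2 holds B ...                     */
    usleep(1000);
    pthread_mutex_lock(&mutex_a);   /* ... and blocks waiting for A           */
    pthread_mutex_unlock(&mutex_a);
    pthread_mutex_unlock(&mutex_b);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, task1, NULL);
    pthread_create(&t2, NULL, task2, NULL);
    pthread_join(t1, NULL);         /* may never return: circular wait */
    pthread_join(t2, NULL);
    return 0;
}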
Best practice: Two simple programming practices can each entirely prevent resource deadlocks in embedded systems. The first technique, which I recommend over the other, is to never attempt or require the simultaneous acquisition of two or more mutexes. Holding one mutex while blocking on another turns out to be a necessary condition for deadlock; holding one mutex is never, by itself, a cause of deadlock.
In my view, the practice of acquiring only one mutex at a time is also consistent with an excellent architectural practice of always pushing the acquisition and release of mutexes into the leaf nodes of your code. The leaf nodes are the device drivers and reentrant libraries. This keeps the mutex acquisition and release code out of the task-level algorithmics and helps to minimize the amount of code inside critical sections.
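As a sketch of that leaf-node style (the driver, mutex, and register names here are hypothetical), the mutex lives entirely inside one driver routine, so no task-level caller can ever be holding it while waiting for anything else:

/* Hypothetical sensor driver: the mutex is acquired and released entirely
 * inside the leaf-node function; task-level code never sees it. */
#include <stdint.h>
#include <pthread.h>

static pthread_mutex_t sensor_mutex = PTHREAD_MUTEX_INITIALIZER;
static volatile uint16_t sensor_data_reg;   /* stand-in for a peripheral register */

uint16_t sensor_read(void)
{
    uint16_t raw;

    pthread_mutex_lock(&sensor_mutex);      /* critical section begins        */
    raw = sensor_data_reg;                  /* touch the shared resource      */
    pthread_mutex_unlock(&sensor_mutex);    /* critical section ends          */

    return raw;                             /* caller never holds the mutex   */
}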
The second technique is to assign an ordering to all of the mutexes in the system (for example, alphabetical order by mutex handle variable name) and to always acquire multiple mutexes in that same order. This technique will definitely remove all resource deadlocks but comes with an execution-time price. I recommend removing deadlocks this way only when you're dealing with large bodies of legacy code that can't be easily refactored to eliminate the multiple-mutex dependency.
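Here is what the ordering rule can look like in practice, in a sketch with invented mutex names and alphabetical order as the fixed ordering; every call site that needs both mutexes takes them in the same sequence, so a circular wait can never form:

/* Both call sites acquire in the same fixed (alphabetical) order,
 * so no circular dependency between them is possible. */
#include <pthread.h>

static pthread_mutex_t adc_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t log_mutex = PTHREAD_MUTEX_INITIALIZER;

void log_adc_sample(void)
{
    pthread_mutex_lock(&adc_mutex);     /* first:  'a' before 'l'        */
    pthread_mutex_lock(&log_mutex);     /* second                         */
    /* ... read the ADC and append the sample to the log ... */
    pthread_mutex_unlock(&log_mutex);   /* release in reverse order       */
    pthread_mutex_unlock(&adc_mutex);
}

void flush_log_while_adc_idle(void)
{
    pthread_mutex_lock(&adc_mutex);     /* same order at every call site  */
    pthread_mutex_lock(&log_mutex);
    /* ... flush the log ... */
    pthread_mutex_unlock(&log_mutex);
    pthread_mutex_unlock(&adc_mutex);
}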
Bug 8: Priority inversion
A wide range of nasty things can go wrong when two or more tasks coordinate their work through, or otherwise share, a singleton resource such as a global data area, heap object, or peripheral's register set. In the first part of this column, I described two of the most common problems in task-sharing scenarios: race conditions and non-reentrant functions. But resource sharing combined with the priority-based preemption found in commercial real-time operating systems can also cause priority inversion, which is equally difficult to reproduce and debug.
The problem of priority inversion stems from the use of an operating system with fixed relative task priorities. In such a system, the programmer must assign each task its priority. The scheduler inside the RTOS guarantees that, at all times, the highest-priority task that's ready to run gets the CPU. To meet this goal, the scheduler may preempt a lower-priority task in mid-execution. But when tasks share resources, events outside the scheduler's control can sometimes prevent the highest-priority ready task from running when it should. When this happens, a critical deadline could be missed, causing the system to fail.
At least three tasks are required for a priority inversion to actually occur: the tasks of highest and lowest relative priority must share a resource, say one guarded by a mutex, and the third must have a priority between the other two. The scenario is always as shown in Figure 2. First, the low-priority task acquires the shared resource (time t1). After the high-priority task preempts low, it tries but fails to acquire their shared resource (time t2); control of the CPU returns to low as high blocks. Finally, the medium-priority task, which has no interest at all in the resource shared by low and high, preempts low (time t3). At this point the priorities are inverted: medium is allowed to use the CPU for as long as it wants, while high waits for low. There could even be multiple medium-priority tasks.
Figure 2. The priority inversion scenario.
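The scenario can also be sketched in code. The following is only an illustration, assuming a single-processor system, POSIX SCHED_FIFO scheduling (which typically requires elevated privileges), and a default mutex with no priority inheritance; all task names, delays, and priority values are invented:

/* Three-task priority inversion sketch: low and high share a mutex with
 * no priority inheritance; medium has no interest in it but hogs the CPU. */
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

static pthread_mutex_t shared = PTHREAD_MUTEX_INITIALIZER;

static void *low_task(void *arg)      /* priority 10 */
{
    (void)arg;
    pthread_mutex_lock(&shared);      /* t1: low acquires the shared resource   */
    usleep(100000);                   /* stands in for work inside the section  */
    pthread_mutex_unlock(&shared);    /* cannot reach here while medium runs    */
    return NULL;
}

static void *high_task(void *arg)     /* priority 30 */
{
    (void)arg;
    usleep(10000);                    /* let low enter its critical section      */
    pthread_mutex_lock(&shared);      /* t2: high blocks on the resource         */
    pthread_mutex_unlock(&shared);
    return NULL;
}

static void *medium_task(void *arg)   /* priority 20 */
{
    (void)arg;
    usleep(20000);                    /* t3: wakes after high has blocked        */
    for (volatile long i = 0; i < 100000000L; i++)
        ;                             /* hogs the CPU; low stays ready but idle, */
    return NULL;                      /* so high keeps waiting: inversion        */
}

static void start_rt_thread(pthread_t *t, void *(*fn)(void *), int prio)
{
    pthread_attr_t attr;
    struct sched_param sp = { .sched_priority = prio };

    pthread_attr_init(&attr);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    pthread_attr_setschedparam(&attr, &sp);
    pthread_create(t, &attr, fn, NULL);   /* may fail without RT privileges */
    pthread_attr_destroy(&attr);
}

int main(void)
{
    pthread_t lo, mi, hi;
    start_rt_thread(&lo, low_task, 10);
    start_rt_thread(&hi, high_task, 30);
    start_rt_thread(&mi, medium_task, 20);
    pthread_join(hi, NULL);
    pthread_join(mi, NULL);
    pthread_join(lo, NULL);
    return 0;
}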