A step-by-step guide to using static analysis to debug embedded software
It was about 30 years ago, at my first real job in embedded systems. I was head of a team developing software for a first-generation medical intensive care monitor. (The device was roughly a cubic meter in size, unlike the smaller-than-a-toaster medical monitors we see today.)
One day I got called into the company President’s office. After he shut the door behind me, he asked, “Hey David, please tell me why you can’t get your software right the first time.” He appeared to be getting angrier as he continued, “You report that you finished writing the software, and then you go on for months and months after that working on getting so-called ‘bugs’ out of the software before the software’s ‘ready’ to try out in a real medical environment. Why can’t you guys just write the software correctly the first time, and save yourselves a lot of frustration, and save me a lot of money?” Back then in the 1970s, I couldn’t answer those questions. I thought I was about to be fired, and that my embedded systems career would be over right then and there.
Now let’s fast-forward to the present. Thirty years have gone by, and the embedded community has learned a lot about the economics and management of software development. But still today more than half of all software development time continues to be spent finding and repairing defects in code.
Recently a new glimmer of hope has appeared on the software development horizon, in the form of a new generation of static source code analysis tools. These are tools that analyze software code bases for bugs and other defects without actually running the programs that are built from the software. They’re based on algorithms that go over source code with the finest of fine-toothed combs searching for software faults. Faults can be found early in the development process, so as to improve software quality as well as save developers time. Such tools can cover the execution paths through a code base in a fully automated way, and identify complex defects including bugs involving interactions among multiple procedures.
Code Coverage Criteria
A basic issue in efforts to identify and rid software of defects is the question of code coverage, which asks what portion of a code base is to be examined, and at what level of depth is it to be examined. The spectrum of possible code coverage options includes line coverage and path coverage.
Line coverage measures the fraction or percentage of the total number of lines of code that are to be examined for defects. Many embedded projects strive for 100% line coverage, although some fall short of this goal because of technical difficulties. For example, it is often quite difficult, if not impossible, to create black-box test cases that will exercise every last line of code in a large code base. So very often only partial line coverage is achieved.
But on the other hand, even when 100% line coverage is achieved, it is far from certain that all defects in the code have been identified. Here’s an example:
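The original listing for npd_switcher() does not appear above, so the following is a sketch reconstructed from the description that follows: calling the function with 1 executes every line, while calling it with 0 leaves a pointer null that the final assignment then de-references.

```c
#include <stddef.h>

void npd_switcher(int cond)
{
    int *p = NULL;   /* pointer starts out null */
    int x = 0;

    if (cond) {
        p = &x;      /* pointer becomes valid only on the TRUE branch */
        x = 1;
    }

    *p = 2;          /* attempts to write to address 0 when cond == 0 */
}
```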
In this example, a single test case that calls npd_switcher(1); is sufficient to cause every line of code in the function to execute, so 100% line coverage is achieved. But there’s still a very serious bug in this code: when the input variable cond has the value 0, the last assignment statement in the function will attempt to write to address 0. Such a “null pointer de-reference” is never a good thing. So we’ve seen a very basic bug go undiscovered despite 100% line coverage.
A stronger level of coverage is path coverage. It measures the fraction or percentage of the total number of paths through each function in an application that are to be examined for defects. Imagine constructing a flow chart of an application where each decision point (if, while, switch, etc.) is a branch in the flow chart. The function npd_switcher() above has two paths: the first path takes the ‘TRUE’ branch at if(cond), and is the same single path that achieved 100% line coverage for this code. The second path takes the ‘FALSE’ branch at if(cond). It is a real path – even though there is no code for the ‘ELSE’ condition. In fact, this second path is the one containing the previously undiscovered null pointer de-reference.
Well, if path coverage is so good, then why doesn’t everybody just do coverage of all paths? Answer: it is really hard to do, and the amount of work involved would be enormous. The number of paths in a software function can grow exponentially with the number of conditions and decisions in the code, and loops can create very large numbers of paths.
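To make the exponential growth concrete, here is a small illustrative function (invented for this discussion, not taken from any particular code base): three independent decisions in sequence already produce 2 × 2 × 2 = 8 distinct paths, and every additional decision doubles that count again.

```c
/* Three independent decisions in sequence yield 2 * 2 * 2 = 8 paths;
   each additional 'if' doubles the path count again. */
int paths_demo(int a, int b, int c)
{
    int v = 0;
    if (a) v += 1;
    if (b) v += 2;
    if (c) v += 4;
    return v;   /* one of 8 possible values, one per path */
}
```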
Hence, practical path coverage limits the number of possible paths to be considered. For example, the DO-178B standard for aviation software at Level “A”, its most stringent level, where failure of software could have catastrophic consequences for an aircraft in flight, requires a level of coverage called “MC/DC” – or “Modified Condition / Decision Coverage”. Roughly speaking, it requires that every condition and every decision in the software take on every possible outcome at least once, and that each condition be shown to independently affect the outcome of its decision.
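As an illustrative sketch (the function name and test vectors here are invented for this example), MC/DC for a two-condition decision such as a || b can be satisfied with just three test vectors: flipping each condition while holding the other fixed demonstrates that each one independently changes the outcome.

```c
#include <stdbool.h>

/* A decision with two conditions, a and b. */
bool alarm_enabled(bool a, bool b)
{
    return a || b;
}

/* MC/DC test vectors for (a || b):
 *   (false, false) -> false    baseline
 *   (true,  false) -> true     flipping only 'a' changes the outcome
 *   (false, true)  -> true     flipping only 'b' changes the outcome
 * Three vectors suffice here; in general MC/DC needs on the order of
 * n + 1 tests for a decision with n conditions, versus up to 2^n
 * combinations for exhaustive condition coverage. */
```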
Dynamic Vs. Static Analysis Tools
The two main classes of code quality tools are dynamic analysis tools and static analysis tools.
Dynamic analysis tools have been popular for many years. They observe executable code at run-time, and report violations performed by the code as it runs. These tools are excellent at detecting bugs such as dynamic memory corruption and resource leaks.
Dynamic analysis tools are characterized by a low false-positive rate. In other words, when such a tool reports a bug – it’s a real bug. These tools rarely get confused, because they’re watching your real code execute, and they can see all the information about your software’s state and its inputs. For the same reasons, dynamic analysis tools provide results that are relevant. They report real bugs that actually occurred when the real software was really running. Not the kinds of bugs that could only occur if 500 monkeys were typing into 500 keyboards for 500 years.
But dynamic analysis tools also have downsides. Defects are found late in the software development process, since these tools require an operating executable version of your code. Subtler bugs, such as those involving interactions among integrated software units, are found only after software integration is complete. And the dynamic analysis covers only those test cases that were actually run. So if test cases achieving only 95% line coverage were run under the watchful eye of a dynamic analysis tool, that tool would likely fail to identify many, many other defects.
In the world of embedded software, dynamic analysis tools have great difficulty identifying randomly appearing bugs such as those caused by the interleaving of tasks in a preemptive multi-tasking environment when a software developer has neglected to properly protect a critical section.
Static source code analysis tools are tools that analyze software code bases for defects without actually running the programs that are built from the software. They can cover the execution paths through a code base in a fully automated way.
Static analysis tools can identify bugs and defects early in the software development process, because they do not rely upon the availability of an operating executable version of the code being examined. In an extreme case, static analysis can find bugs in code even before it has undergone a clean compilation. Human-generated test cases are not needed, since the analysis algorithms themselves identify the relevant paths through the code to be examined, as well as appropriate application data values for the analysis. The static analysis can be relatively fast, taking perhaps several hours for a multi-million-line code base – or moments for a quick analysis of an individual programmer’s code at his/her individual workstation.
In the world of embedded software, static analysis tools can identify randomly appearing bugs such as those caused by the interleaving of tasks in a preemptive multi-tasking environment when a software developer has neglected to properly protect a critical section. Bugs such as those can be identified consistently and repeatably by the defect-searching algorithms of a static analysis tool – even though the bug will not appear consistently or repeatably at application software run-time.
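To make the critical-section example concrete, here is an illustrative sketch using POSIX threads (the function names are invented for this example). The unprotected increment is a classic data race: a concurrency-aware static checker can flag it on every analysis run, even though it may corrupt data only under rare task interleavings at run-time.

```c
#include <pthread.h>
#include <stddef.h>

long counter = 0;
pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* DEFECT: read-modify-write of shared data outside any lock.
   Whether corruption occurs depends on how tasks happen to interleave. */
void *unsafe_increment(void *arg)
{
    (void)arg;
    counter++;                          /* unprotected critical section */
    return NULL;
}

/* Corrected version: the critical section is protected by a mutex. */
void *safe_increment(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&counter_lock);
    counter++;                          /* protected critical section */
    pthread_mutex_unlock(&counter_lock);
    return NULL;
}
```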
So we see that the two classes of code quality tools, the dynamic analysis tools and the static analysis tools, are not competitors at all. Static analysis is strong in areas that pose a challenge to dynamic analysis, and vice-versa. Hence static analysis and dynamic analysis of code quality are complementary technologies.
Variety And Complexity Of Defects
Static source code analysis tools are capable of ferreting out a vast variety of bugs and defects, in the realms of software quality, concurrency, and software security. Here is a ‘sampler’ of bug types that can be identified by static source code analysis tools that are commercially available today:
- Locking errors
- Null pointer de-reference
- Use after free
- Double free
- Array indexing errors
- Mismatched array new/delete
- Stack overrun
- Heap overrun
- Return pointers to local variables
- Logically inconsistent code
- Insecure use of user data
- Un-initialized variables
- Invalid use of negative values
- Passing large parameters by value
- Under-allocations of dynamic data
- Memory leaks
- File handle leaks
- Network resource leaks
- Unused values
- Un-handled return codes
- Use of invalid iterators
- File-based race conditions
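As an illustrative sketch of two of the defect classes above (the code is invented for this article, not drawn from any real product), here are patterns a static analyzer would flag: a use-after-free and a pointer returned to a local variable. The dangerous statements are commented out so the example compiles and runs cleanly.

```c
#include <stdlib.h>
#include <string.h>

/* Use after free: the pointer remains in scope after free(). */
int use_after_free_demo(void)
{
    char *buf = malloc(16);
    if (buf == NULL)
        return -1;
    strcpy(buf, "hello");
    free(buf);
    /* return buf[0];      DEFECT: read through a freed pointer */
    return 0;
}

/* Return pointer to a local variable: 'name' ceases to exist
   as soon as the function returns. */
const char *return_local_demo(void)
{
    char name[16];
    strcpy(name, "sensor-1");
    /* return name;        DEFECT: dangling pointer to a dead stack frame */
    return "sensor-1";     /* corrected: a string literal outlives the call */
}
```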
The bugs and defects are presented to software developers in a GUI style similar to that encountered when working with source level debuggers.