Visualizing, tracking down and dealing with tainted software
A classic movie scene I go back and watch repeatedly is the YouTube "Serpentine, serpentine" segment from the 1979 movie "The In-Laws," starring Peter Falk (as a CIA agent) and Alan Arkin (as his soon-to-be brother-in-law).
They are in an open field being shot at by hidden snipers as they run to a nearby car to make their escape. Coaching Arkin on how to avoid getting shot, Falk yells at him: "Serpentine, serpentine!" as they race to safety. Arkin’s attempts to comply with this instruction as he runs are a comedic high point of the scene.
For me, “Serpentine, serpentine” is the perfect metaphor for the situations developers commonly face in dealing with multiple sources of bad or hacked code, and for the convoluted methods they must use to avoid those snipers’ bullets. Falk's advice is even more relevant now that embedded software designs are becoming more complex, heterogeneous and connected, requiring developers to examine every piece of code they use and determine whether it is tainted, whatever its source.
While the majority of embedded designs are still of the single processor/controller variety, a typical board design may have two to eight or more MCUs multitasking between their control operations and communicating with other MCUs. And the more sophisticated multicore designs used in consumer embedded and mobile applications are shifting from largely homogeneous core designs with four to six processors to heterogeneous designs with a dozen or more cores of varying types.
And now, with the Internet, virtually every embedded design has some sort of connection to outside wired and wireless networks. It's a situation that will only get worse as we move towards the ubiquitous connectivity of every device envisioned by the "Internet of Things."
The big question in all of these situations is: how does an embedded developer ensure the quality and reliability of the code or data coming into the system from another processor on the board, from another core in a multicore design, from a networked sensor, or from an Internet server or router?
One way to do this is by using taint analysis of incoming data or code from an external source. As noted in this week's Tech Focus Newsletter, the technique is being ever more widely used, taking advantage of the already existing code-tracking capabilities of both open source and proprietary static and dynamic code analyzers.
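To make the idea concrete, here is a minimal sketch in C of the kind of flow a taint analysis looks for: a value arriving from outside (the taint source) reaches a sensitive operation (the sink) without being checked. The packet layout, the table, and all function names are hypothetical and are not taken from any particular tool or from the articles mentioned here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_TABLE_LEN 8u

/* Hypothetical table indexed by sensor channel number. */
static int16_t sample_table[SAMPLE_TABLE_LEN];

/* Taint source: a packet received from another MCU or from the network.
 * Every field is sender-controlled and untrusted until it has been checked. */
typedef struct {
    uint8_t channel;  /* tainted: chosen by the sender */
    int16_t value;    /* tainted: chosen by the sender */
} sensor_packet_t;

/* Unsafe: the tainted index flows straight into an array write (the sink).
 * This is the kind of path a taint analysis is meant to report.
 * Shown for contrast only; do not call it with untrusted input. */
void store_sample_unchecked(const sensor_packet_t *pkt)
{
    sample_table[pkt->channel] = pkt->value;  /* out of bounds if channel >= 8 */
}

/* Safer: the bounds check sits between the source and the sink,
 * so only validated values ever reach the array write. */
bool store_sample_checked(const sensor_packet_t *pkt)
{
    if (pkt == NULL || pkt->channel >= SAMPLE_TABLE_LEN) {
        return false;  /* reject the packet instead of trusting it */
    }
    sample_table[pkt->channel] = pkt->value;
    return true;
}

/* Read back a stored sample; returns 0 for an invalid channel. */
int16_t read_sample(uint8_t channel)
{
    return (channel < SAMPLE_TABLE_LEN) ? sample_table[channel] : 0;
}
```

With this sketch, a packet carrying channel 3 is accepted by store_sample_checked(), while one carrying channel 200 is rejected before it can touch memory. A real analyzer would of course follow such values across many functions and files rather than through a single call.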
If you are not familiar with the use of code analyzers in monitoring the many sources of incoming code and data that may be "tracking mud" into your perfectly clean code base, be sure to read "Tracking down the tainted data in your embedded app with static analysis," by Paul Anderson of GrammaTech.
"Taint analysis is a technique that helps programmers understand how risky data can flow from one part of the program to another," he writes. "An advanced static analysis tool can perform a taint analysis and present the results to the user, making the task of understanding a program’s attack surface easier, and easing the work involved in finding and fixing serious defects." Other examples of how this technique can be put to work include:
Most new static analysis tools have some degree of tracking capability, in the form of node-and-edge diagrams that visualize the software, which is useful in dealing with misbehaving code. As noted by Mark Pitchford in "Tracing requirements through to object-code verification," a good static analyzer can help you track the behavior and location of a programmer's source code after it has been converted by a compiler into object code form for use on a target processor.
Taking taint analysis and tracking to the next level is GrammaTech's CodeSonar static analyzer, which makes available a number of graphical views, including a top-down visualization of taint flow in a program. To understand the power of this technique, be sure to read "Geospatial visualization helps manage million-line-plus embedded code bases," by Michael McDougall, John Von Seggern, Paul Anderson, and David Cok. As noted in this week's Tech Focus newsletter, they are not alone in using this powerful technique. Among my Editor's Top Picks because of their relevance to current problems are:
But as noted by Alex Taradov in "How to debug elusive software code problems without a debugger," sometimes problems occur that are beyond the capabilities of your existing tools. His problem was debugging wireless sensor networks. "One of the major problems with debugging networks of devices is that behavior of the individual devices depends on the behavior of the surrounding nodes and the amount of traffic being exchanged," he writes. "This makes it impossible to debug such systems on a low scale."
His solution forced him to go back and look at the capabilities of the underlying C language and develop a software technique that captures the call stack in real time and uses the stack dump from the embedded system at the point of failure to get the information he needed.
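The general idea of recording call-stack information in software and snapshotting it at the point of failure can be sketched in a few lines of C. What follows is not Taradov's implementation, which his article describes in detail; it is a simplified, instrumented "shadow stack" under assumed names (trace_enter, trace_leave, trace_dump, on_failure, radio_rx and parse_frame are all hypothetical).

```c
#include <stddef.h>

#define TRACE_DEPTH 16u

/* Shadow call stack: names of the functions currently being executed. */
static const char *trace_stack[TRACE_DEPTH];
static size_t trace_top;  /* frames entered; may exceed TRACE_DEPTH */

/* Call at the top of every instrumented function. */
void trace_enter(const char *name)
{
    if (trace_top < TRACE_DEPTH) {
        trace_stack[trace_top] = name;  /* deeper frames are counted, not stored */
    }
    trace_top++;
}

/* Call just before every instrumented function returns. */
void trace_leave(void)
{
    if (trace_top > 0) {
        trace_top--;
    }
}

/* Copy the recorded frames, outermost first, into out[]; returns the count. */
size_t trace_dump(const char **out, size_t max)
{
    size_t n = (trace_top < TRACE_DEPTH) ? trace_top : TRACE_DEPTH;
    if (n > max) {
        n = max;
    }
    for (size_t i = 0; i < n; i++) {
        out[i] = trace_stack[i];
    }
    return n;
}

/* Snapshot taken at the point of failure, to be read out later. */
const char *failure_snapshot[TRACE_DEPTH];
size_t failure_depth;

void on_failure(void)
{
    failure_depth = trace_dump(failure_snapshot, TRACE_DEPTH);
}

/* Hypothetical receive path that fails on a negative frame length. */
static void parse_frame(int len)
{
    trace_enter(__func__);
    if (len < 0) {
        on_failure();  /* capture the call chain before unwinding */
    }
    trace_leave();
}

void radio_rx(int len)
{
    trace_enter(__func__);
    parse_frame(len);
    trace_leave();
}
```

Calling radio_rx(-1) leaves a two-entry snapshot, "radio_rx" followed by "parse_frame", which a node could keep in RAM or send over the network once it recovers. The cost of this approach is a pair of calls in every instrumented function, which is one reason to consider working from a raw stack dump instead.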
Though C is denigrated as a low-level language compared to C++, Java, and Ada, Embedded.com contributors such as Jack Ganssle, Colin Walls, and Dan Saks provide ongoing tips and tricks that can be adapted to almost any situation. How do you address such situations? Or is this something you have not addressed so far? Are current tools more than adequate? Are there aspects of the C language that come in useful in such situations?
Embedded.com Site Editor Bernard Cole is also editor of the twice-a-week Embedded.com newsletters as well as a partner in the TechRite Associates editorial services consultancy. He welcomes your feedback. Send an email to firstname.lastname@example.org, or call 928-525-9087.