What's the cheapest way to get rid of bugs? Don't put them in in the first place.
That seemingly trite statement is the idea behind the entire quality revolution that reinvented manufacturing during the 1970s. Design quality in rather than try to fix a lot of problems during production.
Most people under 40 have no memory of the quality problems U.S. automotive vendors inflicted on their customers for many years. I remember my folks buying cars in the 1960s. With five kids on a single-income engineer's salary, my dad's primary decision parameters (mom was never consulted on such a purchase) were size (a big station wagon) and price. Choices were mostly limited to the Big Three. With the exception of the even-then ubiquitous VW Beetle, foreign manufacturers had made few inroads into this market. But Detroit's offerings were always plagued with problems, from small nuisance issues to major drivetrain troubles. Consumers had no recourse since all of the vendors offered the same poor quality. Perhaps foreshadowing today's low expectations about commercial software, car buyers 40 years ago accepted the fact that vehicles were full of problems, and many trips to the dealer to get these cleared up were simply part of the process of acquiring a new car.
About the same time, Japanese products had a well-deserved bad reputation. “Made in Japan” and “junk” were synonymous. But Japanese managers became students of quality guru W. Edwards Deming, who showed how a single-minded focus on cost at the expense of quality was suicidal. They eventually shifted production to a low-waste system with an unyielding focus on designing quality in. The result: better autos at a lower cost. Detroit couldn't compete. (Of course, many other factors contributed to the U.S. firms' 1970s' woes. But cash-strapped American buyers found the lure of lower-cost high-quality foreign cars irresistible.)
U.S. vendors scrambled to compete using, at first, marketing rather than substance. “Quality is Job One” became Ford's tagline in 1975. Buyers continued to flock to less self-aggrandizing manufacturers who spoke softly but carried few defects. But by the very early 1980s, Ford was spewing red ink at an unprecedented rate. A division quality manager hired Deming to bring the Japanese miracle to Detroit. Eventually the quality movement percolated throughout the automotive industry, and today it might be hard to find much of a difference in fit and finish among manufacturers. Tellingly, Ford abandoned “Quality is Job One” as a mantra in 1998. The products demonstrated their success and marketing sleight of hand was no longer needed to dodge an inconvenient truth.
Lean manufacturing perhaps got its name from a 1996 book (Lean Thinking by James Womack and Daniel Jones) but its roots trace back to at least Ben Franklin and later to Henry Ford.1 Waste means there's a problem in the process, whether the waste is from rework (errors) or practices that lead to full garbage pails. Wastage is a sure indicator that something is wrong with any process. And it's an equally vital red flag that a software development group is doing something wrong.
For some reason, the lean revolution by and large hasn't made it into software engineering. Bugs plague our efforts, and are as expected as any other work product. Most projects get bogged down in a desperate debugging phase that can consume half the schedule. I guess that means we can call the design and coding line items the “bugging” phase.
When in 1796 Edward Jenner rubbed cowpox on eight-year-old James Phipps' arms, he wasn't fixing a fever; the boy was perfectly healthy. Rather, Jenner knew some 60% of the population was likely to get smallpox and so was looking for a way to prevent the disease before it occurred. That idea was revolutionary in a time when illness was poorly understood and often attributed to vapors or other imaginary effects of devils or magic.
The pre-Jenner approach parallels software engineering with striking similarity. The infection is there; with enough heroics it might be possible to save the patient, but the toll on both the victim and doctor leaves both weakened. Jenner taught us to anticipate and eliminate sickness. Lean manufacturing and the quality movement showed that defects indicate a problem with the process rather than the product. Clearly, if we can minimize waste the system will be delivered faster and with higher quality.
In other words, cut bugging to shorten debugging. The best tool we have to reduce bugging is the code inspection.
Over the last 10 years, I've mentioned code inspections in passing in this column some 33 times, yet haven't written anything substantive about them since August, 1998.2 Many of our readers were still in high school back then!
The statistics are dramatic. Most code starts life with around five to 10 bugs per hundred lines of code. Without inspections, figure on finding about 95% of those before shipping. For a 100KLOC program that means shipping with hundreds of defects! (Note that some sources, such as Stan Rifkin, “The Business Case for Software Process Improvement,” Fifth SEPG National Meeting, claim far worse numbers, on the order of six shipped bugs per hundred lines of code, or six thousand in a 100KLOC project.3 My unscientific observations, coupled with data from private conversations with Capers Jones, suggest that firmware generally doesn't sink to that abysmal level.)
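The arithmetic behind those figures is easy to check. What follows is a minimal sketch, not a model drawn from any of the cited studies: the helper name shipped_defects is hypothetical, and it assumes the 95% removal rate applies evenly to every bug injected.

```python
def shipped_defects(lines_of_code, bugs_per_100_lines, percent_found):
    """Estimate defects still present at shipment (hypothetical helper).

    Assumes bugs are injected at a uniform rate and that the same
    percentage of them is found before shipping, whatever their kind.
    """
    injected = lines_of_code * bugs_per_100_lines / 100
    return injected * (100 - percent_found) / 100


LOC = 100_000  # the 100KLOC program used in the text

low = shipped_defects(LOC, 5, 95)    # five bugs per hundred lines
high = shipped_defects(LOC, 10, 95)  # ten bugs per hundred lines
print(f"Shipped defects without inspections: {low:.0f} to {high:.0f}")

# The far worse rate quoted above: six shipped bugs per hundred lines.
worst = LOC * 6 / 100
print(f"Shipped defects at six per hundred lines: {worst:.0f}")
```

Under those assumptions the range works out to 250 to 500 shipped defects, which is the "hundreds" claimed above, against 6,000 at the worse rate.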
It's important to understand that testing, as implemented by most companies, just does not work. Many studies confirm that tests exercise only about half of the code. Exception handlers are notoriously problematic. Also consider nested conditionals: IFs nested five or six levels deep can yield hundreds of possible conditions. How many test regimes check all of those possibilities?
In the March issue, I wrote about cyclomatic complexity which, among other things, tells us the minimum number of tests needed to completely exercise a function. In researching that article I found that one function in the Apache web server scored a complexity index of 725. Who constructs nearly a thousand tests for a single function?
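To make the idea concrete, here is a rough sketch of how such a score can be approximated: count the decision points in a function and add one. It uses Python's standard ast module on Python source purely for illustration; the function name approximate_complexity and the short list of node types counted are assumptions, and real complexity tools count more constructs than this.

```python
import ast

# Node types that each add one decision point. This list is a simplifying
# assumption; production tools recognize more constructs.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def approximate_complexity(source):
    """Return (decision points + 1) for a snippet of Python source."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" hides two extra branches
            decisions += len(node.values) - 1
    return decisions + 1


# A made-up function to score: two ifs, one "and", one loop.
EXAMPLE = '''
def classify(reading, limit):
    if reading < 0:
        return "invalid"
    if reading > limit and limit > 0:
        return "high"
    for _ in range(3):
        pass
    return "ok"
'''

print(approximate_complexity(EXAMPLE))
```

The made-up classify function has four decision points, so it scores five, meaning at least five tests are needed to exercise it fully. A score of 725 implies a correspondingly larger test suite for a single function.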
Another analysis of the testing problem is scarier. In an article printed in Crosstalk, Watts Humphrey shows that a program with 100 decisions can have nearly 200,000 possible paths.4 With 400 decisions that explodes to 10^11 paths, each requiring a test! Although he describes a pathological worst-case situation, the implications are important.
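The text above does not say how those path counts are derived. One formula that happens to reproduce both figures, assumed here for illustration only, is the central binomial coefficient C(2n, n), where n is the square root of the number of decisions. It counts the routes across an n-by-n lattice when every step moves either right or down. The function name lattice_paths is hypothetical.

```python
from math import comb, isqrt


def lattice_paths(decisions):
    """Count paths under the assumed square-lattice model.

    `decisions` must be a perfect square n*n; the result is C(2n, n).
    """
    side = isqrt(decisions)
    if side * side != decisions:
        raise ValueError("this sketch only handles perfect-square counts")
    return comb(2 * side, side)


for count in (100, 400):
    print(f"{count} decisions: {lattice_paths(count):,} paths")
```

Under this model 100 decisions give 184,756 paths (nearly 200,000) and 400 decisions give 137,846,528,820 (on the order of 10^11). Quadrupling the decisions multiplies the paths by roughly three-quarters of a million.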
Some approaches to test are much more effective. Code coverage, used in the avionics and some other industries, can approach 100% testing, albeit with huge costs.
Test Driven Development advocates also claim good results, although reliable supporting studies are hard to find. However, anecdotal evidence does suggest that most of the agile approaches, which have a disciplined focus on testing, greatly improve on the 50% code test ratio. The agile approaches demand, correctly, automated tests that cost little to run. But in the embedded space that's hard. Someone has to watch the display and press the buttons. I've seen some very cool ways to automate those activities, and some agilists successfully build mock objects to simulate the user interaction. So-called virtual prototypes, too, are gaining market share (see the products from Virtutech or VaST, for instance).
The bottom line is that test is a huge problem. However, a well-run inspection program will eliminate 70 to 80% of the bugs before any testing starts.
When I was a young engineer, my boss demanded that all schematics go through a formal design review. It was expensive to respin boards, and even costlier to troubleshoot and repair design mistakes. At first I found this onerous: a colleague was going to critique my work? How mortifying when someone found errors in my circuits! But a little time spent reviewing saved many hours downstream. Even better was that I was able to tap into the brains of some really smart engineers. One was a whiz at manufacturability issues, and he brought this insight to each review. What did college teach about that subject? Nothing, but Ken's probing dissections of my designs were a crash course in the art of creating products that were buildable and maintainable. Another engineer always eerily spotted those ghostly race conditions that are so tough to find. The old saw that many brains are better than one was proven many times in these reviews, and the same idea applies to inspecting code.
Plenty of evidence suggests well-run inspections are some 20 times cheaper than debugging. That's a staggering number. Since the average project burns about half the schedule in the debugging phase, any technique with a 20X improvement nearly halves the schedule. I'm generally skeptical of any dramatic claim, from the Ginsu knives to politicians' promises to clean up Washington, but even if you cut that number by an order of magnitude, inspections still shave a third from the schedule.
Higher quality code, for less money. What a deal!
Inspections are a quality gate, a filter for bugs. They must take place after one gets a clean compile (in other words, let the tools find the easiest bugs) and before any testing starts. Yes, most developers prefer to inspect code they've beaten into submission with the debugger for a while, but given the 20X efficiency factor, doing any testing before inspecting is like burning stacks of $100 bills. It's simply bad business. Even worse, early testing leads to the demise of the inspections. Few of us are strong enough to resist the siren call of “But it works!” We're all busy, and claims of “works,” which may mean little depending on the efficacy of the tests, generally result in the inspection meeting being dismissed.
One complaint about inspections is that they may find few bugs. That, of course, is exactly our goal. If I write software no one will see, and then write the same function given the knowledge that my peers will be digging deep into its bowels, be sure the code will be very different. No one wants his errors to be pointed out. The inspections are therefore sort of an audit that immediately raises the program's quality.
My goal in this month's allotted 1,800 words is to paint a picture about how business learned to design quality into a product rather than shoehorn it in late in the process, and then to draw an analogy to software development. Next month I'll dive into how one goes about doing an inspection.
But remember this: after examining 4,000 software projects, Capers Jones found the number one cause of schedule problems was quality.5 Bugs. Inspections don't slow the project. Bugs do.
Jack Ganssle is a lecturer and consultant specializing in embedded systems' development issues.
1. Womack, James and Daniel Jones. Lean Thinking. New York, N.Y.: Simon & Schuster, 1996.
2. Ganssle, Jack. “Faster, Better Code,” Breakpoints, Embedded Systems Programming (http://embedded.com/98/9808br.htm).
3. Rifkin, Stan. “The Business Case for Software Process Improvement,” Fifth SEPG National Meeting, 26–29 April 1993.
4. Humphrey, Watts. “The Software Quality Challenge,” Crosstalk, June 2008, available at: www.stsc.hill.af.mil/crosstalk/2008/06/0806Humphrey.html
5. Jones, Capers. Assessment and Control of Software Risks. Englewood Cliffs, N.J.: Yourdon Press, 1994.