Getting disciplined about embedded software development: Part 3 - The value of postmortems
The TV camera pans across miles of woodland, showing ghastly images of wreckage. Some is identifiable: the remnants of an engine, a child's doll, scattered papers from a businessperson's briefcase; much is not. The reporter, on a mission to turn tragedy into a career, breathlessly pours facts and speculation into the microphone. Shocked viewers swear off air travel till time diminishes their sense of horror.
Yet the disaster, a calamity of ineffable proportions to those left waiting for loved ones who never come home, is in fact a success of sorts. The NTSB searches for and finds the black boxes that record the flight's final moments, and over the course of months or years reconstructs the accident.
We've all seen the stunning computer-generated final moments of a plane's crash on the Discovery Channel. Experts find the root cause of the incident and then change something. Maybe there's a mechanical flaw in the plane's structure, perhaps an electrical fire initiated the accident. The FAA issues instructions to the aircraft's builders and users to implement an engineering change.
Perhaps the pilots were confused by their instrumentation, or they handled the wind shear incorrectly. Maybe maintenance people serviced a control surface incorrectly. Or perhaps it was found that Americans are getting fat so old loading guidelines no longer apply (as was recently the case in one incident). Changes are made to training or procedures. This sort of accident never happens again.
A jet cruises in the sparse air at 40,000 feet where it's 60 below zero. Four hundred thousand pounds of aluminum traveling at 600 knots relies on a complex web of wiring, electronics, mechanics, and plumbing to keep the passengers safe. It's astonishing a modern plane works at all, yet air travel is the safest form of transportation ever invented. The reason is the feedback loop that turns accidents into learning experiences.
Contrast the airplane accident with the carnage on our roads—over 40,000 people are killed in the United States of America each year in car crashes; another 2 million are injured. The accident ends with the car crash (plus enduring litigation); we learn nothing from either, we take no important lessons away, we make no changes in the way we drive.
Traffic slows around the emergency crews cutting a twisted body from the smashed car, but then we're soon standing hard on the accelerator again, weaving in and out of traffic inches from the bumper ahead, in a manic search to save time that may shave, at best, a few seconds from the commute.
Carmakers do improve the safety of their vehicles by adding crumple zones and air bags, but the essential fact is that the danger sprouts from poor driving. The car and driver represent a system without feedback, running wildly out of control.
Feedback stabilizes systems. Every EE knows this. Amplifiers all use negative feedback to control their output. An oscillator has positive feedback, and so, well, oscillates.
Feedback stabilizes human systems as well. The IRS's pursuit of tax cheats keeps most 1040s relatively honest. A recent awful crash on my street led to a week or two of radar enforcement. Speeds dropped to the mandated 30 mph, but the police soon moved on to other neighborhoods.
Feedback does—or should—stabilize embedded development efforts. Most of the teams I see work madly on a project, delivering late and buggy. The boss is angry and customers are screaming. Yet as soon as the thing gets out the door we immediately start developing another project. There's neither feedback nor introspection.
Resumes abound with "experience;" often that engineer with two-dozen projects and 20 years behind him actually has had the same experience time after time. The same old heroics and the same bad decisions form the fabric of his career. Is it any wonder so few systems go out on time?
The role of engineering managers
In most organizations the engineering managers are held accountable for getting the products out in the scheduled time, at a budgeted cost, with a minimal number of bugs. These are noble, important goals.
How often, though, are the managers encouraged—no, required —to improve the process of designing products?
The Total Quality movement in many companies seems to have bypassed engineering altogether. Every other department is held to the cold light of scrutiny, and the processes tuned to minimize wasted effort. Engineering has a mystique of dealing with unpredictable technologies and workers immune to normal management controls. Why can't R & D be improved just like production and accounting?
Now, new technologies are a constant in this business. These technologies bring risks, risks that are tough to identify, let alone quantify. We'll always be victims of unpredictable problems.
Worse, software is very difficult to estimate. Few of us have the luxury to completely and clearly specify a project before starting. Even fewer don't suffer from creeping featurism as the project crawls toward completion.
Unfortunately, most engineering departments use these problems as excuses for continually missing goals and deadlines. The mantra "engineering is an art, not a science" weaves a spell that the process of development doesn't lend itself to improvement.
Engineering management is about removing obstacles to success. Mentoring the developers. Acquiring needed resources.
It's also about closing feedback loops. Finding and removing dysfunctional patterns of operation. Discovering new, better ways to get the work done. Doing things the same old way is a prescription for getting the same old results.