Feedback stabilizes all sorts of systems, even engineering teams and projects. That's why postmortems matter.
The TV camera pans across miles of woodland, showing ghastly images of wreckage. Some is identifiable (the remnants of an engine, a child's doll, scattered papers from a businessperson's briefcase); much is not. The reporter breathlessly pours a mixture of facts and speculation into the microphone. Shocked viewers swear off air travel.
Yet the disaster is also a success of sorts. The National Transportation Safety Board searches for and finds the black boxes that record the flight's final moments and, over the course of several months, even years, reconstructs the accident. We've all seen the stunning computer-generated final moments of a plane's crash on TV. Experts use such models and other tools to find the root cause of the incident. Maybe there was a mechanical flaw in the plane's structure. Perhaps an electrical fire initiated the accident. Once the cause is found, the Federal Aviation Administration issues instructions to the aircraft's builders to implement engineering changes.
Perhaps the pilots were confused by their instrumentation or they handled the wind shear incorrectly. Maybe maintenance people serviced a control surface incorrectly. Changes are made to training or procedures, and they get results. A jet cruises in the sparse air at 40,000 feet, where it's 60 below zero. Four hundred thousand pounds of aluminum traveling at 600 knots relies on a complex web of wiring, electronics, mechanics, and plumbing to keep the passengers safe. It's astonishing a modern plane works at all, yet air travel is the safest form of transportation ever invented. The reason is the feedback loop that turns accidents into learning experiences.
Contrast air travel with the carnage on our roads. Over 40,000 people are killed in the United States each year in car crashes; another two million are injured. Each accident ends with the car crash. Maybe we hear about them or see them on TV, but we learn little and don't change the way we drive. Traffic slows around the emergency crews cutting a twisted body from the smashed car, but we're soon leaning hard on the accelerator again, weaving in and out of traffic, inches from the bumper ahead, even though the most we can hope to shave from our commute is a few seconds.
Carmakers improve the safety of their vehicles by adding crumple zones, air bags, and the like, but the inescapable fact is that the danger sprouts from poor driving. Car and driver constitute a system without feedback, running wildly out of control.
Feedback stabilizes systems. Every electrical engineer knows this. Amplifiers use negative feedback to control their output. An oscillator has positive feedback and so . . . well . . . oscillates.
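The amplifier claim can be made concrete with the textbook closed-loop gain relation, gain = A / (1 + A * beta), where A is the open-loop gain and beta is the fraction of the output fed back. The short Python sketch below is only an illustration; the function name and the numeric values are assumptions chosen for the example, not figures from any particular circuit.

```python
def closed_loop_gain(open_loop_gain: float, feedback_fraction: float) -> float:
    """Gain of an amplifier with negative feedback: A / (1 + A * beta)."""
    return open_loop_gain / (1.0 + open_loop_gain * feedback_fraction)


# Hypothetical values: let the raw amplifier gain vary by a factor of four
# while feeding back 1% of the output.
FEEDBACK_FRACTION = 0.01
open_loop_gains = [50_000.0, 100_000.0, 200_000.0]
gains = [closed_loop_gain(a, FEEDBACK_FRACTION) for a in open_loop_gains]

for a, g in zip(open_loop_gains, gains):
    print(f"open-loop gain {a:>9.0f} -> closed-loop gain {g:.3f}")
```

With these assumed numbers, a fourfold swing in open-loop gain moves the closed-loop gain by well under 1%, staying close to 1 / beta = 100. That insensitivity is what "stabilizes" means here.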
Feedback stabilizes human systems as well. The IRS's pursuit of tax cheats keeps most 1040s relatively honest. A recent crash on my street led to a week or two of radar enforcement. Speeds dropped to the mandated 30 mph.
Feedback would stabilize embedded systems development efforts, too. Most of the teams I see work madly as a project nears deadline and end up delivering it late and buggy. The boss is angry and customers are screaming. Yet as soon as the damned thing gets out the door, we immediately start developing another project. There's neither feedback nor introspection.
Resumes abound with “experience,” but sometimes that engineer with two dozen projects and 20 years behind him has repeated the same experience on project after project. The same old heroics and the same bad decisions form the fabric of his career.
Is it any wonder so few systems go out on time?
How do developers learn more about their craft? Buy a pile of books, read some of them, peruse the magazines, go to conferences, bring in outside gurus. These are all great and necessary steps. But it's astonishing that so many fail to seek knowledge from their own actions.
A company may spend anywhere from hundreds of thousands to millions of dollars developing a system. Many things will go right and too many wrong during the work. Wise developers understand that while their engineering group is there to make products, it's also a laboratory where experiments are always in progress. Each success is a eureka moment, and each failure a chance to gain insight into how not to develop.
But there's more to it than gaining personal insight. I prefer to acquire experience scientifically. Firmware development is too expensive to take any other approach. This means all projects should end with a postmortem, a process designed to suck the educational content of a particular development effort dry.
A postmortem is a formal process that starts during the project itself. Collect data artifacts as they are generated: the estimated schedule, the bug logs, and change requests. Include technical information as well, such as the estimated size (in lines of code and in object file bytes) versus actual, real-time performance results, tool issues, and so on.
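One lightweight way to keep estimated-versus-actual data comparable across projects is to record each measurement as a pair and compute the overrun as a percentage. The Python sketch below shows one possible shape for that record. The class name, field names, and all numbers are hypothetical and invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One estimated-versus-actual data point collected during a project."""
    name: str
    estimated: float
    actual: float

    def overrun_percent(self) -> float:
        # Positive means the actual figure exceeded the estimate.
        return 100.0 * (self.actual - self.estimated) / self.estimated


# Hypothetical numbers for a single project.
metrics = [
    Measurement("schedule (weeks)", 20, 26),
    Measurement("source size (lines of code)", 10_000, 13_500),
    Measurement("object file size (bytes)", 65_536, 81_920),
]

for m in metrics:
    print(f"{m.name}: estimated {m.estimated}, actual {m.actual}, "
          f"overrun {m.overrun_percent():+.1f}%")
```

Recording the estimate when it is made, rather than reconstructing it afterward, is what gives the history day meeting quantitative material to work from.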
After the product is released, schedule the postmortem. Do it immediately upon project completion while memories are still fresh and before the team disbands (especially in matrix organizations). My rule of thumb is to do the postmortem no more than three days after completion.
Management must support the process and make it clear that postmortem work is important. Organizations that view firmware as a necessary evil will try to subvert anything that's not directly linked to writing code. If you find yourself in such a dysfunctional organization, run a stealth postmortem, staying under the radar of the top dogs. If even the team lead fails to buy into this sort of process-improvement endeavor, I guess you're doomed and might as well start looking for a better job.
A facilitator runs the postmortem. In many activities, I advocate rotating all team members through the moderator/leader role, even those soft-spoken individuals afraid to participate in verbal exchanges. It's a great way to teach folks better social and leadership skills. Postmortems, however, tend to fail without a strong leader running the show. Use the team lead for the postmortem, or perhaps a developer well respected by the entire group.
All of the developers need to participate in the postmortem. To maximize the benefits, everyone must learn the resulting lessons. In some cases, it might make sense to bring in nondevelopers who were involved in the project in other ways, such as the angry customer or QA people.
The facilitator's first job is to let everybody know the two ways they can get in trouble. First, slack off and you'll get zinged. Sure, it's the end of the project, probably late, we're all tired, and we all hate each other, but despite this, everyone must put in a few more hours of hard work. Second, obstruct or trash the process and expect to be fired. Saying something like, “Yeah, this is just another stupid process thing that is a waste of time,” is a clear indication someone's not interested in improving. Developers who insist on remaining in stasis aren't particularly useful.
The facilitator also ensures the postmortem isn't used to beat up on a particular developer who might have been a real problem on the project. The fundamental rule of management must apply: praise publicly, discipline privately. Deal with problem people off line.
Hold a history day meeting. Run by the facilitator, a history day is when we look at the problems encountered during the project. The data that was acquired during the effort is a good source of quantitative insight into the issues.
Remember, a postmortem is not a complaint session. The facilitator must be strong enough to quash criticisms and grumbling. And, perhaps counterintuitively, it's not a place to solve every problem. Instead, you should identify problems that appear solvable and pick those that promise the maximum return on investment.
Resist the temptation to solve all of the ills suffered during the project. I'm a product of the '60s. At the time, we thought we could save the world. We couldn't. But it was possible to implement small changes, to make some things better. Don't expect any one postmortem to lead you to firmware nirvana. Postmortems are baby steps we take to move to a higher plane. Try to do too much and the effort will collapse.
So, pick two to four problems, depending on the size of the group. Then break the team into subgroups, each one tasked to crack a single issue.
The groups must create solutions that consist of distinct actions. If the project suffered from never-ending streams of changes from the 23-year-old marketing droid, a solution like “stop accepting changes” or “institute a change control process” is useless. These are nice sentiments but will never bear fruit.
Instead, create plans specifying particular actions. “Joe evaluates change control tools by April 1. Selects one. Trains entire team on the process by April 15. No uncontrolled changes accepted after that date.”
If each group comes back and presents their solutions to the entire team, the postmortem process will absolutely fail. We engineers have huge egos. Each of us knows we can solve any problem better than almost anyone else. If team A comes in and tells me how I can do part of my job better, I'll immediately toss out a dozen alternate approaches. The meeting will descend into chaos and nothing useful will result.
Instead, before making any presentations, team A solicits input on their ideas from each developer. This is low-key, the sort of thing you can do around the watercooler. The team is looking for ideas and buy-in. Use the U.S. Congress as a model. Nothing happens on the floor. All negotiations take place in back rooms, so when the final vote occurs on the floor it's all but a fait accompli.
A final meeting is held, at which time the solutions are presented and recorded. In writing.
End with a post-project party. What? You don't do those? The party is an essential part of maintaining a healthy engineering group. All ones and zeroes make Joe a dull boy. The party eases tensions created by the intense work environment. But it happens only after the project is completely finished, including the postmortem.
With the postmortem done, the team disbands. Now the most important part of the process begins: the employment of feedback to improve future projects. When the next development effort starts, the leader and all team members should (must) read through all of the prior postmortems. This is the chance to avoid mistakes and to learn from the past. A report that's filed away in a dusty cabinet never to surface is a waste of time.
A 1999 study by Gloria Congdon (“Techniques and Recommendations for Implementing Valuable Postmortems in Software Development Projects,” master's thesis, University of Minnesota, May 1999) showed that developers found 89% of 56 postmortems very worthwhile. The other 11%, those that failed, are the most interesting. Developers rated them bad to awful because there was no follow-through. The postmortem took place but the results were ignored.
Enlightened management, or those companies lucky enough to have a healthy process group, will use the accumulated postmortems outside of project planning to synthesize risk templates. If a pattern like “Every time we pick a new CPU we have massive tool problems” emerges, then it's reasonable to suggest never changing CPUs or taking some other action to mitigate this problem.
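Spotting such patterns can be as simple as tagging each postmortem's problems and counting which tags recur across projects. The Python sketch below assumes the problems have already been boiled down to short, consistent labels. The project names, labels, and the `recurring_issues` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical problem labels recorded in three past postmortems.
postmortems = {
    "project_a": {"new CPU tool problems", "late requirement changes"},
    "project_b": {"new CPU tool problems", "underestimated code size"},
    "project_c": {"late requirement changes", "new CPU tool problems"},
}


def recurring_issues(reports, min_projects=2):
    """Return (issue, project_count) pairs seen in at least min_projects reports."""
    counts = Counter(issue for issues in reports.values() for issue in issues)
    recurring = [(issue, n) for issue, n in counts.items() if n >= min_projects]
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(recurring, key=lambda pair: (-pair[1], pair[0]))


for issue, n in recurring_issues(postmortems):
    print(f"{issue}: seen in {n} projects")
```

Anything that clears the threshold is a candidate entry for a risk template. The threshold itself is a judgment call, not something the counting can decide.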
Plane crashes, though tragic, are feedback that can prevent future accidents and save lives. Shouldn't we employ a similar feedback mechanism to save future projects and learn new ways to be more effective developers?
Jack G. Ganssle is a lecturer and consultant on embedded development issues. He conducts seminars on embedded systems and helps companies with their embedded challenges. Contact him at .