A Message from Space
A key factor in the development of the Lunar Module for Apollo 9 was "test, test, and more test," according to the HBO miniseries "From the Earth to the Moon" (Part 5, "Spider," 1998). That's not a bad guideline for an engineering effort that demanded so many things be done for the first time. Given that precedent, what went wrong with the Mars Polar Lander, which disappeared last December and presumably crashed into the red planet? Various stories have circulated, some pointing to gross management incompetence. But whichever scenario you believe, the one thread common to all of them is that the system was not properly tested before launch.
"There was inadequate software design and testing. The software should have been designed to prevent premature engine shutdown," said former Lockheed Martin executive Tom Young, who presented the NASA report on the Polar Lander's fate. "In space, one strike and you're out." (See www.cnn.com/2000/TECH/space/03/28/lander.report.02/index.html)
Test not only lets you identify and fix faults; more importantly, it gives you the data you need to feed back into the process and improve it, so as to prevent future failures. In that respect the space program is atypical: its products are more or less one-offs, which makes the process harder to fix. Test, especially systems integration test, necessarily takes place at the end of the development cycle, and when intermediate deadlines slip, the pressure mounts to shrink the test phase and get the product to market on time. If your company has a cash-flow problem, the temptation to shortchange test can be strong. In applications that are not mission-critical, the result is annoyance for customers, who sometimes find themselves involuntarily recruited into doing final test as they use the software they bought. That only reinforces the old advice never to buy version 1.0 of any software package. In safety-critical applications, shipping inadequately tested products is not an option.
Hardware system quality is at an all-time high, but hardware is not where the current challenge lies. The growing software content of systems, coupled with shorter design cycles, means that producing bulletproof software is more difficult than ever. Since safety-critical designs can't rely on the customer to do final test, the development process had better assure software quality as well as hardware quality. Thorough testing is the key to gaining that assurance.
Test is also more complex than you might expect, as Michael Barr's article this month on memory test, um, attests. For example, you can write a test that exercises a memory chip's functionality yet cannot detect whether the chip is missing from the board altogether.
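How can a memory test pass with no chip on the board? The toy simulation below is a sketch under loudly stated assumptions, not Barr's code: the class and function names are hypothetical, and the rule that a read from a missing chip returns the last value driven onto the data bus is a simplified model of bus capacitance holding a stale value. Under that model, a test that writes a location and immediately reads it back passes even with the chip absent, while a test that fills every location with a distinct value before reading anything back catches the problem.

```python
class SimulatedMemory:
    """Hypothetical model of a RAM region on a board.

    With the chip present, writes are stored normally. With the chip missing,
    nothing drives the data bus during a read, so (in this simplified model)
    the read returns whatever value was last driven onto the bus.
    """

    def __init__(self, size, chip_present=True):
        self.size = size
        self.chip_present = chip_present
        self.cells = [0] * size
        self.last_bus_value = 0

    def write(self, addr, value):
        self.last_bus_value = value  # the CPU drives the bus on every write
        if self.chip_present:
            self.cells[addr] = value

    def read(self, addr):
        if self.chip_present:
            self.last_bus_value = self.cells[addr]
            return self.cells[addr]
        return self.last_bus_value  # floating bus: the stale value comes back


def write_then_read_test(mem, pattern=0x55):
    """Naive test: write each location and read it straight back."""
    for addr in range(mem.size):
        mem.write(addr, pattern)
        if mem.read(addr) != pattern:
            return False
    return True


def fill_then_verify_test(mem):
    """Write a distinct value to every location first, then read all of them back."""
    for addr in range(mem.size):
        mem.write(addr, (addr + 1) & 0xFF)
    for addr in range(mem.size):
        if mem.read(addr) != (addr + 1) & 0xFF:
            return False
    return True


if __name__ == "__main__":
    good = SimulatedMemory(16)
    empty = SimulatedMemory(16, chip_present=False)
    print(write_then_read_test(good), fill_then_verify_test(good))    # True True
    print(write_then_read_test(empty), fill_then_verify_test(empty))  # True False
```

The fill-then-verify test works because separating the writes from the reads means the bus holds only the last value written, which cannot match every location. It needs more than one location to test, and on real hardware the exact behavior of an undriven bus varies from board to board.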