How I test software, again and again
I have to admit it: I'm a single-stepping junkie. When I'm testing my software, I tend to spend a lot of time in the source-level debugger. I will single-step every single line of code. I will check the result of every single computation. If I have to come back later and test again, I will, many times, do the single-stepping thing all over again.
My pal Jim Adams says I'm silly to do that. If a certain statement did the arithmetic correctly the first time, he argues, it's not going to do it wrong later. He's willing to grant me the license to single step one time through the code, but never thereafter.
I suppose he's right, but I still like to do it anyway. It's sort of like compiling the null program. I know it's going to work (at least, it had better!), but I like to do it anyway, because it puts me in the frame of mind to expect success rather than failure. Not all issues associated with software development are cold, hard, rational facts. Some things revolve around one's mindset. When I single-step into the code, I find myself reassured that the software is still working, the CPU still remembers how to add two numbers, and the compiler hasn't suddenly started generating bad instructions.
Hey, remember: I've worked with embedded systems for a long time. I've seen CPU's that didn't execute the instructions right. I've seen some that did for a time, but then stopped doing it right. I've seen EPROM-based programs whose bits of 1's and 0's seemed to include a few ½'s.
I find that I work more effectively once I've verified that the universe didn't somehow become broken when I wasn't looking. If that takes a little extra time, so be it.
This point is one dear to my heart. In the past, I've worked with folks who like to test their software by giving it random inputs. The idea is, you set the unit under test (UUT) up inside a giant loop and call it a few million times, with input numbers generated randomly (though perhaps limited to an acceptable range). If the software doesn't crash, they argue, it proves that it's correct.
No it doesn't. It only proves that if the software can run without crashing once, it can do it a million times. But did it get the right answer? How can you know? Unless you're going to try to tell me that your test driver also computes the expected output variables, and verifies them, you haven't proven anything. You've only wasted a lot of clock cycles.
Anyhow, why the randomness? Do you imagine that the correctness of your module on a given call depends on what inputs it had on the other 999,999 calls? If you want to test the UUT with a range of inputs, you can do that. But why not just do them in sequence? As in 0.001, 0.002, 0.003, etc.? Does anyone seriously believe that shuffling the order of the inputs is going to make the test more rigorous?
There's another reason it doesn't work. As most of us know, if a given software module is going to fail, it will fail at the boundaries. Some internal value is equal to zero, for example.
But zero is the one value you will never get from a random-number generator (RNG). Most such generators work by doing integer arithmetic that causes an overflow and then capturing the lower-order bits. For example, the power residue method works by multiplying the last integer by some magic number, and taking the lower-order bits of the result. In such a RNG, you will never see an integer value of zero. Because if it ever is equal to zero, it will stay zero for the rest of the run.
Careful design of the RNG can eliminate this problem, but don't count on it. Try it yourself. Run the RNG in your favorite compiler and see how often you get a floating point value of 0.000000000000e000.
End result? The one set of values that are most likely to cause trouble—that is, the values on the boundaries—is the one set you can never have.
If you're determined to test software over a range of inputs, there's a much better way to do that, which I'll show you in a moment.