How I test software


In my last column, I talked about the way I write software. I began by saying, “I've always known that the way I write software is different from virtually everyone else on the planet.”

Boy, was I wrong. Many of you readers sent me e-mails saying, in effect, “Me too.”

In retrospect, I was arrogant to suppose that I'm “unique on the planet,” except in the sense that each of us is. It's nice to know I have some kindred spirits. (Not everyone shared my views. See the very thoughtful letter from Jim Berry in this month's Parity Bit section.) Yet, if I have so many kindred spirits, how come I never noticed?

Here's what I think. Faced with a certain development environment, complete with management directives and guidelines, and impossible schedules, different people respond differently. Some folks salute smartly, say “Yes, sir,” and make do. I, on the other hand–the model of soft-spoken tact–give my opinions, which range from “That's unacceptable,” to the ever popular, “That's the dumbest idea I ever heard!” Such comments have justly earned me, over the years, the title Jack “not-a-team-player” Crenshaw.

Having blurted out my opinion on a given environment, I try to change it to one where I can be productive.

Sometimes I succeed, sometimes not. One in-house project, with the potential for making the company millions of dollars, involved real-time software for a microprocessor. When the “software development environment” arrived, it turned out to be a single-board evaluation kit, complete with line-by-line assembler (no labels, just addresses), and an old 110-baud, thermal-paper terminal. When I said, “Hey, wait a minute! Where's my hard drive? Where's my bulk storage device?” the project manager calmly pointed to the terminal's cassette drives.

Fortunately, my campaign to change that environment succeeded: we got a far more acceptable one and delivered a great product.

More often, I'm unable to change the development environment. On such occasions, I try to carve out a mini-environment where I can still be effective. In my last column, I mentioned an embedded system project where we were promised new builds every two weeks, whether we needed them or not. In my inimitable fashion, I blurted “That's unacceptable! I'm used to turnarounds of two seconds” and set out to find a better solution. We found one: an interactive development environment (IDE) on the company's time-share mainframe. The IDE came complete with assembler, debugger, and instruction-level CPU emulator.

It wasn't perfect. The terminal was yet another 110-baud, thermal printing beauty that we used over the company's voice lines to the time-share system 1,500 miles away. But we got edit-compile-test cycles down to the 10-minute level, a whole lot better than two weeks.

Were there kindred spirits on that project, doing things the same way I do? Sadly, not. But looking back, I suspect I had many kindred spirits on other projects and just didn't know it. It wasn't that they were all corporate sheep shuffling to the slaughter; they were just a lot more successful than I was at seeming to be. Like me, they (you) would carve out mini-environments and practices where they could be more effective. They just did it under the radar. They didn't blab about it, and they left out the “That's unacceptable” and “dumbest idea” parts.

Dang! I wish I'd thought of that.

Waterfall vs. agile
There's a tension between the classical, waterfall-diagram approach to design (separate phases for requirements, design, code, and test) and the iterative, spiral-development approaches. I'm 100% convinced that the waterfall model doesn't work. Never did. It was a fiction that we presented to the customer, all the while doing something else behind the scenes.

What's more, I can tell you why it doesn't work: we're not smart enough.

The waterfall approach is based on the idea that we can know, at the outset, what the software (or the system) is supposed to do, and how. We write that stuff down in a series of specs, hold requirements reviews to refine those specs, and don't start coding until those early phases are complete and all the specs are signed off. No doubt some academics can point to cases where the waterfall approach actually did work. But I can point to as many cases where it couldn't possibly have, for the simple reason that, at the outset of the project, no one was all that sure what the problem even was, much less the solution.

One of my favorite examples was a program I wrote way back in the Apollo days. It began as a simulation program, generating trajectories to the Moon. Put in an initial position/velocity state, propagate it forward in time, and see where it goes.

As soon as we got that part done, we realized it wasn't enough. We knew where we wanted the trajectory to go–to the Moon. But the simulated trajectory didn't go there, and we had no idea how to change the initial conditions (ICs) so it would.

We needed an intelligent scheme to generate the proper ICs. As it happens, the geometry of a lunar trajectory involves a lot of spherical trig, and I discovered that I could codify that trig in a preprocessor that would help us generate reasonable ICs or, at the very least, tell us which ICs would not work. I wrote a preprocessor to do that, which I imaginatively named the initial conditions program.

Using this program, we were generating much better trajectories. Better, but not perfect. The problem is that, after all, a real spacecraft isn't so accommodating as to follow nice, simple spheres, planes, and circles. Once we'd hit in the general vicinity of the desired target, we still had to tweak the IC program to get closer.

A problem like this is called a two-point boundary value (TPBV) problem. It can only be solved by an iterative approach called differential correction. We needed a differential correction program. So we wrote one and wrapped it around the simulation program, which wrapped itself around the initial conditions program.

Just as we thought we were done, along came the propulsion folks, who didn't just want a trajectory that hit the target. They wanted one that minimized fuel requirements. So we wrapped an iterative optimizer around the whole shebang.

Then the mission planning guys needed to control the lighting conditions at the arrival point and the location of the splashdown point back at Earth. The radiation guys needed a trajectory that didn't go through the worst of the Van Allen belts. The thermal guys had constraints on solar heating of the spacecraft, and so forth. So we embedded more software into the iterators and solvers–software that would handle constraints.

In the end, I had a computer program that needed almost no inputs from me. I only needed to tell it what month we wanted to launch. The nested collection of optimizers and solvers did the rest. It was a very cool solution, especially for 1962.

Here's the point. In those days, computer science was in its infancy. In no way could we have anticipated, from the outset, what we ended up with. We couldn't possibly have specified that final program, from the get-go. Our understanding of the problem was evolving right alongside the solutions.

And that's why we need iterative, spiral, or agile approaches.

Environment drives approach
As I mentioned last time, I tend to test my software very often. It's rare for me to write more than five or six lines of code without testing, and I often test after coding a single line.

You can't do this, though, if your system takes four hours to compile or two weeks to build. For my method to work, I need to keep the edit-compile-test cycle time very short, measured in seconds rather than hours, days, or weeks. If my development environment can't support this, I go looking for one that can.

It hasn't always been that way, especially for embedded systems. In the beginning, we made do with paper tape inputs read by an ASR-33 terminal. Our only way to “download” software was by burning an EPROM.

Fortunately, those days are gone forever. The development environments available these days are amazingly good and shockingly fast. You can develop in an environment (I prefer an IDE) that's almost indistinguishable from your favorite PC compiler system. You can download your code into the target at Ethernet, if not USB, data rates, and debug it in situ using a JTAG-based source debugger. Wonderful.

Avoid the embedded part
When I'm developing software for an embedded system, I follow one other rule. I can sum it up this way: Avoid the embedded system like the plague.

Understand, here I'm talking about the real, production or prototype hardware, not a single-board evaluation kit. At first glance, this may seem crazy. What's the point of developing software for an embedded system if you're not using the embedded system?

Let me explain. A lot of the software I develop, whether for an embedded system or not, is math intensive. And the math isn't going to change (we hope) whether it's in the embedded system or my desktop. A Kalman filter or a least-squares fit is the same, regardless of the processor it runs on.

Yes, I know, that isn't strictly true; finite word lengths and processor speeds can affect the performance of an algorithm, as do the timing and handshaking between multiple tasks. Even so, the right place to test a new algorithm is not in the embedded system at all; it's in the most effective system we can find, which these days usually means the multicore, multigigahertz box on your desktop.

There are at least two other reasons for shying away from the embedded system. First, it may not even be there. In many projects, the hardware is being developed in parallel with the software. If you don't even start the software development until the hardware is complete, you're behind schedule on day 1.

Second, even when the hardware has arrived, it may not be stable. Many of us can tell stories about getting strange results and not knowing whether the problem is in hardware or software. We can also tell stories about how the hardware guys invariably say, “Must be a software problem. My hardware is working fine.” Before we can get the hardware fixed, we first have to convince the hardware guys that it's broken. Sometimes that's not easy.

Because the hardware isn't always stable, the hardware guys need access to the embedded system, just as you do. There's nothing more disheartening than to be all geared up for an intensive debugging session, only to find that the hardware isn't even there; it's gone back to the shop for rework. You end up negotiating with the hardware guys to get access to the machine.

For all these reasons, I do my level best to hold off hardware-software integration to the last possible moment. Here's my typical sequence:

1. Develop the math algorithms in a desktop machine. I often use languages other than C or C++ (more on this later).

2. Code the algorithms in C++ using the desktop machine.

3. Download and test the software in a CPU simulator or emulator. If an emulator, that can be either a software emulation or a hardware in-circuit emulator (ICE); if a simulator, an instruction-level simulator running on the desktop machine.

4. Download the software into an evaluation kit. Simulate the sensors and actuators of the real embedded system. If you're using an ICE, you can use its internal CPU for this phase.

5. After–and only after–you've tested the software in the simulated/emulated environments, go to the real hardware.

Is this wise? What happens if the hardware doesn't support the algorithm? Well, I suppose that could happen but only if I completely misunderstood the nature and function of the hardware. Fortunately, that's never happened to me.

But wait, you say. I thought you were advocating doing device drivers first. I was, and am. That's where the outside-in, as opposed to top-down or bottom-up, development comes in. I wasn't lying about avoiding the hardware as long as possible. I also wasn't lying about wanting to be sure the I/O devices are working and that I have good, robust device drivers to interface with them.

I need both. I need to know the hardware–particularly the sensors and actuators–works. I'll be testing them using standalone test software, the moment the devices are available. But I don't test the hardware by running my entire application on top of it. Instead, I write simple little test programs to verify that (a) the hardware works and that (b) I know how to talk to it. Then I can get back to developing my other software, confident that I can marry it to the real hardware when the time comes.

Code and unit test
Ok, let's assume I've found an environment that allows frequent testing, with an edit-compile-test cycle measured in seconds. I've got my software architecture designed, I've identified a unit (subroutine, function, module, or whatever you choose to call it), to start with, and I'm ready to begin coding. When do I start testing?

If you've gotten the point of this series at all, you already know the answer: I start now.

In the classical, waterfall-diagram approach to software, there's a phase called “code and unit test.” In the original concept, you're supposed to build a given unit until it's complete. Then you test it and move on to the next unit.

That's not the way we do it anymore. I've told you that I tend to test as I go, testing each time I've added a handful of lines of code. What I haven't said is, test it how? Answer: Using a test driver. As far as I'm concerned, there is no other option.

I worked with one group developing a huge application, so big and slow it took multiple processors, each having multiple cores, to run it. And yet, the only way programmers had to test their functions was to load them into the big Mother of All Programs, run that big mother, and set a breakpoint where their function got called the first time.

That's insane. End of discussion.

What's that you say? You can't test each unit individually, because they all depend on each other and pass data items around among them?

You're actually admitting that you write spaghetti code? Sounds to me like time to do some major refactoring. I submit that if you can't separate one module from its neighbors, you also can't tell if any of them are actually working.

So I'm going to write a test driver that I develop along with the unit under test (UUT). I'll do this for every unit, and every collection of units, right up through the final, main program. Each test driver is going to invoke the UUT, pass some data to it, and test it. What should the first version look like?

Since my initial function has no executable lines of code in it, the test is going to be pretty doggone simple. It's a stub of both the UUT and its test driver–which is why I'm perfectly comfortable with starting with the null program void main(void){}.

The test driver calls no function at all. If you feel uncomfortable about this, you can always write the universal test-driver/UUT pair:

   void foo(){}        // the UUT: an empty stub

   void main(void)     // the test driver
   {
       foo();
   }

My next step, of course, is to flesh out the UUT, sending data to it from the test driver, and checking the data it returns. The next obvious question is, what data do I use, and how do I know if the UUT got the right result?

The test case(s)
Answer: I must generate test case(s). Good ones.

I can't tell you how many times I've seen otherwise good programmers try to slide on this point. Generating a good test case is not always easy. Some folks seem to just pull some input values out of the air, look at the results, and say, “Yep, that looks about right.”

Give me a break! You can't do that! What does “about right” even mean, anyhow? In the worst case, it simply means your module compiled and ran without crashing, and gave a number that wasn't ridiculous. But come on, you'll have to do better than that.

My pal Jim Adams, one of the best software engineers I know, and a specialist in testing, puts it this way: When testing, never ask a computer a question unless you already know the answer.

The test case should be reasonable, in the sense that the inputs should be in the same range as the values expected in the real world. More to the point, the test case should include every computation, not just those that lead to output values.

In the old days, we used to call this a “hand check.” I still do, because this is exactly how I use the test. As I'm testing, I check the value generated by every single instruction, and compare it to my hand check. They have to agree.

Now, when I say “agree,” I don't just mean “looks reasonable.” I mean agree, as in, the numbers should match exactly, within the limits of floating point arithmetic. Anything greater than this, and we need to go track down the source of the discrepancy. Do not pass “go” until the questions are resolved.

In olden times, I was sometimes pretty cavalier in the lower-level testing. I wouldn't write things down, I'd just think up some input data on the fly, then I'd check the output. If I were writing, say, a vector add routine, I might add the two vectors [1,2,3] and [4,5,6]. Not very imaginative, but if the answer was [5,7,9], I'd declare success. It's hard to imagine a way that any function could get that answer, without adding the numbers and getting their sums right.

For the record, you needn't worry about making your input numbers seem more “realistic.” You can trust your math processor to get the floating point arithmetic right. If it can add 2.0 and 3.0, we have to believe it can add 1.41421356 and 1.61803399.

The one exception occurs when your algorithm is limited to a range of values. Sometimes algorithms fail at edge conditions. The square root function, for example, can't accept negative numbers. The arcsine function can't accept inputs outside the range –1..+1. The reciprocal operation 1/x fails when x = 0 . When testing, it's absolutely essential that you test these edge conditions, with values at the edge, and also just inside and just outside it. Especially in embedded software, bouncing out on an unhandled exception is not an option.

Sometimes people tell me that they can't really test such edges in their program, because the edges are down inside the UUT, and they can't compute what values of the inputs correspond to an internal edge condition.

In that case, someone has some refactoring to do. We still have to test the edges–all of them.

When I was testing, I at least should have written down the test case inputs I used. I didn't. Now I do. As Jim Adams has pointed out, if I don't document the test case, either in my databook or a test report, I'm doomed to re-invent it. And since by that time, I'm likely to have forgotten the subtleties of the algorithm, I'm more likely to get the test driver wrong.

On the other hand, I don't want to make the test driver my life's work. For the lowest-level UUTs–as low as a square root function–I can't afford to waste the time writing a test plan, a test report, and everything that goes in between.

Jim and I have worked out an approach that documents the test cases and their results without causing undue paperwork. We'll write a test driver that passes test case data (which includes edge-condition values) to the UUT. It evaluates the results and reports any failures. It does not report successes. It's the very opposite of “verbose.” Under normal conditions, you don't want to see line after line of output that goes:

   Testing case 1
   Case 1 passed
   Testing case 2
   Case 2 passed
   etc.

In this case, no news is good news. A null output means that the UUT passed its tests. This approach also allows one level of test driver to call others, with no verbosity in between. How does the test driver report failure? I usually take the coward's way out and use the C/C++ mechanism assert(). You can choose a more sophisticated mechanism; you might have the test driver report each failure, then move on to the next test. That approach may seem more sophisticated, but I'm not convinced it's better. A failure is a failure is a failure, and the first one calls for it to be fixed before continuing.

In the end, the test drivers go into the project library. They get archived and kept under version control right along with the UUTs. You can collect all the test drivers into a parent driver, which becomes a permanent regression test driver.

Generating the test cases
We still haven't discussed the nature of the test cases, and where the test case data comes from. We've only established that it should exist, and be comprehensive. This is where the discussion starts to really get interesting. Hold that thought for next time.

Jack Crenshaw is a systems engineer and the author of Math Toolkit for Real-Time Programming. He holds a PhD in physics from Auburn University.
