Some persistent ideas

It won't come as a surprise to any practitioner of the software art that we tend to be an ornery and opinionated sort. That's been true throughout the history of the discipline and was especially true in the early days. We early software folk tended to be “rugged individualists” with very strong, albeit wildly differing, notions about how software should be written. There were few if any established methodologies or ground rules at the time. Even Kernighan and Plauger's seminal book, The Elements of Programming Style, was still in the future.

Today, academic disciplines and studies, textbooks, formal methodologies, company training courses, programming guidelines, and peer attitudes may have dampened the wildest excursions of our individualism but not eliminated them.

Like all creative people, software folk often have ideas of their own as to how things should be done. But ideas come in all flavors: some are good, some are brilliant, and some are crushingly, embarrassingly bad. In the end, the trick is not to have only good ideas (that's not possible) and definitely not to blindly follow the ideas of some vaunted authority. The trick, rather, is to be able to discern the good ideas from the bad ones, reject the bad ones, and adopt the good ones as your own.

For reasons I don't fully understand and therefore can't explain, I've often found my own ideas to be out of step with those of my colleagues. These differences led to debates, ranging from polite discussion to out-and-out food fights. In time, I came to accept the battles as part of the profession and contented myself with the observation that “my side” often carried the day.

But I never anticipated that I'd be having to fight the same tedious battles, generation after generation, ad infinitum. Just when I think I've gotten out of the debates, they pull me back in. Just when I think a given issue has been settled for all time, along comes a new generation of programmers, sporting the same tired old idea. Some of the worst ideas seem to enjoy eternal life, forever being reborn and returning to plague me, like Count Dracula rising from his coffin.

Today, I'd like to tell you about two of the more persistent and pernicious of the Bad Ideas.

“It's Too Inefficient”
Most programmers are perfectly willing to agree, in principle, that notions like modularity, encapsulation, information hiding, and software reuse are good ideas, leading to more reliable and maintainable software. Yet all too often, these ideas are honored mostly in the breach. Old Fortran programmers like me well remember the days when most Fortran programs were written in bowl-of-spaghetti (BOS) fashion, with little if any modularity. Although even the earliest compilers, and assemblers before them, supported callable subroutines with passed parameters, many programmers chose not to use them.

I got this one right. From the get-go, I was writing small subroutines with passed parameters. David Parnas had nothing on me. Don't get me wrong: I claim no 20/20 prescience here. I used the modular style for two good reasons: First, it's the way I was taught. The fellow who taught me Fortran didn't show me any other way. He had me writing small subroutines and functions for him to use in a large simulation, and he showed me how to write in the style that he wanted. Black boxes he wanted, black boxes he got, and my programming style was set forever.

When I looked at software created by others, I was pretty dismayed to find that the BOS model predominated. Their Fortran programs would go on for page after page, with nary a single CALL or RETURN, and therefore no parameter lists (for that matter, no comments, either). Just large numbers of GOTO's.

We were doing scientific programming at the time, simulating space trajectories, and therefore making heavy use of vector and matrix math. I used my subroutines, which weren't all that different from the ones you've seen in my C++ vector/math package.

Most of my colleagues used the same algorithms, but not the idea of modularity. Instead, they coded them in line. Where I might write:

	Call Cross(a, b, c)   

They'd write:

	c(1) = a(2)*b(3) - a(3)*b(2)
	c(2) = a(3)*b(1) - a(1)*b(3)
	c(3) = a(1)*b(2) - a(2)*b(3)

At each place where they needed another cross product, they'd code the same three statements, only with different variable names. If they needed multiple cross products, they'd code the same three lines again, carefully duplicating the index patterns.

Which brings me to the second reason I liked modularity: It kept me from making stupid errors. Each time you copy that three-line algorithm for the cross product, you risk getting either one of the indices, or one of the argument names, wrong. The error can often be hard to track down (did you spot the error in the lines above?). Using a black-box subroutine, I could be pretty certain that if it worked right on the first cross product, it would also work right for the next one.
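For the record, the whole black box takes only a handful of lines. Here's a sketch of the idea in Fortran (not my original routine, just an illustration; comparing it with the in-line version above will also show you where the error is):

	Subroutine Cross(a, b, c)
	! Vector cross product: c = a x b
	Real a(3), b(3), c(3)
	c(1) = a(2)*b(3) - a(3)*b(2)
	c(2) = a(3)*b(1) - a(1)*b(3)
	c(3) = a(1)*b(2) - a(2)*b(1)
	End

Write it once, test it once, and every call after that is free of index slips.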

More than once, I'd mention to a colleague, “You know, you could just call a subroutine for that. I have one you can use, if you like.”

The answer was always the same:

“No, I couldn't do that. It would be too inefficient.”

Edsger Dijkstra pointed out that code efficiency depends far more on using the right algorithm than on tricky code. But if it's speed you want, and you don't care if the answer is right, I can give you speed out the gazoo. For me, at least, correctness and ease of programming trump raw performance any day.

Once, a programmer was assigned to adapt one of my simulations to a new computer. After working on it for a while, he came to me almost livid with anger. My coding, he asserted, was far too inefficient. I had subroutine calls nested four or five layers deep, and he thought this was unconscionable. “Every time you call a subroutine,” he warned, “you waste 180 microseconds.”

I looked at my watch for a time, then said, “I can wait.”

He was not amused.

Ironically enough, he later wrote an attitude simulation of his own. He used the “efficient,” BOS model. But he made a mistake, inverting a matrix in his highest-speed loop even though the matrix was constant. For this and other reasons, my “inefficient” program was more than twice as fast. Now, one could argue that the two issues aren't connected, that the math mistake didn't have anything to do with his coding style. I tend to suspect, though, that if he'd stuck to a modular style, his coding and testing tasks would have been easier, perhaps easy enough that he'd have spotted the mistake.
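The fix is exactly the kind of thing a modular structure makes easy to see: anything constant belongs outside the high-rate loop. Here's a tiny, self-contained sketch of the principle (the 2-by-2 system and the numbers are purely illustrative, nothing like his actual code):

	Program HoistDemo
	! Illustrative only: invert the constant matrix once, before the loop,
	! then reuse the stored inverse at every step.
	Real a11, a12, a21, a22, det, b11, b12, b21, b22
	Real w1, w2, v1, v2
	Integer i
	! Constant 2-by-2 matrix A and its inverse B, computed exactly once
	a11 = 2.0; a12 = 1.0; a21 = 1.0; a22 = 3.0
	det = a11*a22 - a12*a21
	b11 =  a22/det; b12 = -a12/det
	b21 = -a21/det; b22 =  a11/det
	w1 = 1.0; w2 = 0.0
	Do i = 1, 1000
	   ! High-rate loop: no re-inversion here, just the stored inverse
	   v1 = b11*w1 + b12*w2
	   v2 = b21*w1 + b22*w2
	   w1 = w1 + 0.001*v1
	   w2 = w2 + 0.001*v2
	End Do
	Print *, w1, w2
	End Program HoistDemo

Put the inversion inside the loop instead, and you do that work a thousand times over, for nothing.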

Over time, people learned that modularity was good, spaghetti bad. And as mainframe computers got blazingly fast, execution time became less of an issue. But the efficiency notion came back to life in spades with the advent of microprocessors. With their (initially) slow clock speeds and software floating point, efficiency was again an issue. Even (or especially) when using assembly language, programmers tended toward the BOS model. So did (and do) many programmers of embedded systems.

The notion persists to this day. Only a year ago, a colleague whose intelligence I value showed me a Matlab program he had written. It used many long modules and had many of the “in-line cross product” sorts of constructs. I ran a McCabe metric on it and got a cyclomatic complexity of around 27 (something in the range of 3 to 5 is better). I gently asked my friend why he hadn't used a more modular style. He replied,

“I thought about it, but that would have been too inefficient.”

Now, you have to take the time to get your arms around the situation. Matlab is an interpreted language. It's not going to use the computer efficiently, no matter what you do. We don't write programs in Matlab when we want speed; we use it when we want the convenience, ease of use, and GUI environment of an interpreted language. Granted, Matlab programs can be compiled into faster executables for production, but my friend wasn't using that feature. So efficiency should have been the last thing on his list of needs.

“You never know…”
Many years ago, NASA distributed a space trajectory simulation called the N-body Program. It quickly became the standard for lunar and interplanetary studies and enjoyed a well-earned reputation for excellence among its users.

Around 1963, NASA assigned a programmer at Goddard Space Flight Center to rewrite the program using the more structured constructs of Fortran IV. I was among the first to see a copy of the “new, improved” program.

I was stunned. Oh, the program had modules, alright. But the calling lists were completely absent. When he set out to rewrite the program, the new guy had what was, to him, a Good Idea. Instead of the nicely structured arrangement of the original program, he'd take every single variable in the program and give it global scope.

I said, “Um, it appears that you've moved all the variables to COMMON.”

“Yes,” he acknowledged proudly.

“Why would you do that?” I asked.

He explained,

“You never know when you might want to print something out.”

I thought that surely this resoundingly Bad Idea was an anomaly, not one I'd be likely to see again. But I was wrong. It's surprising, and also depressing, how often the idea re-emerges. Only a few years ago, I ran across another Fortran simulation, a much-used program written by a highly experienced Ph.D. This guy not only had the same idea, but compounded it with another, even more resoundingly bad one.

Dr. Seuss wrote the book, Too Many Daves. It describes “Mrs. McCave, who had 23 sons, and named them all Dave.” It goes on to say:

“And often she wishes that, when they were born, She had named one of them Bodkin Van Horn, And one of them Hoos-Foos. And one of them Snimm. And one of them Hot-Shot. And one Sunny Jim. And one of them Shadrack. And one of them Blinkey. And one of them Stuffy. And one of them Stinkey. Another one Putt-Putt. Another one Moon Face. Another one Marvin O'Gravel Balloon Face. And one of them Ziggy. And one Soggy Muff. One Buffalo Bill. And one Biffalo Buff. And one of them Sneepy. And one Weepy Weed. And one Paris Garters. And one Harris Tweed. And one of them Sir Michael Carmichael Zutt, And one of them Oliver Boliver Butt And one of them Zanzibar Buck-Buck McFate…

But she didn't do it. And now it's too late.”

My colleague did something shockingly similar, only it wasn't nearly as funny. He must have thought, “Here's a great idea: Let's move all 60,000 of the program's variables into global COMMON, and name them all x.”

This is not a joke. I'm serious. He did exactly that. More precisely, he defined:

	Common x(60000)   

Then he equated elements of x to the real variables, so that:

x(1) was really BodkinVanHorn, x(2) Hoos_Foos, etc.

To make all of this work, he developed a preprocessor that would allow him to write his code in the usual way, giving the variables names with mnemonic significance. As it read the source file, the preprocessor assigned each of the user's variables to a unique element of x. Then it wrote a new source file, with every reference to every variable replaced by the equivalent element of x. The preprocessor also wrote a symbol table file, which my pal used later. He built a DOS batch file that moved the appropriate files around.
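To make the scheme concrete, here's the sort of transformation it performed; the variable names and slot numbers are invented for illustration (I don't have his actual files):

	! What he wrote, in the usual mnemonic style:
	Altitude = Altitude + VSpeed*Dt

	! What the preprocessor handed to the Fortran compiler:
	Common x(60000)
	x(101) = x(101) + x(102)*x(103)

	! ...plus a symbol table file recording that Altitude is x(101),
	! VSpeed is x(102), and Dt is x(103).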

Why would he jump through all these hoops, just for the chance to name all of his variables dave? I wondered too, so I asked him. He replied:

“You never know when you might want to print something out.”

Sigh.

I've thought about this program many times since. Haven't we seen other development tools that take names with mnemonic significance and assign array addresses to them? And don't they also build symbol tables? Of course we have, and they do. They're called assemblers and compilers.

Think about it. IBM's original assembler wasn't called Symbolic Assembler Program (SAP) for nothing. A large part of its job was to replace both instruction mnemonics and user variables with numeric values: opcodes for the mnemonics, RAM addresses for the data variables. Compilers do something similar.

In effect, this guy's preprocessor duplicated the work of the compiler, building its own symbol table and assigning each variable to a slot in the x array. Then his batch file would pass the new, “intermediate” source file to the Fortran compiler, which would do exactly the same thing, assigning every element of x to a location in RAM.

Such a deal.

How could anyone have ever thought this was a Good Idea? It's because it satisfied his need to “print out” any variable in the program. He had written an interactive tool that ran in concert with the simulation program. Using the symbol table his preprocessor had built, he could select a handful (10 or 12, as I recall) of the 60,000 elements and display them in little text boxes. Think of the Display block in a Simulink model, and you won't be too far from the idea. If he chose, he could pause the program, single-step, and switch the selected variables for new ones.

For a production program, I can't see much value in sitting at my desk, watching the selected outputs spin by like odometer readings, usually too fast to read. I suppose my pal could look for special events, such as the altitude going through zero, but other than that he'd only be able to see, in “real time,” the coarsest of trends.

Looking back, I think he used the display tool mostly for debugging. In effect, the interactive display tool served as the symbolic debugger that the compiler vendor hadn't provided. Having all the program variables in a single, global array meant that my pal didn't have to depend on the compiler to give him the RAM location of each variable.

But I'm still not convinced that this approach gives much bang for the buck. When I'm debugging, I need to see the exact numeric values of the variables of interest, so that I can compare them with the values predicted by hand calculations. I can get those values with the primitive mechanisms called debug prints.

When I'm debugging simulations these days, I like to use graphs. A picture is worth a thousand words, and although I can't measure exact values from a graph, I can at least see that things are going in the right direction, and I can see it much better than if I watch display dials spinning by.

But here again, I don't need a special tool to get the graphs. All I need to do is write the selected variables to a file, then use an offline graphing tool to draw the pictures. Matlab can do this, as can Mathcad or Excel.
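For what it's worth, the machinery involved is trivial. Here's a minimal sketch of the kind of thing I mean (the file name, the selected variables, and the toy dynamics are all just for illustration):

	Program LogDemo
	! Integrate a toy trajectory and write selected variables to a
	! comma-separated file, ready for offline plotting in Matlab,
	! Mathcad, or Excel.
	Real t, alt, vel, dt
	Integer i
	t = 0.0; alt = 1000.0; vel = 0.0; dt = 0.1
	Open(Unit=20, File='trajlog.csv', Status='Unknown')
	Write(20, '(A)') 't,alt,vel'
	Do i = 1, 100
	   vel = vel - 9.81*dt             ! toy free-fall dynamics
	   alt = alt + vel*dt
	   t = t + dt
	   Write(20, '(F8.3, ",", F12.4, ",", F10.4)') t, alt, vel
	End Do
	Close(20)
	End Program LogDemo

One plot command in any of those tools, and I can see the whole run at a glance.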

In the end, this guy's desire to be able to “print out” any variable he chose, and watch it evolve before his eyes as the program ran, completely dominated the design of his program. To achieve this end, he was willing to completely destroy the structural integrity of his program, all the while creating tons of gratuitous complexity, generating two major applications of dubious value, and violating the KISS principle all to heck.

He was very, very proud of his Great Idea and his accomplishments in getting it all to work. I, in contrast, thought it was a monumentally Bad Idea, squared. Needless to say, disagreements ensued.

Of all the mistakes one can make, holding onto a bad idea just because it's your bad idea has got to be the worst idea of all.

Jack Crenshaw is a systems engineer and the author of Math Toolkit for Real-Time Programming. He holds a PhD in physics from Auburn University. E-mail him at jcrens@earthlink.net.
