Comments

December 05, 2005

Jack Ganssle

I was chatting informally with a group of developers last week when one used a phrase guaranteed to set me off.

“I write self-commenting code,” he revealed proudly.

Uh huh. I have yet to see a program of any useful size that, stripped of comments, is self-documenting. C is not a language like English or Swedish, where there’s so much redundant information that even in a noisy room, where one might catch only 70% of the words, the meaning still comes across.

Computer languages are inherently dense and precise: miss a single character and the program won’t run correctly. Mix up “Identifier” and “identifier” and, if you’re lucky, the compiler will complain. Programmers with less good fortune will get a clean compile and spend days or weeks looking for a hard-to-find bug.

Usually these folks will go on at length about their use of long variable names. Hey, I’m all in favor of making variables as long as they need to be to clearly express an idea. But length, in this case, isn’t always an asset. I find it awfully hard to read something like:

for(next_output_buffer_sequence!=first_output_buffer_sequence; !complete_message_assembled_by_host_process; ++final_result_queue_pointer);

The C constructs (operators et al) are lost in the morass of names. And reading a single statement split across many lines confuses the eyes.

The C standard only guarantees that the first 31 characters of an internal identifier are significant (C99 raises that minimum to 63), so it’s dangerous to get enamored with exceedingly long names.

Some developers subscribe to the “comment nearly every line” school of thought. Their code looks like:

for (i=0; i<sizeof(Array)/sizeof(Array[0]); i++){ // loop through Array
    *Array_Pointer=0; // Set Array element to zero
    ++Array_Pointer; // point to next element in Array
} // end for

Much better is to eliminate those annoying and not particularly informative comments and prefix the entire snippet with:

// Array is a sparse matrix; empty
// elements are denoted by zero so
// here we initialize all elements to
// “empty” (a zero).

The second style conveys the sense of the code while the first gives plenty of detail and no context.

Others write the code first and add the comments later. They don’t want to “waste time” with documentation while endlessly fiddling with a routine to get it to work. But the comments should be the function’s design. If the developer doesn’t know enough about the design before cranking code, just how does he start pounding C into the editor? Is it a random walk? “Uh, hmm, I dunno, let’s try:”

void main(void){

“Oh boy, now what? How about maybe initializing something… or should we set up a queue?”

The idea must be that if they type enough C, a function, a structure, and a clear idea will emerge. That might indeed happen… eventually. But not efficiently.

There’s a spec somewhere - perhaps only in the developer’s head - which describes in English what a function should do in a human-friendly manner. The code is a translation of that spec to cryptic and unforgiving computerese. So I figure the way to write a function is to create all of the comments first. The header, and even all of the individual little snippets of English spread throughout the code. Then the function’s design is done.

After that, anyone can fill in the code.

Jack G. Ganssle is a lecturer and consultant on embedded development issues. He conducts seminars on embedded systems and helps companies with their embedded challenges. Contact him at jack@ganssle.com. His website is www.ganssle.com.


No such animal... self-commenting code. But on a similar note, I'm for auto-syntax-correcting compilers. Suppose I write:

fr(idx=0; idx<10; idx++)

{ etc(idx); }

or

whle(idx<25)

{ etc(idx); }

The compiler should be smart enough to see that "fr" is really "for" and "whle" is really "while". It would be interesting to see a compiler with an auto-syntax-correcting feature. What do you think?

- Steve King


ok...

'self-commenting-code' is pretty much an oxymoron. Comments, in general, are supposed to elucidate what the code is doing.

In general, I first look at the header file. If I find nothing of use there, then I cringe and start plowing through the thousands of lines of 'C'.

In my opinion, it is better to have:

1. Clearly defined specifications....(notice, I did not say complete!)

2. Decent design document, (Word with some of those Visio pictures are great).

3. Some header files that show some of the entity relationships, and have some comments on the rules and constraints.

- Ken Wada


Mr. King's idea of a "correcting compiler" is interesting. However I am not sure I want the compiler changing my source files. Perhaps such a compiler needs 2 modes, a "correcting" mode for use in development and a "production" mode which does not change the source code.

He can have almost the same thing with modern code editors. By coloring keywords, the typos of "for" and "while" would become quickly apparent, since they would appear in variable or function colors instead of keyword colors. An even greater advantage is coloring the comments differently from the code. If you accidentally put some code in a multi-line comment, it would be readily apparent.

I think Mr. Wada's 3 points are excellent.

- Bob Bailey


The compiler should be smart enough to see that "fr" is really "for" and "whle" is really "while".

I see that your form does not like the greater than or less than symbol... and anything between them is deleted!

- Steve King


To make code truly self-commenting, the computer has to understand the components and the result.

Take one popular example.

"Portable"

"Computer" ->

Portable Computer

Did the computer recognize this?

Don’t think so.

PS

Documentation and tools are often driven by word processors, compilers, and portability, rather than by the need for good program documentation.

- Martin


We were taught, many years ago, to provide concise comments in code only in areas where explanation was necessary. Why?

If you have had to maintain code that has changed significantly, but whose verbose and unnecessary comments were left unchanged, you need not ask why!

- Martin Allen


Donald Knuth (yes, that Knuth) developed something he called literate programming. The idea is to merge writing programs and writing documentation. It has a cult-like following. CWEB is a tool for C, and there are tools for languages like ML. The thing is, most of us are not great writers; we "hack out code", and as a result the concept never seemed to catch on.

- Alwyn E. Goodloe


The closest thing to "self-documenting" that I have seen, and that was usable, was doxygen. If the comments are written with the syntax the tool understands, it works very nicely. One additional feature is that it can generate user documentation along with diagrams. A very useful tool, and it has a very good price as well: free. :)

- Arvind V


There is nothing like moving to a new company and inheriting a substantial "self-documented" program to make one a believer in the proper use of documentation. Self-documented code is particularly cruel when your predecessor left you with a revision that does not work, things that worked were changed, and self-documentation does not tell you why. Comparing files will highlight the changes, but are the changes part of a new feature, a bug fix, or did he just find a cleaner way to re-write working code? Self-documented code will never answer these vital questions. You can trace pointers, functions, variables, etc., but you can only guess at why it was done the way that it was done and not some other way.

- Steve Wise


Steve, I agree that the compiler ought to be smart enough to get "for" from "fr" given the context, but a developer who is slamming out code so sloppily that he depends on the compiler to mop up that kind of error is probably making far less benign mistakes than misspelling language keywords. He will be made aware of some of those mistakes at compile and link time, some during debugging, and some by customers who find his bugs for him. He deserves a slap on the wrist for poor workmanship and lack of attention to detail.

I agree with Jack that many programmers make the mistake of overcommenting, especially because it obscures structure, the discernment of which is vitally important to a reader understanding the code. Comment briefly, and comment why. Don't waste my time with comments as to what or how.

Excessive comments are also the most prone to rot when the code changes beneath them. As Norm Schryer said, "If the code and the comments disagree, then both are probably wrong."

I want to emphasize something Jack implies, but doesn't come right out and say: comments are documentation, and as such should be good English -- complete sentences, not broken speech. Even between /* and */, tenets of good technical writing apply, and correct grammar always leads to greater understanding.

- Daniel Daly


No such thing as self-commenting code? I beg to differ! This is from the IOCCC, 2001 (cheong.c; you'll probably need to view this in a fixed-width font):

#include <stdio.h>
int l;int main(int o,char **O,
int I){char c,*D=O[1];if(o>0){
for(l=0;D[l ];D[l
++]-=10){D [l++]-=120;D[l]-=
110;while (!main(0,O,l))D[l]
+= 20; putchar((D[l]+1032)
/20 ) ;}putchar(10);}else{
c=o+ (D[I]+82)%10-(I>l/2)*
(D[I-l+I]+72)/10-9;D[I]+=I<0?0
:!(o=main(c/10,O,I-1))*((c+999
)%10-(D[I]+92)%10);}return o;}

- Dan McCarty


As much as self-documenting code may be a pipe dream (I certainly haven't been able to write without comments, and I've tried), writing code lucidly is something that doesn't happen nearly often enough. I mean, what about:

- well-written or well-placed comments

- descriptive variable names

- choosing the while(), for(), or do {} while() that makes the code more comprehensible.

Style counts. Maybe we just ought not call it 'self-documenting'.

- Pat Thomson


"So I figure the way to write a function is to create all of the comments first. The header, and even all of the individual little snippets of English spread throughout the code. Then the function’s design is done.

After that, anyone can fill in the code."

Thank You, Thank You, Thank You!

I've been preaching this approach to my students (Mechanical Engineers learning to build mechatronic systems) for years. Now I can point them to a non-academic source that agrees. Maybe I'll finally hear the last of 'It's all done, I just need to go back and comment it.'

Naw, that's wishful thinking, but this will be useful nonetheless.

- Ed Carryer


Hi Jack

Interesting timing from my POV...

I am working on a software project, which will also end up with a book describing how the code works. My list of things to do [in order] is:

- design the code [mainly data structures]

- write book [mostly]

- write/test the code

- edit/finish book

I assume that this approach would receive the "Ganssle Seal of Approval"?

- Colin Walls


I've seen code that's largely self-documenting. Consider the CPP (C preprocessor, not C++) module in the LCC project. It is very sparsely commented, yet easy to understand and maintain despite being rather complex (it even includes a hand-coded lexical analyzer and parser). On the other hand, one of the originators of Unix told me he wrote this code, so I guess it shouldn't come as a great surprise that he could do a real bang-up job at it.

Of course, whether code is self-documenting depends on the purpose and experience of the reader. If I intend to be noodling around inside the code, fixing bugs and adding new features, I'd rather the code be written so as not to need comments. On the other hand, if I want to simply use a library, I don't want code comments, just documentation. If you have a system that can extract documentation from comments, as doxygen or MATLAB can, that's great (as long as you keep them up to date). But these really aren't code comments at that point.

- Gerald Williams


My contributions to your article:

#define elementsof(x) (sizeof(x) / sizeof((x)[0]))

To reference the number of elements in an array without having to spell out the number:

for (i=0; i < elementsof(my_structure_array); i++) { do_something(); }

and cross-referencing the header files:

#include <limits.h> // CHAR_BIT, ...
#include <stdio.h> // printf(), getc(), ...

So that the casual observer knows what comes from where.

- M. David Gelbman


Your code needs to communicate two things to the reader:

1) What the code is doing
2) Why it's being done this way

Self-documenting code is the best way to accomplish #1, but it doesn't address #2. Your product spec addresses #2, but typically from a higher level than the code. That's why, in addition to writing code that makes it easy to figure out what's going on, you also need comments to explain why you're doing what you're doing, what you already tried that didn't work, why it makes sense to break a certain coding rule, what gotchas future programmers should look out for, etc.

- Tony Gray


"In my opinion, it is better to have:

1. Clearly defined specifications....(notice, I did not say complete!)

2. Decent design document, (Word with some of those Visio pictures are great)."

In my experience, this approach usually ends in the specs and design documents becoming separated from the code. Some poor bastard gets handed the code without the documentation to make a change--or you receive 500 pages of docs, and after reading them you realize that the docs apparently described rev 1.0, but you are working on rev 3.6.

If you put the documentation in the program as comments, it's always there for the next programmer. And, while it isn't guaranteed that each programmer updated the comments as he made the changes to the program, at least it's practical to do what you should!

I've had even worse experiences. I once had the job of designing a new device to communicate with an old military computer system over a proprietary serial bus. I received detailed (written to mil spec, of course) documentation on the serial bus, including the schematics of the old interface card, but no source code. It "wasn't needed", as I wasn't changing the program at that end--was probably a strange form of assembly language anyhow. Working my way through the docs, I became a bit puzzled because a critical 54LS74 flipflop had to be triggered by the falling edge of a clock pulse, and I darn well know that the '74 triggers on the rising edge. It took a week with a very good logic analyzer to finally capture the totally undocumented series of pulses, emitted several seconds before a message came through, that set the preceding circuits up to where that flipflop would receive a rising edge when needed.

There's a small block of code somewhere in that source code I didn't have that generates those pulses, and it probably starts with comments explaining just what it does. But I didn't receive that because "everything I needed to know" was in the specification documents, except that the programmer most likely wasn't allowed to update them to explain how he worked around the hardware designer's mistake.

- Mark Moss

