Perfecting naming conventions - Embedded.com



'Tis but thy name that is my enemy;
Thou art thyself, though not a Montague.
What's Montague? It is nor hand, nor foot,
Nor arm, nor face, nor any other part
Belonging to a man. O, be some other name!
What's in a name? That which we call a rose
By any other name would smell as sweet;
So Romeo would, were he not Romeo call'd,
Retain that dear perfection which he owes
Without that title. Romeo, doff thy name,
And for that name which is no part of thee
Take all myself.

–Romeo and Juliet, act II, scene II, William Shakespeare

Maybe to Juliet names were fungible, but names and words matter. Biblical scholars refute attacks on scripture by exhaustive analysis of the meaning of a single Greek or Aramaic word, whose nuance may have changed in the intervening millennia, corrupting a particular translation.

In zoology, the binomial nomenclature, originally invented by Carl Linnaeus (born 300 years ago this year), rigorously specifies how species are named. Genus names are always capitalized while the species name never is. That's the standardized way zoologists communicate. Break the standard and you're no longer speaking the language of science.

Names are so important, there's an entire science, called onomatology, devoted to their use and classification.

In the computer business, names require a level of precision that's unprecedented in the annals of human history. For instance, Motor_start() and motor_start() are as different as the word for “hair” in Urdu and Esperanto. Mix up “l” and “1” or “0” and “O” and you might as well be babbling in Babylonian. Yet, depending on the compiler implementation, this_is_a_really_long_variable_name and this_is_a_really_long_variable_name_complete_nonsense are identical.

Yet we still use “i,” “ii,” and “iii” (my personal favorite) for index variables. You have to admire anyone devoted to his family, but that's no excuse for the too-common practice of using a spouse's or kid's name in a variable declaration.

Words matter, as do names. Don't call me “Dave.” I won't respond. Don't call a variable foobar. It conveys nothing to a future maintainer. Great code requires a disciplined approach to naming.

Conventions
There are some 7,000 languages used today on this planet, suggesting a veritable Babel of poor communication. But only about 9 are spoken by more than 100 million people; 69 are known by 10 million or more (see the “Other estimate” column, http://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers). The top ranks are disputed, but speakers of Mandarin, Spanish, English, and perhaps Hindi far outnumber those for any other language. C itself is composed entirely of English words like “if,” “while,” and “for,” though in many companies programmers comment in their native language. This mix can't lead to clarity, which is the overarching goal of naming and documenting.

The global economy means, for better or worse, many companies that do all their work in-house will be outsourcing and offshoring parts of the effort in the future. Sites such as Rentacoder.com, which encourages programmers from a vast number of countries to compete for work, are a harbinger of the future. A common lingo is needed to ease communication among so many cultures, and the default is clearly English. That may change, just as French was once the lingua franca of diplomacy and German that of science. So at the risk of sounding Anglocentric, I think it's clear that before too long most of us will be required to develop code and comments in English. Names, therefore, should use English words.

Spelling matters. Misspelled words are a sign of sloppy work. Our crummy tools, though, don't do any sort of spell checking, even though programmers have given the rest of the world fabulous tools that immediately flag a misspelled word. Invariably a spelling error will creep in from time to time. When discovered, fix it. It's nearly impossible to maintain code littered with these sorts of mistakes, as now the developer has to remember the oxymoronic “correct misspelling” to use.

Long names are a great way to convey meaning, but C99 guarantees only that the first 31 characters of external names and the first 63 characters of internal names are significant. Restrict all names to 31 characters or fewer.
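To make the limit concrete, here is a minimal sketch; the helper `names_collide` is hypothetical, not part of any compiler or library. It reports whether two different identifiers would look identical to a tool that compares only the first `limit` characters, such as a linker that honors just the 31 characters C99 guarantees for external names.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true when two differently spelled
 * identifiers would be treated as the same name by a tool that only
 * compares the first `limit` characters (e.g. 31 for C99 external
 * identifiers). */
bool names_collide(const char *a, const char *b, size_t limit)
{
    if (strcmp(a, b) == 0) {
        return false;               /* same spelling: not a collision */
    }
    /* strncmp stops at `limit` characters or at the first difference. */
    return strncmp(a, b, limit) == 0;
}
```

With `limit` set to 31, the two long names from the opening section collide; with 63 they do not, because they first differ at the 36th character.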

Don't redefine a name using C's scoping rules. Though legal, giving one name two different meanings is confusing. Similarly, don't use names that differ only in case.
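Shadowing is legal C, which is exactly why it hurts. The sketch below (function and variable names are hypothetical) shows the classic failure: an inner declaration reuses the name `count`, so the increment lands on a short-lived copy and the function always returns zero. GCC and Clang will warn about this with `-Wshadow`, but only if you ask.

```c
/* A deliberately bad example; all names here are hypothetical.
 * The inner `count` shadows the outer one, so the increment never
 * reaches the variable the function returns. */
int count_pending_messages(const int *flags, int n)
{
    int count = 0;                  /* outer: the value we return */
    for (int i = 0; i < n; i++) {
        if (flags[i]) {
            int count = 0;          /* legal, but shadows the outer count */
            count++;                /* updates the inner copy only */
        }
    }
    return count;                   /* always 0: the bug the shadowing hid */
}
```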

On the subject of case, it's pretty traditional to define macros and constants in uppercase while using a mix of cases for functions and variable names. That seems reasonable to me. But what about camel case? Or should I write that CamelCase? Or is it camelCase? Everyone has a different opinion. But camel case is merely an awkward way to simulate a space between words, which gets even more cryptic when using acronyms: UARTRead. Some advocate only capitalizing the first letter of an acronym, but that word-izes the acronym, torturing the language even more.

We really, really want to use a space, but the language recognizes the space character as an end-of-token identifier. The closest typographical character to space is underscore, so why not use that? This_is_a_word is, in my opinion, easier to grok while furiously scanning hundreds of pages of code than ThisIsAWord. Underscore's one downside is that it eats away at the 31-character name-size limit, but that rarely causes a problem.

Types
Developers have argued passionately both for and against Hungarian notation since it was first invented in the '70s by space tourist Charles Simonyi. At first blush the idea is appealing: prefix variables with a couple of letters indicating the type, increasing the name's information density. Smitten by the idea years ago, I drank the Hungarian Kool-Aid.

In practice, Hungarian makes the code ugly. Clean names get mangled. szString means “String” is zero-terminated. uiData flags an unsigned int. Then I found that when changing the code (after all, everything changes all the time) sometimes an int had to morph to a long, which meant editing every use of the name. One team I know avoids this problem by typedefing a name like iName to long, which means not only is the code ugly, but the Hungarian nomenclature lies to the unwary.

C types are problematic. Is an int 16 bits? 32? Don't define variables using C's int and long keywords; follow the MISRA standard and use the following typedefs, standardized in C99's <stdint.h>, to remove all ambiguity and to make porting much simpler:

int8_t   –  8-bit signed integer
int16_t  – 16-bit signed integer
int32_t  – 32-bit signed integer
uint8_t  –  8-bit unsigned integer
uint16_t – 16-bit unsigned integer
uint32_t – 32-bit unsigned integer
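Why does the exact width matter? The sketch below (function names and the 3300 mV ADC reference are assumptions for illustration) contains two computations whose results depend on width: an 8-bit tick counter that must wrap at 256, and an ADC scaling whose intermediate product, 4095 × 3300 = 13,513,500, would overflow a 16-bit int. With the <stdint.h> types, both behave the same on every compiler that provides those types.

```c
#include <stdint.h>

/* Hypothetical helper: an 8-bit tick counter wraps at 256 on every
 * compiler, unlike a counter declared with a plain int. */
uint8_t Tick_Counter_Advance(uint8_t ticks, uint8_t step)
{
    return (uint8_t)(ticks + step);     /* modulo-256 result */
}

/* Hypothetical helper: scale a 12-bit ADC reading (0..4095) to
 * millivolts, assuming a 3300 mV reference. The int32_t cast keeps the
 * intermediate product from overflowing where int is only 16 bits. */
int32_t ADC_Counts_To_Millivolts(uint16_t counts)
{
    return ((int32_t)counts * 3300) / 4095;
}
```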

See www.opengroup.org/onlinepubs/009695399/basedefs/stdint.h.html for some interesting extensions to these typedefs, such as int_fast16_t, for use where performance issues mean we want the compiler to make the smartest decisions possible.
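As a sketch of the "fast" variants (the function `Buffer_Sum` is hypothetical): uint_fast16_t requests an unsigned type of at least 16 bits that the implementation considers fastest, so it may be wider than 16 bits on a 32- or 64-bit CPU. The data itself stays in exact-width uint8_t because its storage size matters.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum a byte buffer. The data keeps its exact
 * 8-bit width; the loop index only needs "at least 16 bits, whatever
 * is fastest on this target". */
uint32_t Buffer_Sum(const uint8_t *buffer, uint_fast16_t length)
{
    uint32_t total = 0;
    for (uint_fast16_t index = 0; index < length; index++) {
        total += buffer[index];
    }
    return total;
}
```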

Forming names
To classify organisms, Linnaeus developed a hierarchy that today consists of the kingdom, phylum, class, order, family, genus, and species, and is reflected in biological names such as Homo sapiens. The genus comes first, followed by the more specific, the species. It's a natural way to identify large sets. Start from the general and work toward the specific.

The same goes for variable and function names. They should start with the big and work toward the small. Main_Street_Baltimore_MD_USA is a lousy name, as we're not sure till the very end which huge domain, the country, we're talking about. Better: USA_MD_Baltimore_Main_Street.

Yet most of the code I read uses names like Read_Timer0(), Read_UART(), or Read_DMA(). Then there's a corresponding Timer0_ISR(), with maybe Timer0_Initialize() or Initialize_Timer0(). See a pattern? I sure don't.

Better:

Timer_0_Initialize()
Timer_0_ISR()
Timer_0_read()

With this practice we've grouped everything to do with Timer 0 together in a logical, Linnaean taxonomy. A sort will clump related names together.
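A quick way to see the clumping is to let the machine do the sort. This sketch (the helper names are hypothetical, and DMA_read()/UART_read() are the earlier Read_DMA()/Read_UART() renamed to the same pattern) sorts a list of names with the standard qsort() and strcmp().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical qsort comparator for an array of C strings. */
static int Names_Compare(const void *a, const void *b)
{
    const char *const *left = a;
    const char *const *right = b;
    return strcmp(*left, *right);
}

/* Hypothetical helper: sort names in place, byte-wise. */
void Names_Sort(const char **names, size_t count)
{
    qsort(names, count, sizeof names[0], Names_Compare);
}
```

Sorting { "Timer_0_read()", "DMA_read()", "Timer_0_ISR()", "UART_read()", "Timer_0_Initialize()" } yields DMA_read(), Timer_0_ISR(), Timer_0_Initialize(), Timer_0_read(), UART_read() with ASCII ordering: the three timer functions land next to each other, something the original Read_Timer0()/Initialize_Timer0() mix could never do.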

In a sense, however, this taxonomy doesn't reflect English sentence structure. “Timer” is the object and “read” the verb, and in English objects come after the verb. But a name is not a sentence, and we do the best we can do in an imperfect world. German speakers, though, will find the trailing verb familiar.

Since functions usually do something, it's wise to have an action word, a verb, as part of the name. Conversely, variables are just containers and do nothing. Variable names should be nouns, perhaps modified by adjectives.

Avoid weak and nonspecific verbs like “handle,” “process,” and “update.” I have no idea what “ADC_Handle()” means. “ADC_Curve_Fit()” conveys much more information.

Short, throwaway variable names are fine occasionally. A single-line for loop that uses the not terribly informative index variable “i” is reasonable if the variable is both used and disposed of in one line. If it carries a value, which implies context and semantics, across more than a single line of code, pick a better name.
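The distinction looks like this in practice; a sketch with hypothetical function names. In the first function, i is born and dies in a single line. In the second, the index outlives the loop and is the function's result, so it earns a descriptive name.

```c
#include <stddef.h>

/* Hypothetical helper: the throwaway index lives and dies in one line. */
long Samples_Sum(const int *samples, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++) total += samples[i];
    return total;
}

/* Hypothetical helper: the index carries meaning beyond the loop, so it
 * gets a real name. Returns count if no sample exceeds the limit. */
size_t Samples_Find_First_Overload(const int *samples, size_t count, int limit)
{
    size_t first_overload_index = 0;
    while (first_overload_index < count &&
           samples[first_overload_index] <= limit) {
        first_overload_index++;
    }
    return first_overload_index;
}
```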

TLAs and cheating
In a study of word-abbreviation behavior, researchers M.H. Hodge and F.M. Pennington had subjects abbreviate words.[1] Other subjects tried to reconstruct the original words. The average success rate was an appalling 67%.

What does “Disp” mean? Is it the noun meaning the display hardware, or is it the verb “to display?” How about “Calc?” That could be percent calcium, calculate, or calculus.

With two exceptions, never abbreviate a name. Likewise, with the same caveats, never use an acronym. Your jargon may be unknown to some other maintainer or may have some other meaning. Clarity is our goal!

One exception is the use of industry-standard acronyms and abbreviations, such as LED, LCD, CRT, and UART, that pose no confusion. Another is that it's fine to use any abbreviation or acronym documented in a dictionary stored in a common header file. For example:

/*  Abbreviation Table
 *  Dsply == Display (the verb)
 *  Disp  == Display (our LCD display)
 *  Tot   == Total
 *  Calc  == Calculation
 *  Val   == Value
 *  MPS   == Meters per second
 *  Pos   == Position
 */

I remember with some wonder when my college physics professor taught us to cheat on exams. If you know the answer's units, it's often possible to solve a problem correctly just by properly arranging and canceling those units. Given a result that must be in miles per hour, if the only inputs are 10 miles and 2 hours, without even knowing the question, it's a good bet the answer is 5 MPH.

Conversely, ignoring units is a sure road to disaster. Is Descent_Rate meters per second? CM/sec? Furlongs per fortnight? Sure, the programmer who initially computed the result probably knows, but it's foolish to assume everyone is on the same page. Postfix all physical parameters with the units: Descent_Rate_MPS (note that I defined MPS in the dictionary above), Timer_Ticks, ADC_Read_Volts().
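A sketch of the idea, with hypothetical names that follow this column's conventions (units spelled out, or taken from the abbreviation dictionary): the unit suffixes make the arithmetic read like the professor's trick, meters divided by seconds giving meters per second.

```c
/* Hypothetical function: meters / seconds -> meters per second
 * (MPS is defined in the abbreviation dictionary above). */
double Descent_Rate_MPS(double Altitude_Drop_Meters, double Elapsed_Time_Seconds)
{
    return Altitude_Drop_Meters / Elapsed_Time_Seconds;
}

/* Hypothetical conversion: the name states both units, so feeding the
 * result into a _Meters parameter looks right and into a _Feet
 * parameter looks wrong. */
double Feet_To_Meters(double Distance_Feet)
{
    return Distance_Feet * 0.3048;  /* 1 ft = 0.3048 m exactly, by definition */
}
```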

Are there a lot of rules about naming? You betcha. But they come naturally with practice. And practice is what we must be doing for the rest of our careers. Practice new ideas. Practice more effective in-the-code documentation.

For stasis is death.

Endnotes:
1. Hodge, M.H. and F.M. Pennington. “Some Studies of Word Abbreviation Behavior.” Journal of Experimental Psychology, 98(2):350-361, 1973.

Reader Response


“A well placed underscore makes the difference between a s_exchange and a sex_change” – 8048 Users Manual, Intel, 1977.

Conversely, ignoring units is a sure road to disaster.

An even surer road to disaster, for the industry as a whole, is to have teachers like the one I had for physics class. I had a teacher who taught the class in the metric system, using a book that covered only the metric system (with answers in the back of the book that were usually *wrong*), then gave all of the tests in the English system. The teacher never taught any conversions, either. Is it any wonder that most students give up and transfer to other majors, or get a bad taste for “higher education”?

The physics teacher did drive home how important units can be, and I'll never forget that kilograms are units of mass in the metric system, and newtons are the unit of weight. Slugs are the unit of mass in the English system, with the pound being the unit of weight. Think of that the next time you pick up a package of anything and it says “X Ounces. Y Grams”.

“. . . the idea is appealing: prefix variables with a couple of letters indicating the type, . . .”

You mean appalling? You end up with hundreds of variables that are all sorted into large groups by size rather than in some meaningful way. I know you are a proponent of Jean Labrosse's largest-to-smallest naming convention, from the Universe down to Atoms, which makes far more sense than Hungarian notation.

I was once sent in to clean up a project that was designed by a committee of people spread all over the world. The unit was large moving equipment; if something went wrong, people might die. The unit was composed of several different CPU modules communicating on a proprietary bus. Each module's software was written by a different group in a different part of the world.

The operator's requested speed was input in feet per minute.
The output to a variable frequency drive was in tenths of hertz.
The tachometer feedback was in RPM.
To top it all off, the internal calculations were done in radians per second.

The first thing I did to get the project back on track was to adopt a standardized variable naming convention that included the units. For example, the operator request became operator_request_fpm_u16. You then knew immediately that you were dealing with feet per minute, and that it was a 16-bit unsigned variable. After the variable-name cleanup, many of the bugs became self-documenting: when you saw something like “operator_request_fpm_u16 / vfd_hz_s32” in the code, you knew there was a problem that needed fixing.
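A minimal sketch of this naming style (all names are hypothetical): the parameter carries quantity, unit, then size and signedness, and the function name states which unit goes in and which comes out. The RPM to radians-per-second factor is fixed physics, 2π/60; converting feet per minute or tenths of hertz would need gearing and drive constants that aren't given here, so those are left out.

```c
#include <stdint.h>

/* pi to double precision; M_PI is common but not part of standard C. */
#define PI_RAD 3.14159265358979323846

/* Hypothetical conversion: one revolution = 2*pi radians,
 * one minute = 60 seconds. */
double tach_rpm_to_rad_per_sec(uint16_t tach_feedback_rpm_u16)
{
    return (double)tach_feedback_rpm_u16 * (2.0 * PI_RAD / 60.0);
}
```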

I also adopted the convention of adding _v for volatile, _g for global (when C forces you down that road), and _vg for a volatile global variable. You know that you should scrutinize _vg variables ahead of _g variables, etc. Enjoy your s_exchange_s32_vg. 🙂
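A sketch of the _vg suffix in use (names are hypothetical; a plain function stands in for the interrupt handler, whose declaration syntax is compiler-specific). The counter is both volatile and global, and the suffix flags every access as one that needs extra scrutiny.

```c
#include <stdint.h>

/* Hypothetical tachometer pulse counter: volatile AND global (_vg). */
volatile uint16_t tach_pulse_count_u16_vg = 0;

/* Hypothetical ISR body; real hardware would use the vendor's
 * interrupt attribute or vector syntax. */
void tach_pulse_isr(void)
{
    tach_pulse_count_u16_vg++;
}

/* Hypothetical read-and-clear from the main loop. On a real target the
 * interrupt must be masked around these two accesses so a pulse can't
 * arrive between the read and the clear; that step is omitted here. */
uint16_t tach_take_pulses_u16(void)
{
    uint16_t pulses_u16 = tach_pulse_count_u16_vg;
    tach_pulse_count_u16_vg = 0;
    return pulses_u16;
}
```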

– Bob Paddock


I prefer that physical units be communicated using the type of a variable, rather than encoded in the variable's name. A set of typedefs can be laid out for a project to codify the use of engineering units. For example:

typedef int VOLT_100TH; /* Hundredths of a volt.
                           Range: -327.68 to +327.67 */

For many embedded projects, twenty or so units typedefs are sufficient to cover position, velocity, acceleration, etc.

The typedef serves as a central “definition” point on which to hang a comment about the units, scaling, range, etc.
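A minimal sketch of the approach (the function is hypothetical). One assumption departs from the typedef above: it uses int16_t rather than int, since the documented range of -327.68 to +327.67 only holds when the underlying type is exactly 16 bits.

```c
#include <stdint.h>

/* Hundredths of a volt. Range: -327.68 to +327.67.
 * int16_t (an assumption) pins the range regardless of int's width. */
typedef int16_t VOLT_100TH;

/* Hypothetical helper: the parameter types carry the units, so the
 * names can stay short. */
VOLT_100TH battery_sag(VOLT_100TH resting, VOLT_100TH under_load)
{
    return (VOLT_100TH)(resting - under_load);
}
```

Here 1250 and 1180 represent 12.50 V and 11.80 V, so battery_sag(1250, 1180) yields 70, a 0.70 V sag.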

Most modern programming editors will show a tool tip with the type when hovering over a variable.

A variant of this approach is to include a suffix indicating the size and sign of the underlying native C type. This can then be used as a memory crutch regarding the range and signedness, e.g.:

typedef unsigned int MPH_UINT16; /* Miles per hour */

However, I'm not particularly fond of this extension. The set of units used in a project is usually small enough that they can be remembered easily. Also, there are some units that have ranges smaller than the numeric space available. For instance:

typedef signed char PERCENT; /* Percent of full scale.
                                Range: -100% to +100% */

I find that using these forms of typedefs helps to reduce the documentation burden when defining variables, function parameters, etc.

– Dave Kellogg


Mr. Ganssle, I totally agree that choosing the right words makes programs more readable and maintainable, which is very important if you have to fix or change a program somebody else wrote.

“Words matter, as do names,” you wrote, and “But a name is not a sentence.” So if I want my program to start a motor, I would prefer to write “Motor start” instead of Motor_start() followed by a lot of explanations (lines of code) telling the compiler and microprocessor how to do it; that fine print can be handled at a lower level.

To be more explicit, I would like to write, for example:

“If bit 3 is true, start Motor”

Why hasn't someone developed a computer language that can be programmed using whole sentences, without all the rubbish that exists only for the compiler (which really should be intelligent enough to understand the sentences we write to get the task done)? It should not be such a big deal, because, as you wrote, “the language recognizes the space character as an end-of-token identifier,” and I am sure an intelligent compiler could easily produce state-of-the-art optimized code from our “token” sentences for our embedded system designs.

Having such a computer language we really can say “we do the best we can do in an imperfect world.”

I am sure that if such a readable and maintainable computer language were used, even managers could read and understand these programs, and programming jobs could stay in the States.

And, because of better readability and maintainability, there will be fewer program flaws and fewer security issues!

Dirk Bruehl


Don't confuse Systems Hungarian with Apps Hungarian. Systems Hungarian was purportedly a mistake based on poor wording by Simonyi.

While Apps Hungarian may still not be as helpful as some other systems, at least things like “hwnd” to indicate a window handle are informative, and tell you something that the variable type on its own wouldn't. If you followed strict Systems Hungarian, however, you would have to label that as “lng” (or whatever's appropriate for your language), which is, of course, utterly redundant.

In short, what many people think of as “Hungarian” was largely a mistake, furthered primarily by Microsoft. Apps Hungarian was more useful, but unfortunately, the monolith that is Microsoft didn't quite get it.

– Robert Morley


Just got finished reading your July column; instead of

Timer0_Initialize()
Timer0_read()

Etc. I have started using

timer0(READ)
timer0(INIT)
Etc.

I find it much easier to use a switch/case statement inside one function to handle the different behaviors for timer0. Sort of a poor man's OO for ANSI C.
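A minimal sketch of that dispatch style, assuming hypothetical command names INIT and READ as enum constants, and a plain variable standing in for the timer's hardware count register so the sketch runs off-target.

```c
#include <stdint.h>

/* Hypothetical command set for one timer "object". */
typedef enum { INIT, READ } timer_command;

/* Hypothetical stand-in for timer 0's hardware count register. */
volatile uint16_t timer0_count_register;

/* One entry point; the switch selects the behavior. */
uint16_t timer0(timer_command command)
{
    switch (command) {
    case INIT:
        timer0_count_register = 0;  /* real code would also set mode, prescaler, ... */
        return 0;
    case READ:
        return timer0_count_register;
    default:
        return 0;                   /* unknown command */
    }
}
```

One trade-off worth weighing: a single entry point forces every command to share one signature, so commands that need different arguments or return types fit less naturally than separate Timer_0_... functions.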

I also do not capitalize ANYTHING unless it is an industry-standard acronym (e.g. LED, LCD, etc.) or an object declaration such as a struct. The instance of the declaration is in lower case, e.g.

struct Table update_table;

All words are separated by an underscore, and all words are spelled out, unless they are acronyms as stated above. I do this because I always know the rule, whereas someVariables AreCapitalized differently and I can't remember WhentoCapitalize and when notTo.

Another handy thing is to put the units in the #define:

#define LOGIN_TIMEOUT 300 // in seconds

These little things have helped me create and maintain code.

As for comments, I ask myself, “Will I understand this six months from now, after working on several other projects in the meantime?”

If the answer is NO, then I add comments until I think I will understand the code six months from now.

This simple rule has helped me out innumerable times.

Tony Ozrelic


A couple of other thoughts: naming can extend to storage and files as well. It drives me nuts when people store log files with the date part of the name as 10032006. Hey! The year comes first, then the month, then the day. A prescriptivist may even insist on something of the form Y2006M10D03, though I find that hard to swallow. 20061003 works fine, and best of all, it sorts in a logical way, so you can see the progression of data.
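A minimal sketch of that file-naming rule using the standard strftime() with "%Y%m%d"; the function make_log_name and the "log_" prefix and ".txt" suffix are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: build a log-file name whose date part is
 * YYYYMMDD, so a plain lexical sort is also a chronological sort.
 * Returns 0 on success, -1 if a buffer is too small. */
int make_log_name(char *out, size_t out_size, const struct tm *date)
{
    char stamp[9];                              /* "YYYYMMDD" + '\0' */
    if (strftime(stamp, sizeof stamp, "%Y%m%d", date) == 0) {
        return -1;
    }
    int written = snprintf(out, out_size, "log_%s.txt", stamp);
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

For October 3, 2006 this produces log_20061003.txt. By contrast, in the MMDDYYYY form 10032006 (October 3, 2006) sorts after 02072007 (February 7, 2007), because the comparison starts with the month.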

Years ago, when C only saw (see-saw?) the first eight characters as significant (I'm old), I had to maintain some code where the original authors tracked changes by modifying variable and function names such as:

parse_tree_1979_07_18_janice(a_thing, something_else, third_arg_1981_02_07_paul);

then later:

parse_tree_1980_03_05_robert(different_1982_05_07_carol, whatever, some_other);

On and on and on! Oh, the horrors! Each time they touched a line of code, they added the date of the change and the developer's name to that line. By the time I picked it up, compilers were no longer that limited, though you could specify a symbol name limit on the command line. Trouble was, it also broke all the other development tools I had come to know and love.

This weirdness existed on thousands of lines of code.

So the first thing I had to do was write some awk/sed scripts that shortened all the symbol names. I treated only the first eight characters as significant and cut each name at the first '_' beyond the eighth character. It took a couple of passes to get my awk/sed scripts to work correctly, but when I was done, I had a source tree that worked with cscope, ctags, and lint (oh my).
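For readers who want the rule in code, here is a re-expression in C of the shortening step as described (not the original awk/sed; the function name shorten_symbol is hypothetical): keep the first eight characters, then cut at the first '_' after them.

```c
#include <string.h>

/* Hypothetical re-expression of the described rule: keep at least the
 * first eight characters, then cut the name at the first '_' found
 * after the eighth character. Names without such an underscore are
 * kept whole. Writes the result to out; out_size must be > 0. */
void shorten_symbol(const char *name, char *out, size_t out_size)
{
    size_t keep = strlen(name);
    if (keep > 8) {
        const char *cut = strchr(name + 8, '_');  /* search from the 9th char */
        if (cut != NULL) {
            keep = (size_t)(cut - name);
        }
    }
    if (keep >= out_size) {
        keep = out_size - 1;                      /* fit the caller's buffer */
    }
    memcpy(out, name, keep);
    out[keep] = '\0';
}
```

Applied to the names above, parse_tree_1979_07_18_janice and parse_tree_1980_03_05_robert both become parse_tree, third_arg_1981_02_07_paul becomes third_arg, and whatever, with no underscore past the eighth character, is left alone.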

On another note, I had some friends in another division who worked on Y2K issues for an old product, and they had one failure that dogged them for weeks until someone found a two-digit year field named 'Bob' in the code. The original developer for that particular module had left years before. Thanks, Bob.

-Doug Fraser


Regarding the lack of spell-checking functionality in development tools… I use Visual SlickEdit which is a very capable syntax-aware programming editor that includes spell-checking functionality (Check Word at Cursor, Check Selection, Check Comments and Strings, etc.)

– Michael Linden
