
Perfecting naming conventions

July 01, 2007

Jack Ganssle-July 01, 2007


'Tis but thy name that is my enemy;
Thou art thyself, though not a Montague.
What's Montague? It is nor hand, nor foot,
Nor arm, nor face, nor any other part
Belonging to a man. O, be some other name!
What's in a name? That which we call a rose
By any other name would smell as sweet;
So Romeo would, were he not Romeo call'd,
Retain that dear perfection which he owes
Without that title. Romeo, doff thy name,
And for that name which is no part of thee
Take all myself.

--Romeo and Juliet, act II, scene II, William Shakespeare

Maybe to Juliet names were fungible, but names and words matter. Biblical scholars refute attacks on scripture by exhaustive analysis of the meaning of a single Greek or Aramaic word, whose nuance may have changed in the intervening millennia, corrupting a particular translation.

In zoology, the binomial nomenclature, originally invented by Carl Linnaeus (born 300 years ago this year), rigorously specifies how species are named. Genus names are always capitalized while the species name never is. That's the standardized way zoologists communicate. Break the standard and you're no longer speaking the language of science.

Names are so important, there's an entire science, called onomatology, devoted to their use and classification.

In the computer business, names require a level of precision that's unprecedented in the annals of human history. For instance, Motor_start() and motor_start() are as different as the word for "hair" in Urdu and Esperanto. Mix up "l" and "1" or "0" and "O" and you might as well be babbling in Babylonian. Yet, depending on the compiler implementation, this_is_a_really_long_variable_name and this_is_a_really_long_variable_name_complete_nonsense are identical.

Yet we still use "i," "ii," and "iii" (my personal favorite) for index variables. You have to admire anyone devoted to his family, but that's no excuse for the too-common practice of using a spouse's or kid's name in a variable declaration.

Words matter, as do names. Don't call me "Dave." I won't respond. Don't call a variable foobar. It conveys nothing to a future maintainer. Great code requires a disciplined approach to naming.

Conventions
There are some 7,000 languages used today on this planet, suggesting a veritable Babel of poor communication. But only about 9 are spoken by more than 100 million people; 69 are known by 10 million or more (See the "Other estimate" column, http://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers). The top ranks are disputed, but speakers of Mandarin, Spanish, English, and perhaps Hindi far outnumber those for any other language. C itself is composed entirely of English words like "if," "while," and "for," though in many companies programmers comment in their native language. This mix can't lead to clarity, which is the overarching goal of naming and documenting.

The global economy means, for better or worse, that many companies that now do all their work in-house will be outsourcing and offshoring parts of the effort in the future. Sites such as Rentacoder.com, which encourages programmers from a vast number of countries to compete for work, are a harbinger of the future. A common lingo is needed to ease communication among so many cultures, and the default is clearly English. That may change, just as French was once the lingua franca of diplomacy and German that of science. So at the risk of sounding Anglocentric, I think it's clear that before too long most of us will be required to develop code and comments in English. Names, therefore, should use English words.

Spelling matters. Misspelled words are a sign of sloppy work. Our crummy tools, though, don't do any sort of spell checking, even though programmers have given the rest of the world fabulous tools that immediately flag a misspelled word. Invariably a spelling error will creep in from time to time. When discovered, fix it. It's nearly impossible to maintain code littered with these sorts of mistakes, as now the developer has to remember the oxymoronic "correct misspelling" to use.

Long names are a great way to convey meaning, but C99 requires only that the first 31 characters of an external identifier and the first 63 characters of an internal identifier be significant. Restrict all names to 31 characters or fewer.
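To make the limit concrete, here is a minimal sketch of a check for names that a compiler or linker honoring only the C99 minimum would treat as the same. The helpers names_collide() and name_fits() and the constant MAX_SIGNIFICANT_CHARS are hypothetical names invented for this illustration; they are not part of any standard tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* C99 minimum number of significant characters in an external identifier. */
#define MAX_SIGNIFICANT_CHARS 31

/* Hypothetical helper: returns true when two different names would look
   identical to a tool that examines only the first `significant`
   characters of each. */
static bool names_collide(const char *first, const char *second,
                          size_t significant)
{
    assert(first != NULL && second != NULL);
    if (strcmp(first, second) == 0) {
        return false;   /* the same name is not a collision */
    }
    return strncmp(first, second, significant) == 0;
}

/* Hypothetical helper: returns true when a name stays within the limit. */
static bool name_fits(const char *name)
{
    assert(name != NULL);
    return strlen(name) <= MAX_SIGNIFICANT_CHARS;
}
```

Under these assumptions, this_is_a_really_long_variable_name and this_is_a_really_long_variable_name_complete_nonsense collide, because they share their first 31 characters, while Motor_start and motor_start do not.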

Don't redefine a name using C's scoping rules. Though legal, having two names with different meanings is confusing. Similarly, don't use names that differ only in case.
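A short sketch of why reusing a name in an inner scope confuses readers. The variable timeout_ms and both functions are hypothetical, made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* File-scope setting (hypothetical). */
static int32_t timeout_ms = 500;

/* Confusing: the local declaration hides the file-scope timeout_ms, so
   the file-scope value is never touched, whatever the reader expects. */
static int32_t set_timeout_shadowed(int32_t requested_ms)
{
    int32_t timeout_ms = requested_ms;  /* legal C, but shadows the outer name */
    return timeout_ms;
}

/* Clearer: every name has one meaning, and the write to the file-scope
   variable is explicit. */
static int32_t set_timeout(int32_t requested_ms)
{
    assert(requested_ms >= 0);          /* hypothetical precondition */
    timeout_ms = requested_ms;
    return timeout_ms;
}
```

After a call to set_timeout_shadowed(100) the file-scope timeout_ms is still 500; only set_timeout(100) changes it. GCC and Clang can warn about the first form when invoked with -Wshadow.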

On the subject of case, it's pretty traditional to define macros and constants in uppercase while using a mix of cases for functions and variable names. That seems reasonable to me. But what about camel case? Or should I write that CamelCase? Or is it camelCase? Everyone has a different opinion. But camel case is merely an awkward way to simulate a space between words, which gets even more cryptic when using acronyms: UARTRead. Some advocate only capitalizing the first letter of an acronym, but that word-izes the acronym, torturing the language even more.

We really, really want to use a space, but the language treats the space character as a token delimiter. The closest typographical character to space is underscore, so why not use that? This_is_a_word is, in my opinion, easier to grok while furiously scanning hundreds of pages of code than ThisIsAWord. Underscore's one downside is that it eats away at the 31-character name-size limit, but that rarely causes a problem.

Types
Developers have argued passionately both for and against Hungarian notation since it was first invented in the '70s by space tourist Charles Simonyi. At first blush the idea is appealing: prefix variables with a couple of letters indicating the type, increasing the name's information density. Smitten by the idea years ago, I drank the Hungarian Kool-Aid.

In practice, Hungarian makes the code ugly. Clean names get mangled. szString means "String" is zero-terminated. uiData flags an unsigned int. Then I found that when changing the code (after all, everything changes all the time) sometimes an int had to morph to a long, which meant editing every use of the name. One team I know avoids this problem by typedefing a name like iName to long, which means not only is the code ugly, but the Hungarian nomenclature lies to the unwary.
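The way a prefix drifts out of step with its type can be sketched as follows. All variable names and the IS_UNSIGNED_INT macro are hypothetical, and the sketch assumes a C11 compiler, since it relies on _Generic and static_assert to report each variable's real type at compile time.

```c
#include <assert.h>

/* C11 generic selection, used here only to ask "is this an unsigned int?" */
#define IS_UNSIGNED_INT(x) _Generic((x), unsigned int: 1, default: 0)

/* First version: the "ui" prefix and the type agree. */
static unsigned int uiPulseCount = 0U;

/* After maintenance: the counter was widened to unsigned long, but nobody
   renamed it, so the prefix now misstates the type. */
static unsigned long uiTotalPulses = 0UL;

/* Type-neutral name: nothing to rename when the type changes. */
static unsigned long total_pulses = 0UL;

static_assert(IS_UNSIGNED_INT(uiPulseCount),
              "prefix and type agree");
static_assert(!IS_UNSIGNED_INT(uiTotalPulses),
              "prefix says unsigned int, but the type is something else");
```

The second assertion passes precisely because the name lies: a reader trusting the "ui" prefix would assume the wrong width for uiTotalPulses.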

C types are problematic. Is an int 16 bits? 32? Don't define variables using C's int and long keywords; follow the MISRA standard and use the following typedefs to remove all ambiguity and to make porting much simpler:


int8_t   –  8 bit signed integer 
int16_t  – 16 bit signed integer 
int32_t  – 32 bit signed integer 
uint8_t  –  8 bit unsigned integer 
uint16_t – 16 bit unsigned integer 
uint32_t – 32 bit unsigned integer

See www.opengroup.org/onlinepubs/009695399/basedefs/stdint.h.html for some interesting extensions to these typedefs for use where performance issues mean we want the compiler to make the smartest decisions possible.
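As a rough illustration, assuming a C99 compiler that provides <stdint.h>: the exact-width typedefs pin down data whose size matters, while the int_fastN_t family described on that page lets the compiler choose the quickest type of at least N bits. The function sum_samples() is a hypothetical example, not taken from MISRA or from the page above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sums `count` 8-bit samples into a 32-bit total.
   - uint8_t and uint32_t are exact widths: the data layout and the range
     of the result are the same on every target.
   - uint_fast16_t is "at least 16 bits, whatever is fastest here", which
     suits a loop counter whose precise size is irrelevant. */
static uint32_t sum_samples(const uint8_t *samples, uint_fast16_t count)
{
    uint32_t total = 0U;

    assert(samples != NULL);
    for (uint_fast16_t index = 0U; index < count; index++) {
        total += samples[index];
    }
    return total;
}
```

On a 32-bit processor a compiler may well make uint_fast16_t 32 bits wide to avoid masking operations, while on a 16-bit part it may stay at 16 bits; the code reads the same either way.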
