Enumerations as Counters

Dan Saks, November 20, 2002

Loops that step through enumerations can be very handy. With a little thought, you can make them clearer and more robust.

I'm rather fond of enumerated types. Enumerations provide a simple yet elegant way to define a set of related symbolic constants as values for a new data type. Using appropriately named enumerated types and constants improves the readability of your code, sometimes dramatically. Moreover, languages such as Ada and C++ treat each enumerated type as distinct from integers and other enumerated types, enabling the compiler to catch a variety of potential errors.

Enumerations

In both C and C++, an enum specifies a type and a corresponding set of named constants. For example:


enum day
  {
  Sunday, Monday, Tuesday,  
  Wednesday, Thursday, Friday, 
  Saturday
  };

defines a type day with seven constants, one named for each day of the week.

As I explained in my last column ("Tag vs. Type Names," October 2002), C treats day as a tag name. In C, you must refer to the type as enum day unless you define a type name as an alias. I recommend defining a type name alias immediately after the enumeration definition with the same spelling as the tag, as in:


enum day { ... };
typedef enum day day;

C++ treats day as a type, so it lets you refer to the type as just day, even without the typedef. However, the typedef can help avoid some subtle program errors, which I mentioned in my last column.

By default, the first enumeration constant has the value 0, and each subsequent constant has the value that is one more than the value of the previous constant. Thus, Sunday's value is 0, Monday's is 1, and so on.

You can specify an explicit value for any enumeration constant. Any enumeration constant without an explicitly specified value has the value one more than the value of the previous constant. So:


enum month
  {
  January = 1, February, March,
  ...
  October, November, December
  };

defines January's value as 1, February's as 2, and so on. December's value is 12.
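
As a quick check of these rules, the following sketch restates both enumerations and asks the compiler to confirm the values. It assumes a C++11 or later compiler for static_assert, and it spells out the months that the listing above elides.

```cpp
// Sketch: confirming the values the compiler assigns to enumeration
// constants. static_assert requires C++11 or later.
enum day
  {
  Sunday, Monday, Tuesday,
  Wednesday, Thursday, Friday,
  Saturday
  };

enum month
  {
  January = 1, February, March,
  April, May, June,
  July, August, September,
  October, November, December
  };

// With no explicit values, counting starts at 0.
static_assert(Sunday == 0, "first constant defaults to 0");
static_assert(Saturday == 6, "each constant is one more than the previous");

// An explicit value sets the starting point for the constants that follow.
static_assert(January == 1, "explicitly specified");
static_assert(February == 2, "one more than January");
static_assert(December == 12, "twelfth constant after starting at 1");
```

If any of these assumptions were wrong, the translation unit would fail to compile rather than misbehave at run time.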

Enumerations as counters in C

When you use an enumeration to represent an ordered, contiguous sequence of values, such as days of the week, there's a good chance you'll be writing loops that step through those values.

In C, each enumeration type is compatible with char or some signed or unsigned integer type. You can perform arithmetic on enumerations, including ++ and --, and thus easily write loops that step through the days as:


for (d = Sunday; d <= Saturday; ++d)
  ...

In C, it really doesn't matter whether you declare d as a day or an int, but using day is preferable because it's self-documenting.

Enumerations as counters in C++

In C++, each enumeration is a distinct type. When you perform arithmetic on an enumeration value, the compiler automatically promotes it to an integer type (int, in the case of day). It does not convert the result back automatically. So, if d is a day, d + 1 yields an int, not a day. Thus:

d = d + 1; // error

is an error, because the compiler won't convert d + 1 from int to day automatically. Furthermore, ++d is also an error. The built-in ++ and -- operators in C++ do not accept enumeration values as arguments.

Incrementing an enumeration requires a cast to convert the integer result of addition back to the enumeration type, as in:

d = day(d + 1);
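
The promotion and the cast can be seen in a small sketch. It assumes C++11 or later for decltype and static_assert, and next_day is a hypothetical helper name introduced only for illustration.

```cpp
#include <type_traits>

enum day
  {
  Sunday, Monday, Tuesday,
  Wednesday, Thursday, Friday,
  Saturday
  };

// Arithmetic promotes the enumeration operand, so the sum is an int.
static_assert(std::is_same<decltype(Sunday + 1), int>::value,
  "day + int yields int, not day");

// next_day is a hypothetical helper: it returns the day after d,
// using a cast to turn the int result back into a day.
// It is meant for arguments in the range Sunday through Friday.
inline
day next_day(day d)
  {
  return day(d + 1);
  }
```

A call such as next_day(Monday) yields Tuesday. Without the cast, the return statement would not compile.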

This seems to leave you with two less-than-perfect alternatives for writing loops that iterate over the day values.

The first alternative is to use an int, rather than a day, as the loop counter:


int d;
for (d = Sunday; d <= Saturday; ++d)
  ...

Unfortunately, inside the body of the loop, you can use d as a day only if you cast it back to type day. This leads to haphazard use of casts, which, in turn, invites programming errors.
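
To make the drawback concrete, here is a sketch of a loop with an int counter. The functions is_weekend and count_weekend_days are hypothetical names, assumed here only to give the loop body something that requires a day.

```cpp
enum day
  {
  Sunday, Monday, Tuesday,
  Wednesday, Thursday, Friday,
  Saturday
  };

// is_weekend is a hypothetical function that expects a day, not an int.
inline
bool is_weekend(day d)
  {
  return d == Saturday || d == Sunday;
  }

// Counts the weekend days using an int as the loop counter.
inline
int count_weekend_days()
  {
  int n = 0;
  for (int d = Sunday; d <= Saturday; ++d)
    {
    // d is an int here, so every use as a day needs a cast.
    if (is_weekend(day(d)))
      ++n;
    }
  return n;
  }
```

Every additional call in the loop body that takes a day would need its own cast, which is where the haphazard casting creeps in.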

The second alternative is to use a day as the counter, but with an awkward looking increment expression that uses a cast, as in:


for (d = Sunday; d <= Saturday; d = day(d + 1))
  ...

Fortunately, there's a third alternative: use a day as a counter along with an appropriate programmer-defined ++ operator. C++ lets you define ++ and -- as either prefix or postfix operators for any enumerated type. You can define the prefix ++ operator for day as:


inline
day &operator++(day &d)
  {
  return d = day(d + 1);
  }

This wraps the awkward cast and assignment into a tidy package. Then you can write loops that iterate over the day values as in C:


day d;
...
for (d = Sunday; d <= Saturday; ++d)
  ...

with C++'s stricter type checking in force.

Here, the compiler translates ++d into the function call operator++(d). Since the function is declared inline, any decent optimizing compiler will strip away the call and substitute the function body in its place. The end result should be the same code that you would get from compiling ++d in C.
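
Putting the pieces together, a minimal sketch might look like the following. The prefix operator is the one shown above; the postfix operator is one plausible way to write its counterpart, and count_days is a hypothetical function added only to exercise the loop.

```cpp
enum day
  {
  Sunday, Monday, Tuesday,
  Wednesday, Thursday, Friday,
  Saturday
  };

// Prefix ++: steps d to the next day and returns a reference to it.
inline
day &operator++(day &d)
  {
  return d = day(d + 1);
  }

// Postfix ++: the unnamed int parameter marks this as the postfix form.
// It steps d but returns the value d had before the increment.
inline
day operator++(day &d, int)
  {
  day old = d;
  ++d;
  return old;
  }

// count_days is a hypothetical function that counts the loop's iterations.
inline
int count_days()
  {
  int n = 0;
  for (day d = Sunday; d <= Saturday; ++d)
    ++n;
  return n;
  }
```

Defining postfix in terms of prefix keeps the cast in a single place, so there is only one expression to revisit if the enumeration changes.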

One too far

What's the value of d after the loop terminates? It's 7, one more than Saturday. In C, there's absolutely no problem with this because 7 fits easily into the enumeration's underlying type, which, again, is char or an integer type. As you might expect, C++ is a bit pickier.

According to the C++ Standard:

A value of integral type can be explicitly converted to an enumeration type. The value is unchanged if the integral value is within the range of the enumeration values. Otherwise, the resulting enumeration value is unspecified.

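The cast expression day(d + 1) converts an int to a day. When d is Saturday, d + 1 is one more than the largest enumeration constant. The Standard defines the range of enumeration values as the range of the smallest bit-field that can hold every constant, so for day, whose constants run from 0 to 6, the value 7 still happens to fall within that range. That's an accident of the count, though: for an enumeration whose largest constant is 7, the value 8 is outside the range, and the cast yields an unspecified result. I've never seen a compiler change the underlying value, but the Standard warns that it can happen.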

Your best bet is to include in the range of enumeration values a value that represents "one beyond the end." For example:


enum day
  {
  Sunday, Monday, Tuesday,  
  Wednesday, Thursday, Friday, 
  Saturday, not_a_day
  };
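
With the extra constant in place, the value the counter holds when the loop ends is itself a named member of the type. The sketch below assumes the prefix ++ operator defined earlier; last_counter_value is a hypothetical function that simply reports where the counter stops.

```cpp
enum day
  {
  Sunday, Monday, Tuesday,
  Wednesday, Thursday, Friday,
  Saturday, not_a_day
  };

// Prefix ++ for day, repeated here so the sketch stands alone.
inline
day &operator++(day &d)
  {
  return d = day(d + 1);
  }

// last_counter_value is a hypothetical function: it runs the loop
// and returns whatever the counter holds once the loop has finished.
inline
day last_counter_value()
  {
  day d;
  for (d = Sunday; d <= Saturday; ++d)
    {
    // ... process d here ...
    }
  return d;
  }
```

Here the final increment converts 7 to a day, and 7 is the value of not_a_day, so the conversion stays within the declared constants.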

Knowing where to start and stop

When you deal with types representing everyday phenomena, like days or months, it's easy to remember which is the first (lowest) value and which is the last (highest) value. Such enumerations tend to be relatively immune to change. I think writing a loop such as:

for (m = January; m <= December; ++m)

is a safe bet to keep working for years to come.

But what about enumerations where the ordering is arbitrary or subject to change? How do you remember which are the first and last values here:


enum currency
  {
  CAD, DEM, EUR, FRF, GBP, JPY, USD
  };

and how can you write loops that continue to work even if you add, remove, or reorder the enumeration constants?

Ada has a nice solution to this problem in the form of predefined attributes 'first and 'last, which let you write:


for c in currency'first .. currency'last loop

or just:

for c in currency loop

C and C++ have no such predefined attributes. You have to adopt a discipline of defining additional symbols to specify the range of each enumeration.

You might consider following the pattern set in the standard header <limits.h>, which defines constants such as INT_MIN and INT_MAX as the range for type int. (If you haven't looked at this header, you should. There's good stuff in there.) For example, you can define type currency as:


enum currency
  {
  currency_min,
  CAD = currency_min,
  DEM, EUR, FRF, GBP, JPY, USD,
  currency_max = USD,
  not_a_currency
  };

Then a loop that counts through the currency values looks like:


for (c = currency_min; c <= currency_max; ++c)
  ...

Whenever you modify the set of enumeration constants, just make sure that currency_min and currency_max remain the minimum and maximum values, respectively. If you do that, the loop will continue to do what it's supposed to do.
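
As a sketch of the whole pattern in C++, the following combines the currency definition above with a prefix ++ operator written the same way as the one for day. The function count_currencies is a hypothetical name used only to show that the loop visits each constant once.

```cpp
enum currency
  {
  currency_min,
  CAD = currency_min,
  DEM, EUR, FRF, GBP, JPY, USD,
  currency_max = USD,
  not_a_currency
  };

// Prefix ++ for currency, modeled on the operator for day.
inline
currency &operator++(currency &c)
  {
  return c = currency(c + 1);
  }

// count_currencies is a hypothetical function. The loop mentions only
// currency_min and currency_max, never a specific currency, so it keeps
// working if constants are added, removed, or reordered between them.
inline
int count_currencies()
  {
  int n = 0;
  for (currency c = currency_min; c <= currency_max; ++c)
    ++n;
  return n;
  }
```

Because not_a_currency immediately follows currency_max, the counter's final value after the loop is a declared constant of the type.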

More to come

Not every enumerated type has constants with strictly contiguous values. For some enumerations, you might want to loop down as well as up. I'll look at these complications next time.

Dan Saks is the president of Saks & Associates, a C/C++ training and consulting company. You can write to him at dsaks@wittenberg.edu.