
Well-Behaved Enumerations

Dan Saks, March 19, 2003

Whether you use enumerations to count up or down, be careful at the boundaries. Values just beyond the ends must have valid representations.

In my last column (December 2002, p. 36), I discussed using variables of enumerated types as loop counters. At the time, I limited the discussion to simple cases where the loops count up through enumeration constants with contiguous values. This month, I'll expand the discussion to consider loops that count down.

Type checking
In both C and C++, an enumeration definition specifies a type and a corresponding set of named constants. For example:


enum day
  {
  Sunday, Monday, Tuesday, Wednesday, 
  Thursday, Friday, Saturday
  };
typedef enum day day;

defines a type day with seven constants, one for each day of the week. The typedef immediately after the enumeration definition elevates the name day from a mere tag to a full-fledged type name.

In C, each enumeration type is compatible with char or some signed or unsigned integer type. You can perform arithmetic on enumerations, including ++ and --, and thus easily write loops that step through the days as:


for (d = Sunday; d <= Saturday; ++d)
  ...

In C, it really doesn't matter whether you declare d as a day or an int. In fact, Standard C doesn't even care if you declare d as some other enumerated type such as month. Really. Static checkers such as PC-Lint, as well as some compilers, will issue a warning if you assign a value of one enumeration type to a variable of a different enumeration type, but most compilers will just let it slide. Nonetheless, declaring d as a day for the loop above is the right thing to do because Sunday through Saturday is a sequence of day values.

In contrast to C, C++ regards each enumeration as a distinct type. If you were to write the previous loop with a control variable of type month, as in:


month d;
for (d = Sunday; d <= Saturday; ++d)
  ...

all C++ compilers would complain that you can't assign Sunday (a day) to d (a month). You must declare d as either some integer type or as a day. Once again, day is better.

If you declare d as a day, you must also define a prefix ++ operator, so the loop's increment expression ++d will compile. As I explained last time, you can define the prefix ++ operator for day as:


inline
day &operator++(day &d)
  {
  return d = day(d + 1);
  }
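
To see how these pieces fit together, here is a minimal compilable C++ sketch combining the day enumeration, the prefix operator++ above, and the count-up loop. The helper count_days is hypothetical, added only to make the loop's behavior observable: it returns the number of passes and stores the integer value d holds when the loop ends.

```cpp
enum day
  {
  Sunday, Monday, Tuesday, Wednesday,
  Thursday, Friday, Saturday
  };

// Prefix ++ for day, as in the text: convert the int d + 1
// back to a day and store it in d.
inline
day &operator++(day &d)
  {
  return d = day(d + 1);
  }

// Hypothetical helper: runs the count-up loop, returns the number
// of passes, and stores the counter's final integer value in last.
int count_days(int &last)
  {
  int passes = 0;
  day d;
  for (d = Sunday; d <= Saturday; ++d)
    ++passes;
  last = d;     // the value one beyond Saturday
  return passes;
  }
```

Calling count_days should report seven passes, with the counter left at 7.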

Unfortunately, even C++ doesn't enforce the distinction between enumerations as much as you might like. For example, given type day as defined earlier and:


enum month
  {
  January = 1, February, ..., December
  };
typedef enum month month;

most C++ compilers will accept:


day d;
for (d = Sunday; d <= December; ++d)
  ...

without complaint, even though December is not a day. C++ quietly converts enumeration values to integers. For the <= operator, the compiler converts both d (a day) and December (a month) to int, and compares the resulting values.

The results of such comparisons are almost certainly meaningless. Yet few C++ compilers complain about comparisons between enumeration values of different types. Static analyzers, such as PC-Lint, do.

One too far, revisited
In many cases where a loop counter steps through a range of values, the loop terminates with the counter set to a value that's "one beyond" the value of the counter on the last iteration. In the case of:

for (d = Sunday; d <= Saturday; ++d)
  ...

the loop terminates with d equal to the unnamed day whose integer value is 7, one more than Saturday.

As I explained last time, C has no problem coping with a day value one greater than Saturday because it still fits into the enumeration's underlying type, which, again, is char or an integer type. In practice, C++ handles "one beyond Saturday" just as well as C does, but the C++ standard doesn't really sanction such behavior. The safe bet is to include in each enumeration definition a value that represents one beyond the last meaningful value, as in:


enum day
  {
  Sunday, Monday, Tuesday, 
  Wednesday, Thursday, Friday,
  Saturday, not_a_day
  };

This is a robust approach that avoids problems at boundary cases, such as when an enumerated type has nonnegative values and a maximum value of 255, as in:


enum day
  {
  Sunday = 249, Monday, Tuesday,
  Wednesday, Thursday, Friday,
  Saturday
  };

In this case, the behavior of:


for (d = Sunday; d <= Saturday; ++d)
  ...
  

is implementation dependent. In particular, it depends on how the compiler elects to represent the enumerated type.

If the compiler stores a day in a type wider than 8 bits, such as int or unsigned int, then the loop behaves as expected. Namely, it loops once for each day value and terminates with d equal to one beyond Saturday (256). However, if the target machine uses 8-bit characters and the compiler stores a day as an unsigned char, then the loop runs forever. In this case, when d is Saturday, ++d yields the day whose value is zero.

More precisely, when d is Saturday, ++d computes the value 256, which exceeds the capacity of an 8-bit unsigned char. In both C and C++, unsigned arithmetic is modular: unsigned values that overflow wrap around to zero. Thus, d is always less than or equal to Saturday, and the loop's conditional expression is always true.

Once again, you can avoid these sorts of boundary problems by defining an additional enumeration constant for the value one beyond the last meaningful value. Be aware that this style may incur a small price in performance. Specifically, expanding the ranges of enumeration values may increase the storage allocated to some enumeration objects.
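
Both outcomes can be reproduced without depending on how a particular compiler represents day. The sketch below is a model under stated assumptions: a plain unsigned char counter stands in for a day stored in 8 bits, an int counter stands in for a day stored in something wider, and characters are assumed to be 8 bits. The limit parameter is a hypothetical escape hatch so the runaway version terminates.

```cpp
// Values from the example: Sunday = 249 ... Saturday = 255.
const int Sunday = 249;
const int Saturday = 255;

// Models a day stored in an 8-bit unsigned char. ++d on 255 wraps to 0,
// so d <= Saturday never fails. 'limit' caps the passes so the demo
// ends; the real loop has no such escape.
long passes_with_uchar(long limit)
  {
  long passes = 0;
  for (unsigned char d = Sunday; d <= Saturday; ++d)
    if (++passes == limit)
      break;
  return passes;
  }

// Models a day stored in an int: seven passes, ending at 256.
long passes_with_int(int &last)
  {
  long passes = 0;
  int d;
  for (d = Sunday; d <= Saturday; ++d)
    ++passes;
  last = d;
  return passes;
  }
```

With a cap of 1000, the unsigned char version uses up every allowed pass, while the int version stops after seven passes with d equal to 256.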

Counting down
A loop that counts down to the lowest enumeration value poses a boundary problem similar to the one just discussed. For example, given:


enum day
  {
  Sunday, Monday, Tuesday,
  Wednesday, Thursday, Friday,
  Saturday, not_a_day
  };

then the loop:


day d;
for (d = Saturday; d >= Sunday; --d)
  ...

might run forever. Since all the values of type day are nonnegative, the compiler could choose to implement day as an unsigned integer type, which includes unsigned char as well as the various sizes of unsigned int. If day is represented as an unsigned type, then d >= Sunday is always true, and the loop never terminates.
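
The same kind of model shows the count-down failure. As before, an unsigned char counter and an int counter stand in for the two possible representations of day (assumptions, not a description of any particular compiler), and the hypothetical limit parameter caps the runaway loop. In C++, the original loop also needs a prefix operator-- for day, written along the same lines as operator++.

```cpp
// Values from the example: Sunday = 0 ... Saturday = 6.
const int Sunday = 0;
const int Saturday = 6;

// Models a day stored in an unsigned char. Decrementing Sunday wraps
// around to the largest unsigned char value, so d >= Sunday never
// fails. 'limit' caps the passes; the real loop has no such escape.
long countdown_unsigned(long limit)
  {
  long passes = 0;
  for (unsigned char d = Saturday; d >= Sunday; --d)
    if (++passes == limit)
      break;
  return passes;
  }

// Models a day stored in a signed int: seven passes, stopping at -1.
long countdown_signed()
  {
  long passes = 0;
  for (int d = Saturday; d >= Sunday; --d)
    ++passes;
  return passes;
  }
```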

You can get the loop to stop when it should by fiddling with the timing of the decrement. You also have to fudge the starting value for the counter:


day d;
for (d = not_a_day; d-- > Sunday;)
  ...

The resulting loop is not nearly as clear as the original.
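
Getting the fudged loop right is subtle: the comparison sees d's value before the decrement, so the test must be d-- > Sunday. With >=, the test still succeeds when the old value is Sunday, and the body runs one extra time with a wrapped-around (or negative) value. The sketch below models day as an unsigned char, an assumption made only to show both versions side by side; in C++, the real loop would also need a postfix operator-- for day. The limit parameter is hypothetical and exists only to stop the runaway version.

```cpp
#include <cstddef>
#include <vector>

// Values from the example: Sunday = 0 ... Saturday = 6, not_a_day = 7.
const int Sunday = 0;
const int not_a_day = 7;

// Fudged count-down loop with d modeled as an unsigned char.
// The test sees d's old value; the body sees the decremented value.
std::vector<int> fudged_countdown()
  {
  std::vector<int> seen;
  unsigned char d;
  for (d = not_a_day; d-- > Sunday; )
    seen.push_back(d);          // Saturday (6) down to Sunday (0)
  return seen;
  }

// The same loop tested with >=. When d's old value is Sunday the test
// still succeeds, so the body runs with the wrapped value (255 with
// 8-bit chars) and the loop keeps going; 'limit' caps the passes.
std::vector<int> fudged_countdown_ge(std::size_t limit)
  {
  std::vector<int> seen;
  unsigned char d;
  for (d = not_a_day; d-- >= Sunday && seen.size() < limit; )
    seen.push_back(d);
  return seen;
  }
```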

A better solution to this problem is to define an enumeration constant representing "one beyond" the last meaningful value. However, in this case, we're counting down, so "beyond" is actually "before." I have been using not_a_day as the "beyond" value, so I can't very well use that as the "before" value as well.

I recommend selecting a pair of symmetric names for the values just beyond each end of the enumeration's range, such as day_before and day_after, so the enumeration definition looks like:


enum day
  {
  day_before = -1,
  Sunday, Monday, Tuesday, 
  Wednesday, Thursday, Friday,
  Saturday, day_after
  };
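
Adding day_before also settles the representation question. Because day_before is negative, the compiler can no longer store a day in an unsigned type, so the straightforward count-down loop stops with d equal to day_before. A minimal C++ sketch, assuming a prefix operator-- modeled on the article's operator++ (the helper count_down is hypothetical):

```cpp
enum day
  {
  day_before = -1,
  Sunday, Monday, Tuesday,
  Wednesday, Thursday, Friday,
  Saturday, day_after
  };

// Prefix -- for day, modeled on the prefix ++ from the text.
inline
day &operator--(day &d)
  {
  return d = day(d - 1);
  }

// Hypothetical helper: counts down from Saturday through Sunday,
// returns the number of passes, and stores the final value in last.
int count_down(day &last)
  {
  int passes = 0;
  day d;
  for (d = Saturday; d >= Sunday; --d)
    ++passes;
  last = d;     // day_before
  return passes;
  }
```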

Thoughts on naming
If _before and _after don't suit your sense of aesthetics, I'm sure you can find other naming pairs that will work, such as day_below and day_above, or even day_underflow and day_overflow. As always, your best bet is to choose something you can use consistently.

Last time, I suggested defining _min and _max symbols to represent each end of the range of enumeration values. For example, defining type day as:


enum day
  {
  day_before = -1,
  day_min,
  Sunday = day_min, Monday,
  Tuesday, Wednesday, Thursday, 
  Friday, Saturday,
  day_max = Saturday,
  day_after
  };

lets you write loops such as:


day d;
for (d = day_min; d <= day_max; ++d)
  ...

which counts up through the days without regard for whether the first day is Sunday or Monday (or maybe even some other day).
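
A compilable sketch of the day_min/day_max version, again assuming a prefix operator++ like the one shown earlier. The helper count_up is hypothetical; it reports the number of passes and where the counter ends up. Because day_after belongs to the enumeration, the value one beyond day_max is a named, valid day.

```cpp
enum day
  {
  day_before = -1,
  day_min,
  Sunday = day_min, Monday,
  Tuesday, Wednesday, Thursday,
  Friday, Saturday,
  day_max = Saturday,
  day_after
  };

inline
day &operator++(day &d)
  {
  return d = day(d + 1);
  }

// Hypothetical helper: walks day_min..day_max, returns the number of
// passes, and stores the counter's final value in last.
int count_up(day &last)
  {
  int passes = 0;
  day d;
  for (d = day_min; d <= day_max; ++d)
    ++passes;
  last = d;     // day_after
  return passes;
  }
```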

I would use names with min and max in addition to, not in place of, names with before and after. The convention for min and max established in the C header <limits.h> and carried through in the C++ header <limits> is that min and max values are in the range of values for the relevant type. The before and after values are outside the range of values.

More to come
I have yet to address the problems of iterating through enumeration types whose values are not contiguous. That'll come soon.

Dan Saks is the president of Saks & Associates, a C/C++ training and consulting company. You can write to him at dsaks@wittenberg.edu.
