A sign of confusion

Dan Saks, February 08, 2008

In C and C++, the unusual nature of char leaves many programmers puzzled about when to use plain char in preference to an explicitly signed or unsigned char.

All of the integer types in C and C++ come in signed and unsigned variants. In all cases but one, the signed variant is the default. For instance, the type specifier int is short for signed int, and long int is short for signed long int. The exception is the char types.

The plain char type has the same representation and behavior as either signed char or unsigned char, but plain char is nonetheless a distinct type. For example, even with a compiler that implements plain char the same as signed char, the following pointer assignment is an error:

char *pc;
signed char *psc;
...
pc = psc;           // invalid conversion

Many compilers tolerate this conversion, but the language standards consider it to be an error.
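One way to see that the three character types are distinct, however the compiler represents plain char, is a C11 generic selection. The sketch below is an illustration rather than part of the original column. It assumes a C11 compiler, and the macro CHAR_KIND and the helper functions are hypothetical names. A generic selection may not list two compatible types, so the macro compiles only because char, signed char, and unsigned char are all different types.

```c
/* Name the character type of an expression (C11 or later).
   The three associations are legal only because the types are distinct. */
#define CHAR_KIND(x) _Generic((x),      \
    char:          "char",              \
    signed char:   "signed char",       \
    unsigned char: "unsigned char",     \
    default:       "not a character type")

/* Hypothetical helpers: each reports the type of a local object. */
const char *kind_of_plain(void)    { char c = 'a';          return CHAR_KIND(c); }
const char *kind_of_signed(void)   { signed char c = 'a';   return CHAR_KIND(c); }
const char *kind_of_unsigned(void) { unsigned char c = 'a'; return CHAR_KIND(c); }
```

Each helper selects a different association, even on a compiler where plain char behaves exactly like signed char.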

The unusual nature of char--that it's distinct from its signed and unsigned cousins, but not completely so--leaves many programmers puzzled about when to use plain char in preference to an explicitly signed or unsigned char. Too often, programmers guess wrong, and find themselves compounding the error by using casts. The following letter from a reader typifies the problem:

Recently I faced a problem where I was using an object declared as:

signed char *ptr;

I tried to do something such as:

if (ptr[0] == 0xFF)

Using the debugger, I could see that ptr[0] always had the value 0xFF but the condition in the if-statement was always false. When I looked at the disassembled code, the register containing ptr[0]'s value showed 0xffffffff.

I solved the problem by casting ptr[0] to unsigned char. Though I got the expression to evaluate to true, I'm not quite sure how it works.

As I've explained in past columns (see endnotes 1 and 2), using a cast is often an indication that you're doing something wrong. That's the case here.

Here's what's happening with that conditional expression. The left operand, ptr[0], is a signed char. On a typical machine with 8-bit bytes and two's-complement arithmetic, a signed char has values in the range -128 to +127. If ptr[0] contains the bit pattern 0xFF, its arithmetic value is -1 (decimal), not 255.

The right operand in the conditional expression, the literal 0xFF, is an int, or more precisely, a signed int (see endnote 3). It's not a signed char. As a signed int, 0xFF has the value 255 (decimal).

According to the standard, when an expression compares a signed char with a signed int, the program promotes the signed char to signed int prior to doing the compare. The resulting signed int has the same value as the signed char, which in this case is -1. On a two's-complement machine with a 32-bit int, -1 (decimal) is represented as 0xFFFFFFFF, which is the value the reader saw in the register.

In short, ptr[0] is a signed char whose value is -1, and 0xFF is a signed integer whose value is 255, and their values are not equal.
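The promotion can be made visible in a few lines of C. This is a minimal sketch, not code from the reader's program. The function names are made up for illustration, and it assumes the typical machine described in the text (8-bit bytes, two's-complement representation), where a signed char holding the bit pattern 0xFF has the value -1.

```c
#include <stdbool.h>

/* The value an operand of type signed char has after promotion to int.
   The promotion preserves the value, so -1 stays -1. */
int promoted_value(signed char c)
{
    return c;
}

/* Equivalent to the reader's test `ptr[0] == 0xFF`, with the promotion
   written out as a separate step. With 8-bit bytes, no signed char
   (-128..127) can promote to 255, so this never returns true. */
bool signed_equals_0xFF(signed char c)
{
    int promoted = c;
    return promoted == 0xFF;
}

/* The reader's workaround: converting -1 to unsigned char yields 255,
   which then promotes to the int value 255 and compares equal. */
bool cast_equals_0xFF(signed char c)
{
    return (unsigned char)c == 0xFF;
}
```

Under those assumptions, promoted_value(-1) yields -1, signed_equals_0xFF(-1) is false, and cast_equals_0xFF(-1) is true, which is why the cast appeared to fix the problem.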

The way to avoid such surprising behavior is to use objects and literals whose types can be combined safely without explicit conversions. When you test the value of a plain char, you should compare it with another plain char or a character constant, not with an int. For example, I'd replace:

signed char *ptr;
...
if (ptr[0] == 0xFF)

with:

char *ptr;
...
if (ptr[0] == '\xFF') 

The latter works correctly in C or C++ whether plain char is implemented as signed or unsigned.
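As a sketch of why the replacement is portable, the hypothetical function below applies the same test through a pointer to const char. It is an illustration that assumes 8-bit bytes, not code from the original column.

```c
#include <stdbool.h>

/* True if the first character has the same value as '\xFF'.
   Both operands get their values from plain char, so the result does not
   depend on whether plain char is signed or unsigned. */
bool starts_with_ff(const char *p)
{
    return p[0] == '\xFF';
}
```

For instance, starts_with_ff("\xFF") is true and starts_with_ff("A") is false with either flavor of plain char.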

In truth, character literals such as '\xFF' have type int in C (in C++, they have type char). In C, the conditional expression in:

if (ptr[0] == '\xFF') 

actually compares a plain char to an int. The compiler promotes the left operand to int to match the right operand. Nonetheless, the comparison works correctly in C without casting because the compiler uses the same rule to promote a plain char to an int that it uses to obtain the integer value of a character literal.
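That shared rule can be checked directly. The sketch below uses hypothetical helper names and assumes 8-bit bytes and two's-complement representation. CHAR_MIN comes from <limits.h> and is negative exactly when plain char is signed.

```c
#include <limits.h>
#include <stdbool.h>

/* True when this compiler implements plain char as a signed type. */
bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* The int value of the character constant '\xFF' in C:
   -1 where plain char is signed, 255 where it is unsigned
   (under the 8-bit, two's-complement assumption). */
int literal_as_int(void)
{
    return '\xFF';
}

/* The int value of a plain char after promotion. */
int promoted_plain_char(char c)
{
    return c;
}
```

Whichever way the compiler implements plain char, promoted_plain_char applied to the first element of "\xFF" gives the same int as literal_as_int(), so the comparison succeeds without a cast.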

Endnotes:
1. Saks, Dan, "Cast with caution," Embedded Systems Design, July 2006, p. 15.
2. Saks, Dan, "A case study in portability," Embedded.com, November 2007.
3. Saks, Dan, "Numeric Literals," Embedded Systems Programming, September 2000, p. 113.
