
Further insights into size_t

Dan Saks, September 01, 2007

Using size_t may be awkward for some programmers, but it still solves more problems than it creates.

In my previous column, I explained why both the C and C++ standard libraries define a typedef named size_t and how you should use that type in your programs.1 That article generated quite a few interesting questions and comments, some of which I'd like to share with you.

Is size_t really unsigned?
One diligent reader noticed that his compiler didn't implement size_t as I said it should:

Either your article contains an error or gcc (at least the versions I've used) contains an error.

gcc actually defines size_t as a signed integer type. This means that using size_t rather than an explicit integer type actually *creates* portability annoyance when code is used both with gcc and with a compiler that defines size_t as an unsigned integer type. Most of these annoyances come from sloppy casts of constants or variables to (unsigned) or (unsigned long) rather than size_t to silence warnings about comparisons between signed and unsigned values. Such casts silence one compiler, but ensure the same signed vs. unsigned mismatch when using other compilers. . . .

I've not checked the standard, or the latest versions of gcc, but this has created difficulties for me and my colleagues in actual practice.

According to the 1999 C Standard, size_t is clearly supposed to be unsigned.2 In clause 7.17, Common definitions <stddef.h>, it says:

The following types and macros are defined in the standard header <stddef.h>. Some are also defined in other headers, as noted in their respective subclauses.

The types are . . .

size_t

which is the unsigned integer type of the result of the sizeof operator;

size_t is unsigned in every compiler I tested, including gcc. I'm using a build based on gcc 3.2.3.
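
You can check your own compiler with a one-line compile-time test. It's just a sketch, relying on the guarantee that converting -1 to an unsigned type yields a large positive value; the typedef name is mine:

#include <stddef.h>

/* Compiles only if size_t is unsigned: (size_t)-1 is a huge
positive value when size_t is unsigned, but a negative value--
and thus an invalid array dimension--if size_t were signed. */
typedef char size_t_is_unsigned[(size_t)-1 > 0 ? 1 : -1];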

I poked around on the web and found some old GNU C Library maintenance documentation at www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_30.html, which states:

There is a potential problem with the size_t type and versions of GCC prior to release 2.4. ANSI C requires that size_t always be an unsigned type . . .

That documentation provides additional insights into gcc's handling of size_t and how you can tweak it.

Portability concerns
I disagree with the reader's claim that " . . . using size_t rather than an explicit integer type actually *creates* portability annoyance when code is used both with gcc and with a compiler that defines size_t as an unsigned integer type." I presume that "explicit integer type" means an integer type specified by keywords such as int or unsigned, as opposed to a typedef such as size_t.

Using size_t properly may create some annoyances, but it eliminates many more than it creates. For example, the standard strlen function returns a size_t. Code such as:

char *s;
size_t len;
...
len = strlen(s);

will compile without complaint--and work--whether the library defines size_t as signed or as unsigned.
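
For concreteness, here's that fragment fleshed out into a complete program (the test string is just an illustration). The cast to unsigned long in the printf call is the traditional portable way to print a size_t; the %zu length modifier wasn't available before C99:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char const *s = "hello, world";
    size_t len;

    len = strlen(s);    /* types match exactly--no conversion, no warning */
    printf("length = %lu\n", (unsigned long)len);
    return 0;
}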

In contrast, declaring len explicitly as either int or unsigned is much more likely to cause portability problems. Specifically, if you declare len as:

int len;

then the assignment:

len = strlen(s);

may provoke type mismatch warnings (an unsigned to signed conversion) when compiled with a library that defines size_t (properly) as unsigned. Similarly, if you declare len as:

unsigned len;

then the same assignment will likely generate warnings when compiled with a library that defines size_t (improperly) as signed. Using size_t actually insulates your code against failure even when using a compiler and library that define size_t incorrectly. All the more reason to use size_t.
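
To see the problem both ways at once, compile a sketch such as this with conversion warnings enabled (gcc's -Wconversion and, in newer versions, -Wsign-conversion); whichever assignment mismatches your library's definition of size_t should draw a warning:

#include <string.h>

void f(char const *s)
{
    int ilen;
    unsigned ulen;

    ilen = strlen(s);   /* warning if size_t is (properly) unsigned */
    ulen = strlen(s);   /* warning if size_t is (improperly) signed */
}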

"Sloppy" vs. "clean" casts
The reader observed that "Most of these annoyances come from sloppy casts of constants or variables to (unsigned) or (unsigned long) rather than (size_t) to silence warnings about comparisons between signed and unsigned values." This is not so much an argument against using size_t appropriately as it is an acknowledgment of the consequences of using size_t inappropriately.

For example, consider:

int n;
...
if (strlen(s) > n)
    ...

Some compilers will issue a warning that the expression in the if-statement compares an unsigned value (the size_t returned by strlen) to a signed value (the int n). These warnings are often helpful in catching potential logic errors, and I advise you to leave them turned on. The best way to silence the warning is to fix the problem: change the declaration of n so that it has type size_t. If that's infeasible (probably for political rather than technical reasons), then your only recourse is to use a cast. Writing:

if (strlen(s) > (unsigned)n)

will quell the compiler, but this cast is arguably "sloppy". Neither n nor the return type of strlen is declared as unsigned, so why cast to that type? A cleaner approach is to cast one operand to the declared type of the other.

Type size_t might be an alias for unsigned long. In that case, casting a size_t to int, as in:

if ((int)strlen(s) > n)

could truncate the size_t value, which would be bad.3 Writing:

if (strlen(s) > (size_t)n)

would be better. This works correctly even if the library incorrectly defines size_t as a signed integer.
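
Here's that comparison in context as a minimal sketch; the function is hypothetical. Note that it assumes n is non-negative: a negative n converted to size_t becomes a huge positive value, which would defeat the comparison:

#include <stdio.h>
#include <string.h>

/* Hypothetical: n is stuck as an int for reasons beyond our control. */
void check_length(char const *s, int n)
{
    if (strlen(s) > (size_t)n)      /* assumes n >= 0 */
        printf("string exceeds limit of %d\n", n);
}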
