Further insights into size_t
Using size_t may be awkward for some programmers, but using it still solves more problems than it creates.
In my previous column, I explained why both the C and C++ standard libraries define a typedef named size_t and how you should use that type in your programs.[1] That article generated quite a few interesting questions and comments, some of which I'd like to share with you.
Is size_t really unsigned?
One diligent reader noticed that his compiler didn't implement size_t as I said it should:
Either your article contains an error or gcc (at least the versions I've used) contains an error.
gcc actually defines size_t as a signed integer type. This means that using size_t rather than an explicit integer type actually *creates* portability annoyance when code is used both with gcc and with a compiler that defines size_t as an unsigned integer type. Most of these annoyances come from sloppy casts of constants or variables to (unsigned) or (unsigned long) rather than size_t to silence warnings about comparisons between signed and unsigned values. Such casts silence one compiler, but ensure the same signed vs. unsigned mismatch when using other compilers. . . .
I've not checked the standard, or the latest versions of gcc, but this has created difficulties for me and my colleagues in actual practice.
According to the 1999 C Standard, size_t is clearly supposed to be unsigned.[2] Clause 7.17, Common definitions <stddef.h>, says:
The following types and macros are defined in the standard header <stddef.h>. Some are also defined in other headers, as noted in their respective subclauses. The types are . . .
size_t
which is the unsigned integer type of the result of the sizeof operator;
size_t is unsigned in every compiler I tested, including gcc. I'm using a build based on gcc 3.2.3.
I poked around on the web and found some old GNU C Library maintenance documentation at www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_30.html, which states:
There is a potential problem with the size_t type and versions of GCC prior to release 2.4. ANSI C requires that size_t always be an unsigned type . . .
That documentation provides additional insights into gcc's handling of size_t and how you can tweak it.
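If you'd rather check your own compiler and library than take anyone's word for it, here is one way to do it. This is a minimal sketch, and the names size_t_must_be_unsigned and size_t_is_unsigned are made up for illustration. It relies on the rule that converting -1 to an unsigned type yields that type's largest value, while converting -1 to a signed type leaves it at -1.

```c
#include <stddef.h>

/* Compile-time check, valid in C89 and C99: if size_t were signed,
   (size_t)-1 would be negative, the array dimension would be -1, and the
   compiler would be required to reject the declaration.
   The typedef name is hypothetical. */
typedef char size_t_must_be_unsigned[((size_t)-1 > 0) ? 1 : -1];

/* Run-time equivalent: for an unsigned size_t, (size_t)-1 is the
   type's maximum value, which is greater than zero; for a signed
   size_t it would stay -1. Also a hypothetical name. */
int size_t_is_unsigned(void)
{
    return (size_t)-1 > 0;
}
```

With a library that defines size_t properly, the typedef compiles and size_t_is_unsigned() returns 1; with a signed size_t, the typedef alone stops the build.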
Portability concerns
I disagree with the reader's claim that “. . . using size_t rather than an explicit integer type actually *creates* portability annoyance when code is used both with gcc and with a compiler that defines size_t as an unsigned integer type.” I presume that “explicit integer type” means an integer type specified by keywords such as int or unsigned, as opposed to a typedef such as size_t.
Using size_t properly may create some annoyances, but it eliminates many more than it creates. For example, the standard strlen function returns a size_t . Code such as:
char *s;
size_t len;
...
len = strlen(s);
will compile without complaint, and work, whether the library defines size_t as signed or as unsigned.
In contrast, declaring len explicitly as either int or unsigned is much more likely to cause portability problems. Specifically, if you declare len as:
int len;
then the assignment:
len = strlen(s);
may provoke type mismatch warnings (an unsigned to signed conversion) when compiled with a library that defines size_t (properly) as unsigned. Similarly, if you declare len as:
unsigned len;
then the same assignment will likely generate warnings when compiled with a library that defines size_t (improperly) as signed. Using size_t insulates your code against these mismatches even when using a compiler and library that define size_t incorrectly. All the more reason to use size_t.
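To carry the strlen example a bit further, here is a hypothetical helper, count_char, that keeps both the length and the loop index in size_t. It's a sketch, not a library function. The point is that every assignment and comparison involving the length stays within one type, so it compiles cleanly whether the library defines size_t as unsigned (as it should) or, improperly, as signed.

```c
#include <stddef.h>
#include <string.h>

/* Counts occurrences of c in s. The length, the index, and the count
   are all size_t, matching strlen's return type, so no signed/unsigned
   conversions or comparisons appear. Hypothetical helper. */
size_t count_char(const char *s, char c)
{
    size_t len = strlen(s);
    size_t count = 0;
    size_t i;

    for (i = 0; i < len; ++i) {
        if (s[i] == c)
            ++count;
    }
    return count;
}
```

For example, count_char("embedded", 'e') returns 3.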
“Sloppy” vs. “clean” casts
The reader observed that “Most of these annoyances come from sloppy casts of constants or variables to (unsigned) or (unsigned long) rather than (size_t) to silence warnings about comparisons between signed and unsigned values.” This is not so much an argument against using size_t appropriately as it is an acknowledgment of the consequences of using size_t inappropriately.
For example, consider:
int n;
...
if (strlen(s) > n) ...
Some compilers will issue a warning that the expression in the if-statement is comparing an unsigned value (the size_t returned by strlen) to a signed value (the int n). These warnings often help catch potential logic errors, and I advise you to leave them turned on. The best way to silence the warning is to fix the problem by changing the declaration of n so that it has type size_t. If that's infeasible (probably for political rather than technical reasons), then your only recourse is to use a cast. Writing:
if (strlen(s) > (unsigned)n)
will quell the compiler, but this cast is arguably “sloppy”. Neither n nor the return type of strlen is declared as unsigned, so why cast to that? A cleaner approach is to cast one operand to the declared type of the other.
Type size_t might be an alias for unsigned long. In that case, casting a size_t to int, as in:
if ((int)strlen(s) > n)
could truncate the size_t value. It would be bad.[3] Writing:
if (strlen(s) > (size_t)n)
would be better. This works correctly even if the library incorrectly defines size_t as a signed integer.
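Here is the “clean” cast wrapped in a small hypothetical function, longer_than. One detail the cast alone doesn't cover: if n is negative and size_t is unsigned (as it should be), (size_t)n converts to a very large value, so the comparison yields false even though any string is longer than a negative length. This sketch assumes you want a negative n treated that way and checks for it before casting.

```c
#include <stddef.h>
#include <string.h>

/* Returns nonzero if s has more than n characters. Hypothetical helper.
   The int is converted to the declared type of strlen's result, per the
   "clean" cast. A negative n is handled first, because converting it to
   an unsigned size_t would produce a huge value. */
int longer_than(const char *s, int n)
{
    if (n < 0)
        return 1;   /* every length, even 0, exceeds a negative bound */
    return strlen(s) > (size_t)n;
}
```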
Printing size_t objects
Another reader wrote and asked which format specifier to use to display a size_t object using printf .
A printf format string may contain conversion specifiers, such as %d for displaying a (signed) int, and %u for displaying an unsigned int. The specifier may include a length modifier, such as h for short, and l (el) for long. For example, %hd is the specifier for a short int, and %lu is the specifier for an unsigned long, or equivalently long unsigned.
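As a small illustration of length modifiers, this sketch formats a short with %hd and an unsigned long with %lu into a caller-supplied buffer. The helper name format_modifiers is hypothetical, and it uses snprintf, which is itself a C99 addition.

```c
#include <stdio.h>

/* Formats a short (h length modifier) and an unsigned long (l length
   modifier) into buf, which must have room for size characters.
   Returns what snprintf returns: the number of characters that would
   have been written. Hypothetical helper. */
int format_modifiers(char *buf, size_t size, short sh, unsigned long ul)
{
    return snprintf(buf, size, "%hd %lu", sh, ul);
}
```

Calling format_modifiers(buf, sizeof buf, -7, 123456UL) with a large enough buf leaves "-7 123456" in it.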
According to the 1999 C Standard, you should use the z length modifier with the u conversion specifier to display a size_t object, as in:
size_t n;
...
printf("%zu", n);
Although the u conversion specifier has been around since well before the 1989 C Standard, the z modifier is fairly new. Of the four C compilers I have installed on my machine, none supports the z modifier. I suspect few compilers do yet.
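Where the library does conform to C99, %zu needs no cast at all. This sketch, using the hypothetical helper name format_size_c99, writes a size_t into a buffer with snprintf and %zu. It works as intended only where the library's printf family actually supports the z modifier, which many libraries of the time didn't.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes n in decimal to buf using the C99 z length modifier, which
   matches size_t exactly, so no cast is needed. Returns snprintf's
   result. Requires library support for %zu. Hypothetical helper. */
int format_size_c99(char *buf, size_t bufsize, size_t n)
{
    return snprintf(buf, bufsize, "%zu", n);
}
```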
If your compiler doesn't support %zu, then you should try %lu (unsigned long), as in:
size_t n;
unsigned long ul;
...
ul = n;
printf("%lu", ul);
or:
size_t n;
...
printf("%lu", (unsigned long)n);
Again, size_t is typically an alias for either unsigned int or unsigned long, so converting a size_t to an unsigned long produces an unsigned value that's either the same size as the size_t or wider. It won't lose significance.
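The assurance that the conversion “won't lose significance” holds when size_t is no wider than unsigned long. Under C99, though, size_t may be wider; on 64-bit Windows, for instance, size_t is 64 bits while unsigned long is 32. If that possibility matters to you, a round-trip check such as this hypothetical size_to_ulong helper can confirm the value survived the conversion before you print it with %lu. It's a sketch, and the message print_size emits for an oversized value is arbitrary.

```c
#include <stddef.h>
#include <stdio.h>

/* Converts n to unsigned long for printing with %lu. Stores the result
   in *out and returns 1 if converting back yields n again (no bits lost);
   returns 0 if n doesn't fit, which can happen only where size_t is
   wider than unsigned long. Hypothetical helper. */
int size_to_ulong(size_t n, unsigned long *out)
{
    unsigned long ul = (unsigned long)n;

    if ((size_t)ul != n)
        return 0;
    *out = ul;
    return 1;
}

/* Prints n with the pre-C99 idiom: an explicit conversion plus %lu. */
void print_size(size_t n)
{
    unsigned long ul;

    if (size_to_ulong(n, &ul))
        printf("%lu\n", ul);
    else
        printf("size_t value too large for unsigned long\n");
}
```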
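Using just:
size_t n;
...
printf("%lu", n);
produces undefined behavior whenever size_t isn't unsigned long; in practice, it's most likely to misbehave when size_t is an alias for unsigned int and sizeof(unsigned) is less than sizeof(unsigned long). It, too, would be bad.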
By the way, displaying a size_t using the C++ iostream library is exceedingly simple:
size_t n;
...
std::cout << n;
Until next time, keep the questions coming.
Dan Saks is president of Saks & Associates, a C/C++ training and consulting company. For more information about Dan Saks, visit his website at www.dansaks.com. Dan also welcomes your feedback: e-mail him at .
Endnotes:
1. Saks, Dan. “Why size_t matters,” Embedded Systems Design, July 2007, p. 11.
2. International Organization for Standardization, ISO/IEC 9899:1999: Programming languages — C, Geneva, Switzerland, 1999.
3. Spengler, Egon. “Don't cross the streams.” See also Stantz, Ray. “Total protonic reversal.”