Further insights into size_t - Embedded.com

Using size_t may be awkward for some programmers, but using it still solves more problems than it creates.

In my previous column, I explained why both the C and C++ standard libraries define a typedef named size_t and how you should use that type in your programs.[1] That article generated quite a few interesting questions and comments, some of which I'd like to share with you.

Is size_t really unsigned?
One diligent reader noticed that his compiler didn't implement size_t as I said it should:

Either your article contains an error or gcc (at least the versions I've used) contains an error.

gcc actually defines size_t as a signed integer type. This means that using size_t rather than an explicit integer type actually *creates* portability annoyance when code is used both with gcc and with a compiler that defines size_t as an unsigned integer type. Most of these annoyances come from sloppy casts of constants or variables to (unsigned) or (unsigned long) rather than size_t to silence warnings about comparisons between signed and unsigned values. Such casts silence one compiler, but ensure the same signed vs. unsigned mismatch when using other compilers. . . .

I've not checked the standard, or the latest versions of gcc, but this has created difficulties for me and my colleagues in actual practice.

According to the 1999 C Standard, size_t is clearly supposed to be unsigned.[2] In clause 7.17, Common definitions, it says:

The following types and macros are defined in the standard header <stddef.h>. Some are also defined in other headers, as noted in their respective subclauses.

The types are . . .

size_t

which is the unsigned integer type of the result of the sizeof operator;

size_t is unsigned in every compiler I tested, including gcc. I'm using a build based on gcc 3.2.3.
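
If you'd rather check your own compiler than take anyone's word for it, a short test can do it. The following is a minimal sketch, not part of the original column: it assumes a C11 compiler for _Static_assert, and the function name size_t_is_unsigned is made up for illustration. It relies on the fact that converting -1 to an unsigned type yields that type's largest value, whereas a signed type would keep it negative.

```c
#include <stddef.h>  /* size_t */

/* Compile-time check (assumes C11): (size_t)-1 is positive only if
   size_t is an unsigned type. */
_Static_assert((size_t)-1 > 0, "size_t must be an unsigned type");

/* Run-time version of the same check (hypothetical helper).
   Returns 1 if size_t is unsigned, 0 if it is signed. */
int size_t_is_unsigned(void)
{
    return (size_t)-1 > 0;
}
```

On a compiler that defined size_t as a signed type, the static assertion would stop the build with the message shown.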

I poked around on the web and found some old GNU C Library maintenance documentation at www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_30.html, which states:

There is a potential problem with the size_t type and versions of GCC prior to release 2.4. ANSI C requires that size_t always be an unsigned type . . .

That documentation provides additional insights into gcc's handling of size_t and how you can tweak it.

Portability concerns
I disagree with the reader's claim that “. . . using size_t rather than an explicit integer type actually *creates* portability annoyance when code is used both with gcc and with a compiler that defines size_t as an unsigned integer type.” I presume that “explicit integer type” means an integer type specified by keywords such as int or unsigned, as opposed to a typedef such as size_t.

Using size_t properly may create some annoyances, but it eliminates many more than it creates. For example, the standard strlen function returns a size_t. Code such as:

char *s;
size_t len;
...
len = strlen(s);

will compile without complaint, and work, whether the library defines size_t as signed or as unsigned.

In contrast, declaring len explicitly as either int or unsigned is much more likely to cause portability problems. Specifically, if you declare len as:

int len;   

then the assignment:

len = strlen(s);   

may provoke type mismatch warnings (an unsigned-to-signed conversion) when compiled with a library that defines size_t (properly) as unsigned. Similarly, if you declare len as:

unsigned len;   

then the same assignment will likely generate warnings when compiled with a library that defines size_t (improperly) as signed. Using size_t actually insulates your code against failure even when using a compiler and library that define size_t incorrectly. All the more reason to use size_t.
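
As an illustration of the point, here is a minimal sketch of a function that keeps every length and index in size_t, so no signed/unsigned conversion appears anywhere. The function count_char is a hypothetical helper invented for this example, not something from the standard library.

```c
#include <stddef.h>  /* size_t */
#include <string.h>  /* strlen */

/* Hypothetical helper: counts how many times c occurs in the string s. */
size_t count_char(const char *s, char c)
{
    size_t len = strlen(s);  /* same type as strlen's return value */
    size_t count = 0;
    size_t i;

    for (i = 0; i < len; ++i) {
        if (s[i] == c)
            ++count;
    }
    return count;
}
```

Because len, i, and the return type all match what strlen returns, the comparison i < len should not trigger a signed/unsigned warning, whichever way the library happens to define size_t.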

“Sloppy” vs. “clean” casts
The reader observed that “Most of these annoyances come from sloppy casts of constants or variables to (unsigned) or (unsigned long) rather than (size_t) to silence warnings about comparisons between signed and unsigned values.” This is not so much an argument against using size_t appropriately as it is an acknowledgment of the consequences of using size_t inappropriately.

For example, consider:

int n;
...
if (strlen(s) > n)
    ...

Some compilers will issue a warning that the expression in the if-statement is comparing an unsigned value (the size_t returned by strlen) to a signed value (the int in n). These warnings are often helpful in catching potential logic errors, and I advise you to leave them turned on. The best way to silence the warning is to fix the problem by changing the declaration of n so that it has type size_t. If that's infeasible (probably for political rather than technical reasons), then your only recourse is to use a cast. Writing:

if (strlen(s) > (unsigned)n)   

will quell the compiler, but this cast is arguably “sloppy”. Neither n nor the return type of strlen is declared as unsigned, so why cast to that? A cleaner approach would be to cast one operand to the declared type of the other.

Type size_t might be an alias for unsigned long. In that case, casting a size_t to int, as in:

if ((int)strlen(s) > n)   

could truncate the size_t value. It would be bad.[3] Writing:

if (strlen(s) > (size_t)n)   

would be better. This works correctly even if the library incorrectly defines size_t as a signed integer.
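
One caveat worth sketching: when size_t is unsigned, a negative n converts to a very large value, so the cast alone makes the comparison false for every string. The following is a minimal sketch of one way to guard against that, assuming n really can be negative in your program. The function longer_than is a hypothetical helper, not a standard one.

```c
#include <stddef.h>  /* size_t */
#include <string.h>  /* strlen */

/* Hypothetical helper: returns 1 if s is longer than n characters,
   0 otherwise. A negative n is handled before the cast, because
   (size_t)n would otherwise wrap around to a huge unsigned value. */
int longer_than(const char *s, int n)
{
    if (n < 0)
        return 1;  /* every length is at least 0, hence greater than n */
    return strlen(s) > (size_t)n;
}
```

If n can never be negative, the bare cast shown above is enough and the extra test is unnecessary.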

Printing size_t objects
Another reader wrote and asked which format specifier to use to display a size_t object using printf.

A printf format string may contain conversion specifiers, such as %d for displaying a (signed) int, and %u for displaying an unsigned int. The specifier may include a length modifier, such as h for short, and l (el) for long. For example, %hd is the specifier for a short int, and %lu is the specifier for an unsigned long, or equivalently long unsigned.
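
The specifiers just described can be tried out with a small sketch like the one below. It uses snprintf (a C99 function) rather than printf only so that the formatted text lands in a buffer where it can be inspected. The function name format_examples is made up for this illustration.

```c
#include <stddef.h>  /* size_t */
#include <stdio.h>   /* snprintf */

/* Hypothetical helper: formats a short with %hd and an unsigned long
   with %lu, separated by one space. Returns what snprintf returns:
   the number of characters the complete output requires. */
int format_examples(char *buf, size_t size, short h, unsigned long ul)
{
    return snprintf(buf, size, "%hd %lu", h, ul);
}
```

Called with h equal to -7 and ul equal to 4000000000UL, this writes the 13 characters "-7 4000000000" into buf, provided buf has room for them plus the terminating null.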

According to the 1999 C Standard, you should use the z length modifier with the u conversion specifier to display a size_t object, as in:

size_t n;
...
printf("%zu", n);

Although the u conversion specifier has been around since well before the 1989 C Standard, the z length modifier is fairly new. Of the four C compilers I have installed on my machine, none supports the z modifier. I suspect few compilers do, yet.

If your compiler doesn't support %zu, then you should try %lu (unsigned long), as in:

size_t n;
unsigned long ul;
...
ul = n;
printf("%lu", ul);

or:

size_t n;
...
printf("%lu", (unsigned long)n);

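Again, size_t is typically an alias for either unsigned or unsigned long, so converting a size_t to an unsigned long produces an unsigned value that's either the same size as a size_t, or wider. In that case, it won't lose significance. On a platform where size_t is wider than unsigned long, the conversion could truncate large values.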

Using just:

size_t n;
...
printf("%lu", n);

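produces undefined behavior if size_t is an alias for unsigned and sizeof(unsigned) is less than sizeof(unsigned long), because the argument passed no longer matches the type that %lu expects. It, too, would be bad.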
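
The two well-defined options discussed above can be put side by side in a minimal sketch. Both function names are hypothetical, and snprintf is used only so the output can be examined in a buffer; the same format strings apply to printf. The first version assumes a library that supports the C99 z modifier.

```c
#include <stddef.h>  /* size_t */
#include <stdio.h>   /* snprintf */

/* Assumes C99 library support: the z length modifier matches size_t. */
int format_size_z(char *buf, size_t size, size_t n)
{
    return snprintf(buf, size, "%zu", n);
}

/* Fallback: the explicit cast makes the argument's type match %lu.
   Assumes the value of n fits in an unsigned long. */
int format_size_lu(char *buf, size_t size, size_t n)
{
    return snprintf(buf, size, "%lu", (unsigned long)n);
}
```

For any value that fits in an unsigned long, the two functions should produce identical text.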

By the way, displaying a size_t using the C++ iostream library is exceedingly simple:

size_t n;
...
std::cout << n;

Until next time, keep the questions coming.

Dan Saks is president of Saks & Associates, a C/C++ training and consulting company. For more information about Dan Saks, visit his website at www.dansaks.com. Dan also welcomes your feedback.

Endnotes:

1. Saks, Dan. “Why size_t matters,” Embedded Systems Design, July 2007, p. 11.

2. International Organization for Standardization, ISO/IEC 9899:1999: Programming languages — C, Geneva, Switzerland, 1999.

3. Spengler, Egon. “Don't cross the streams.” See also Stantz, Ray, “Total protonic reversal.”

Reader Response


This article was very nice. It is very useful. As I am new to the embedded software field, it gave an idea on how the functions are made source-level portable.

Thanks,

-Mohamed Thalib
GDA Technologies
Chennai, India

