Evaluating Function Arguments

Dan Saks, May 01, 2002

The C and C++ standards do not specify the order of evaluation for function arguments. This can lead to subtle portability problems.

As I explained in my column last month ("As Precise As Possible"), the C Standard specifies well-defined, portable behavior for many, but not all, language constructs. In some cases, the Standard describes the behavior of a construct as implementation-defined or unspecified. A program with such behavior is valid, but it may yield different results when compiled and executed for different target platforms. (The C++ Standard uses these terms in essentially the same way as the C Standard; in the following discussion, any mention of "the Standard" applies to C++ and C.)

The difference between implementation-defined behavior and unspecified behavior is simply that each compiler must document its implementation-defined behaviors, but not its unspecified behaviors. In other words, implementation-defined behavior is the compiler's license to translate a valid construct as it sees fit (usually within limits imposed by the Standard), as long as the specifics are documented. Unspecified behavior liberates the compiler from the necessity of documentation.
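To make the distinction concrete, here is a minimal sketch of two implementation-defined properties. The function names are hypothetical, chosen only for illustration. In both cases the Standard lets the implementation choose, but requires the choice to be documented.

```c
#include <limits.h>

/* Implementation-defined: whether plain char behaves as a signed type.
   Returns 1 if it does, 0 if it does not. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Implementation-defined: the number of bits in the object
   representation of an int. The Standard sets only a lower limit. */
int bits_in_int(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}
```

A program that depends on either result is valid, but it may behave differently on another platform. The compiler's documentation tells you which result to expect.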

The upside of this freedom is that it allows each compiler to translate certain constructs into code that's tailored to the target platform. The downside is that it can create portability problems: code that yields expected results on one platform may produce surprisingly different results when compiled and executed on other platforms.

This month, I'll examine one specific example of unspecified behavior.

Left to right?

Function f below simply writes the values of its parameters to the standard output stream:


#include <stdio.h>

void f(int i, int j)
{
    printf("i = %d; j = %d\n", i, j);
}

Now, suppose you call the function using:



int n;
...
n = 0;
f(n++, n);


The result is pretty obvious, isn't it? The call evaluates the arguments from left to right, and performs the post-increment on the first argument, n++, before evaluating the second argument. Thus, the first argument evaluates to zero, the second argument evaluates to one, and the function call produces the following output:

i = 0; j = 1

I tried this with four different compilers on my Pentium-based PC:

  • Borland C++ 5.5 for Windows
  • Metrowerks CodeWarrior Pro 6.0 for Windows
  • Microsoft Visual C++ 6.0 for Windows
  • GNU gcc 3.0.3 running under Cygwin

Not one compiles code that produces the "obvious" result. Rather, they all compile code that yields:

i = 0; j = 0

I also compiled the code using two different compilers for the ARM7 processor:

  • ARM Development System trial version 1.1 (a version of Metrowerks CodeWarrior)
  • GNU gcc 3.0.3

Both of these compilers compile code that produces the "obvious" output:

i = 0; j = 1

Are the Pentium compilers wrong? No, they're simply exercising the freedom granted by the Standard. According to the Standard, the order of evaluation for the arguments in a function call is unspecified. Thus, a compiler is free to interpret:

n = 0;
f(n++, n);

in a few different ways:

1. It can evaluate the arguments from left to right, and perform the post-increment on the left argument before evaluating the right argument. This yields the "obvious" result:

i = 0; j = 1

2. It can evaluate the arguments from left to right, and delay the post-increment on the left argument until after evaluating the right argument. In that case, both arguments will be zero:

i = 0; j = 0

3. It can evaluate the arguments from right to left. As in the second case, the post-increment doesn't get done until after both arguments are evaluated. Both arguments will be zero:

i = 0; j = 0

There's a subtle distinction between evaluating an argument and completely executing an argument expression. For example, n++ is an expression that both yields a value and has a side effect. In this case, the side effect is to increment the value of n. The value of the expression is the value of n before it's incremented.

A function call has evaluated the argument once it has copied n into the parameter storage area for the call (on the stack or in a register). The entire argument expression, including the side effect, need not be completely executed at that point. The function call can complete the side effect before it evaluates the next argument, or it can delay completing the side effect until just before jumping to the function.
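The three interpretations of f(n++, n) can be written out by hand, with each step as a separate statement. This is only a sketch of what a compiler might do, not what any particular compiler generates. The struct and function names are hypothetical.

```c
/* The two values that a call f(n++, n) would pass. */
struct args { int i; int j; };

/* Interpretation 1: left to right, with the side effect completed
   before the second argument is evaluated. */
struct args left_to_right_eager(int *n)
{
    struct args a;
    a.i = *n;       /* value of n++ is the old value of n */
    *n = *n + 1;    /* side effect completed here */
    a.j = *n;       /* second argument sees the incremented n */
    return a;
}

/* Interpretation 2: left to right, with the side effect delayed
   until after both arguments are evaluated. */
struct args left_to_right_deferred(int *n)
{
    struct args a;
    a.i = *n;
    a.j = *n;
    *n = *n + 1;    /* side effect completed just before the call */
    return a;
}

/* Interpretation 3: right to left, with the side effect delayed
   until after both arguments are evaluated. */
struct args right_to_left(int *n)
{
    struct args a;
    a.j = *n;       /* right argument first */
    a.i = *n;       /* then the value of n++ */
    *n = *n + 1;
    return a;
}
```

Starting from n equal to zero, the first function yields i = 0 and j = 1, while the other two yield i = 0 and j = 0. All three leave n equal to one.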

Or right to left?

Both compilers for the ARM7 apparently evaluate function call arguments from left to right, but we can't tell yet whether the Pentium compilers evaluate function arguments from left to right or from right to left. Let's see what happens when we compile a different function call:

n = 0;
f(n, n++);

In this case, the Pentium compilers from Borland, Metrowerks, and GNU compile code that produces:

i = 1; j = 0

Apparently, these compilers evaluate the arguments from right to left and perform the post-increment on the right argument before evaluating the left argument. In contrast, Visual C++ generates code that produces:

i = 0; j = 0

which is still inconclusive. Obviously, the compiler delays the post-increment until after evaluating both arguments, but it's still not clear whether the compiler evaluates the arguments from left to right or from right to left. Let's try another variation:

n = 0;
f(n, ++n);

which uses pre-increment, rather than post-increment. In this case, all four Pentium compilers produce code whose output is:

i = 1; j = 1

On the surface, this suggests that all the compilers evaluate the arguments from right to left. The call evaluates the right argument first, which increments n, and then it uses the incremented value for the left argument, too.

However, there is another possibility. The compiler could do the pre-increment of the second argument before it does anything else. Then it could evaluate the arguments from left to right. So you still can't tell from the output exactly how the compiler evaluates the function call arguments.

Let's take one more shot at this:

n = 0;
f(++n, n);

Here, the pre-increment is on the left argument. All four Pentium compilers, including Visual C++, generate code for a call that produces:

i = 1; j = 0

The only way this can happen is if the call evaluates the arguments from right to left and performs the pre-increment just before evaluating the left argument.

Always one way or the other?

Since the order of evaluation for function arguments is unspecified, a compiler can evaluate the arguments to a call in any order, as long as it evaluates all the arguments before jumping to the function. In theory, a compiler could even use a random number generator to determine the order of evaluation for each call. In practice, most compilers seem to use either left-to-right or right-to-left for all calls. The four Pentium compilers I tested evaluate function arguments from right to left.
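One way to observe the order a compiler chose for a particular call is to make each argument a function call that records when it runs. The sketch below assumes two hypothetical helpers, trace and sum2. Because each argument is a complete function call, the two recordings cannot interleave; only their order is left open.

```c
/* Records the order in which the arguments were evaluated. */
static int order[2];
static int count = 0;

/* Notes that argument number id is being evaluated, then yields id. */
static int trace(int id)
{
    if (count < 2)
        order[count++] = id;
    return id;
}

/* Any two-argument function will do; the result is ignored. */
static int sum2(int i, int j)
{
    return i + j;
}

/* Returns 1 if the left argument was evaluated first,
   2 if the right argument was evaluated first. */
int probe_order(void)
{
    count = 0;
    (void)sum2(trace(1), trace(2));
    return order[0];
}
```

The result describes only what one compiler did for this one call with the options in effect. As the tests in this column suggest, it need not hold for other calls, other options, or other compilers.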

I was surprised to find that the two compilers for the ARM7 varied the order of evaluation from call to call. As I mentioned earlier, for:

n = 0;
f(n++, n);

both compilers generated code that produced:

i = 0; j = 1

which can happen only if the call evaluates the arguments from left to right, incrementing n in between. But for:

n = 0;
f(n, n++);

both compilers generated code that produced:

i = 1; j = 0

which can happen only if the call evaluates the arguments from right to left, incrementing n in between.

I don't believe this curious change in the argument evaluation order is a consequence of the ARM7 instruction set. I looked at the code that each compiler generated, and the two versions are remarkably different considering the similarity of the results.

Staying away

All other things being equal, portable code is better than non-portable code. Function calls that depend on the order of argument evaluation are non-portable, and you should avoid them.

How do you know if you've written a call with an evaluation order dependency? Static analyzers such as lint can find them for you. (This month's "Beginner's Corner," on p. 55, features a general description of lint.) When I ran each of the function calls above through Gimpel Software's PC-Lint, it produced the following message:

Warning 564: variable 'n' depends on order of evaluation

Once you've found such a call, rewriting it to avoid argument evaluation order dependencies is usually easy. For example, you can rewrite:

f(n++, n);

as:

f(n, n);
++n;

if that's what you mean, or as:

f(n, n+1);
++n;

if that's what you mean. The hard part may be deciding which you mean.
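The two rewrites can be sketched as small wrapper functions. As an assumption made only so the results can be checked, this version of f also records its arguments in two hypothetical variables, last_i and last_j. The wrapper names are hypothetical too.

```c
#include <stdio.h>

/* Hypothetical record of the most recent arguments passed to f. */
int last_i;
int last_j;

void f(int i, int j)
{
    last_i = i;
    last_j = j;
    printf("i = %d; j = %d\n", i, j);
}

/* First rewrite: both arguments get the old value of n. */
void call_with_old_values(int *n)
{
    f(*n, *n);
    ++*n;
}

/* Second rewrite: the second argument gets the incremented value. */
void call_with_old_and_new(int *n)
{
    f(*n, *n + 1);
    ++*n;
}
```

Starting from n equal to zero, the first wrapper passes 0 and 0, and the second passes 0 and 1. Both leave n equal to one, and neither result depends on the order in which the compiler evaluates the arguments.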

A more serious warning

Of all the compilers I tested, only the ARM Development System (ADS) trial version 1.1 produced a warning comparable to lint's that I was treading on non-portable behavior. The call:

f(n++, n);

provoked the following warning:

Warning: undefined behavior: 'n' written and read without intervening sequence point

It's interesting that, although the ADS compiler is based on Metrowerks CodeWarrior, CodeWarrior Pro 6.0 for Windows produced no such warning.

This warning says the call has undefined behavior, not just unspecified behavior. A program exhibiting undefined behavior is not just a non-portable program; it's an erroneous program. This brings us to a subtle point: although the order of evaluation for function arguments is unspecified, a program that pushes too far into unspecified territory can produce undefined behavior.

Next time, I'll explain what a sequence point is and how it elevates this particular unspecified behavior into an undefined behavior.

Dan Saks is the president of Saks & Associates, a C/C++ training and consulting company. You can write to him at dsaks@wittenberg.edu.
