Sequence Points

Dan Saks, July 01, 2002

Overcomplex statements can confuse your compiler. Knowing where the sequence points are can help make your intentions clear.

Both the C and C++ Standards are reasonably precise in specifying the syntax and semantics of programs. However, neither standard pins down the exact meaning for every single construct. The standards describe the behavior of certain constructs as undefined, unspecified, or implementation-defined.

In my last column ("Evaluating Function Arguments," May 2002, p. 25), I presented a specific example of unspecified behavior common to C and C++: the order of evaluation for function arguments.

For example, suppose you define a function:



void f(int i, int j)
   {
   printf("i = %d; j = %d\n", i, j);
   }

and call it using:



int n;
...
n = 0;
f(n++, n);

Since the order of evaluation for function arguments is unspecified, a compiler can interpret the code in any of these different ways:

1. It can evaluate the left argument first and increment the left argument before evaluating the right argument. In this case, the function's output will be:

i = 0; j = 1

2. It can evaluate the left argument first and delay incrementing the left argument until after evaluating the right argument. In that case, both arguments will be zero:

i = 0; j = 0

3. It can evaluate the right argument first. In this case, as in the second, it won't increment the left argument until after evaluating both arguments. Again, both arguments will be zero:

i = 0; j = 0

One inquisitive reader, Chris Noonan, took my example a bit further. He added a third parameter, k, to function f and found that, with at least one compiler, the call:

n = 0;
f(n++, n++, n);

produced:

i = 1; j = 0; k = 2

as output. Here, the call evaluates the arguments neither from left to right nor from right to left, but from the inside out. Such is the nature of unspecified behavior.

Anyway, I concluded last time by observing that one, and only one, compiler, the ARM Developer Suite (ADS) trial version 1.1, diagnosed the call:

f(n++, n);

with the message:

Warning: undefined behavior: 'n' written and read without intervening sequence point.

This warning says that the call has undefined behavior, not just unspecified behavior. A program that exhibits undefined behavior is not just a non-portable program; it's an erroneous program. In order to understand why this call produces undefined behavior, you first have to know what a sequence point is. Before that, you have to know what a side effect is.

Side effects

All expressions, except those of type void, yield a value. Any expression, even a void one, can have side effects. For example, n++ is an expression whose value is the value of n before the increment. It also has the side effect of incrementing n.

In general, a side effect is an action that changes the state of the program's execution environment. Side effects include modifying an object, modifying a file, accessing a volatile object, or calling a function that does any of these operations. Let's look at a few more examples of side effects.

You can use an increment or decrement expression as a full expression. A full expression is one that is not a sub-expression of some larger expression. For example, this increment expression:

++n;

is a full expression. So is the increment expression in:

for (i = 0; i < n; ++i)

In these cases, we're interested in the increment expression only for its side effect. When you use an increment expression as a sub-expression of a larger expression, as in:

x = *p++;

then you're interested in the resulting value as well as the side effect.
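To make the distinction concrete, here is a minimal sketch in which *p++ is used for both its value and its side effect. The function name sum_ints and its parameters are illustrative; they are not from the original column.

```c
/* Hypothetical helper: adds up count ints starting at p. */
int sum_ints(const int *p, int count)
{
    int total = 0;

    /* count-- yields the old count; its side effect decrements count. */
    while (count-- > 0) {
        /* *p++ yields the int p points to; its side effect advances p. */
        total += *p++;
    }
    return total;
}
```

Each of these expressions modifies a given object only once, so none of them runs afoul of the ordering questions discussed later.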

No unary operator other than ++ and -- has a side effect. For example, neither -x nor !b has a side effect.

An assignment expression such as:

m = n;

has an obvious side effect, namely, it stores the value of n (converted to the type of m) into m. Less obvious is that the assignment also returns a value, namely, the value of m after the assignment.

Arithmetic assignment operators, such as += and -=, produce both values and side effects, but other binary operators, such as + and -, produce only a value. They have no side effects.

Finally, any expression that calls a standard library input or output function has the side effect of modifying a file. For example:

printf("i = %d; j = %d\n", i, j);

yields an int result, which we often ignore. Its side effect, modifying the standard output file, is usually what we're after.
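The value-versus-side-effect split for these operators can be sketched as a handful of small functions. The names store, add_to, sum_of, and report are hypothetical, chosen only for this illustration.

```c
#include <stdio.h>

/* Simple assignment: stores n into *m and yields the stored value. */
int store(int *m, int n)
{
    return (*m = n);
}

/* Compound assignment: updates *x and yields the updated value. */
int add_to(int *x, int delta)
{
    return (*x += delta);
}

/* Plain addition: yields a value and leaves both operands unchanged. */
int sum_of(int x, int y)
{
    return x + y;
}

/* printf: writes to standard output (the side effect) and yields the
   number of characters written, or a negative value on error. */
int report(int i, int j)
{
    return printf("i = %d; j = %d\n", i, j);
}
```

Here store and add_to each hand back the value of the assignment expression itself, which is why chained assignments such as a = b = c work.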

Sequence points

An expression may have more than one side effect. For example, the expression:

*p++ = *q++;

has five operators. Three of those operators have side effects:

  • The leftmost ++ operator increments p.
  • The rightmost ++ operator increments q.
  • The = operator modifies the value of *p.

The Standard does not specify the order in which these side effects must take place. For example, a compiler might generate code for the expression equivalent to either:

*p = *q;
++p;
++q;

or:

*p = *q;
++q;
++p;

or even:

tp = p;
++p;
tq = q;
++q;
*tp = *tq;

The only assurance the Standard offers about the order of these side effects is that they all will be complete by the time the evaluation of the full expression is complete.
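The three orderings above can be written out as functions and compared with the original expression. This is only a sketch: the function names are hypothetical, the pointers are passed by address so the caller can see the updates, and it assumes p and q point into distinct, non-overlapping arrays.

```c
/* Ordering 1: store, then ++p, then ++q. */
void copy_store_p_q(int **p, int **q)
{
    **p = **q;
    ++*p;
    ++*q;
}

/* Ordering 2: store, then ++q, then ++p. */
void copy_store_q_p(int **p, int **q)
{
    **p = **q;
    ++*q;
    ++*p;
}

/* Ordering 3: save the old pointers, do both increments, store last. */
void copy_increment_first(int **p, int **q)
{
    int *tp;
    int *tq;

    tp = *p;
    ++*p;
    tq = *q;
    ++*q;
    *tp = *tq;
}

/* The original expression, for comparison. */
void copy_one(int **p, int **q)
{
    *(*p)++ = *(*q)++;
}
```

Under the stated assumption, all four leave the destination, p, and q in the same state, which is why a compiler is free to pick any of them.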

Suppose the next statement also has side effects:

*p++ = *q++;
a[--i] = j;

Here, the second statement has two side effects:

  • The -- operator decrements i.
  • The = operator modifies a[i].

Again, the Standard does not specify the execution order of these side effects, but it does guarantee that both side effects will be complete at the end of the full expression. The Standard also guarantees that neither of those side effects will happen until the previous expression, including all of its side effects, is completely evaluated.

The end of a full expression is one example of a sequence point. A sequence point is any point in a program's execution wherein all side effects of previous evaluations are complete and no side effects of subsequent evaluations have started. If you want, you can think of the semicolon at the end of each statement as a sequence point.

Sequence points also appear at the end of the controlling expression of an if or switch statement, the controlling expression of a while or do statement, and each of the expressions of a for statement. For example, you can envision there's a sequence point at the right parenthesis after the conditional expression in an if statement.
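A minimal sketch of why the sequence point at the end of a controlling expression matters, using a while loop rather than an if. The name string_length is illustrative and not from the column.

```c
#include <stddef.h>

/* Hypothetical helper: length of a null-terminated string. */
size_t string_length(const char *s)
{
    const char *p = s;

    /* The sequence point at the end of the controlling expression means
       p has already been advanced by the time the test result is acted
       on, including the final test that reads the terminator. */
    while (*p++ != '\0') {
        /* empty body: all the work happens in the condition */
    }
    return (size_t)(p - s) - 1;   /* p is one past the terminator */
}
```

Because the increment is guaranteed complete when the loop exits, the subtraction of one at the end is reliable rather than compiler-dependent.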

There is also a sequence point at the end of the first (leftmost) operand of the && (logical and), || (logical or), ?: (conditional), and , (comma) operators. For example, consider a condition such as:

*p++ != 0 && *q++ != 0

C and C++ support short-circuit evaluation. That is, if the left sub-expression, *p++ != 0, is false, then the entire condition is false, regardless of the value of the rest of the expression. In that case, the program will not evaluate any part of the right sub-expression *q++ != 0, including its side effect. However, it will have completed the side effect of the left sub-expression.
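The short-circuit behavior just described can be observed directly. In this sketch the function both_nonzero is a hypothetical wrapper around the condition; it takes the pointers by address so the caller can check which increments actually took place.

```c
/* Hypothetical helper mirroring the condition *p++ != 0 && *q++ != 0. */
int both_nonzero(const int **p, const int **q)
{
    /* Sequence point after the left operand: *p has been advanced
       before the right operand is evaluated, if it is evaluated at all. */
    return *(*p)++ != 0 && *(*q)++ != 0;
}
```

When the left operand is false, the caller finds that p has advanced and q has not.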

Aside from the operators listed above, most binary operators, notably the assignment operator, do not have a sequence point after their first operand. As I described earlier, the expression:

*p++ = *q++;

does not necessarily increment p before incrementing q.

There is a sequence point between the operands of a comma operator. For example, in a for statement whose clauses use comma operators, there's a sequence point at each comma. There's also a sequence point at each semicolon and at the closing right parenthesis.
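As an illustration (this is not the column's original listing), a for statement that uses comma operators in its first and third clauses might look like the following sketch. The function reverse_in_place and its parameters are hypothetical names.

```c
/* Hypothetical helper: reverses the first n elements of a. */
void reverse_in_place(int *a, int n)
{
    int i, j;

    /* The commas in the first and third clauses are comma operators, so
       i = 0 is complete before j = n - 1 begins, and ++i before --j.
       Each semicolon and the closing parenthesis also mark the end of a
       full expression. */
    for (i = 0, j = n - 1; i < j; ++i, --j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```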

Although the comma operator introduces a sequence point, the commas that separate the arguments in a function call are not comma operators; they're just punctuation. Once again, the order of evaluation of the function arguments is unspecified. However, there's a sequence point after the evaluation of the arguments but before the call actually occurs. For a call such as:

f(i++, j++);

the Standard doesn't tell you whether the program will increment i before or after incrementing j, but it does tell you that both i and j will be incremented before the program arrives at f.
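One way to see that guarantee is to let the called function look at i and j itself. The sketch below assumes i and j are file-scope variables so the callee can read them; observe, demo_call, and the got_ and seen_ variables are hypothetical names.

```c
int i, j;              /* file scope so the callee can inspect them */
int got_a, got_b;      /* the argument values the callee received */
int seen_i, seen_j;    /* the values of i and j on entry to the callee */

/* Hypothetical callee: records its arguments and the current globals. */
void observe(int a, int b)
{
    got_a = a;
    got_b = b;
    seen_i = i;
    seen_j = j;
}

void demo_call(void)
{
    i = 0;
    j = 0;
    /* The order of the two increments is unspecified, but both are
       complete at the sequence point before observe starts running. */
    observe(i++, j++);
}
```

The callee receives the old values as arguments, yet it already sees the incremented values in i and j.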

Doing too much at once

Most experienced C and C++ programmers know that expressions such as:

i = 2 * i++;

are suspect at best. In this case, if the post-increment occurs after the assignment, then the result will be as if you had written:

i = 2 * i + 1;

But, if the increment occurs before the assignment, then the result will be as if it were just:

i = 2 * i;

It turns out that the behavior of an expression such as:

i = 2 * i++;

isn't just unspecified; it's undefined. According to the Standard:

"Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored."

A program that does otherwise has undefined behavior.

A call such as:

f(n++, n);

has undefined behavior because it violates the rule just cited. There's no sequence point between the argument expressions. The first (left) argument modifies n. It also reads the value of n, but only to determine the new value to be stored in n. So far, so good. However, the second (right) argument expression reads the value of n between the same pair of sequence points as the first argument, but not to determine the value to be stored in n. This additional attempt to read the value of n has undefined behavior.
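If you want one particular outcome, you can put the increment in its own full expression so that a sequence point separates it from the other read of n. The following is a minimal sketch under that approach. The variables last_i and last_j and the two wrapper functions are hypothetical additions that make the result easy to inspect.

```c
#include <stdio.h>

int last_i, last_j;   /* hypothetical: remember what f received */

void f(int i, int j)
{
    last_i = i;
    last_j = j;
    printf("i = %d; j = %d\n", i, j);
}

/* Passes the old value of *n as i and the incremented value as j. */
void call_after_increment(int *n)
{
    int old = *n;   /* read n in its own full expression */
    ++*n;           /* the increment is complete at this semicolon */
    f(old, *n);     /* neither argument modifies n */
}

/* Passes the old value of *n for both arguments, then increments. */
void call_before_increment(int *n)
{
    f(*n, *n);      /* two reads, no modification: well defined */
    ++*n;
}
```

Either wrapper says exactly which of the earlier outputs you intend, and neither relies on how a compiler orders argument evaluation.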

This rule also explains why expressions such as:

a[i++] = i;

have undefined behavior as well.
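The usual remedy is to split such an expression so that each object is modified at most once per full expression. The rewrites below are one possible sketch. The function names are hypothetical, and which variant is "right" depends on what the original code was meant to do.

```c
/* i = 2 * i++ rewritten as "double, then add one". */
void double_then_increment(int *i)
{
    *i = 2 * *i;   /* one modification of *i in this full expression */
    ++*i;          /* a second one, after the sequence point */
}

/* a[i++] = i rewritten to store the old index, then advance. */
void store_old_index(int *a, int *i)
{
    a[*i] = *i;
    ++*i;
}

/* a[i++] = i rewritten to store the incremented index in the old slot. */
void store_new_index(int *a, int *i)
{
    int old = *i;

    ++*i;
    a[old] = *i;
}
```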

A correction
Earlier, I recalled my observation that only one compiler, the ARM Developer Suite (ADS) trial version 1.1, issued a warning that:

f(n++, n);

produces undefined behavior. In my last column, I also observed: "It's interesting that, although the ADS compiler is based on Metrowerks CodeWarrior, CodeWarrior Pro 6.0 for Windows produced no such warning."

Ian Johnson, product manager with ARM Ltd., wrote to set me straight. He wrote, in part:

"ARM licenses the Metrowerks CodeWarrior IDE for use in our ARM Developer Suite (ADS), but we do not use the Metrowerks compilers. Incidentally, neither do we use the Metrowerks debugger. We just enable the 'project management' and editor facilities of the CodeWarrior IDE.

"When you invoke the compiler via armcc -vsn you will see that you get a banner identifying the compiler as coming from ARM Ltd.

"The C and C++ compilers that ship in ADS are developed entirely at ARM by a dedicated compiler team. Hence, it is not surprising that you get different results when compiling with the ARM compilers and the Metrowerks ones, since they are entirely different compilers.

"Also, I can explain why our compilers apparently vary the order of evaluation of function arguments depending on the call. We don't take a simple left-to-right or right-to-left approach. We evaluate the arguments that are 'hardest' first.

"For example, if you call f(g(), i+1), then we will evaluate g() first. If we evaluated i+1 first, we would need to save the register holding that result before evaluating g() because the call to g() may trash the contents of that register. I believe it was for this purpose that the C standard left the order of evaluation up to the compiler writer."

Thanks for the insights, Ian.

Dan Saks is the president of Saks & Associates, a C/C++ training and consulting company. He served for many years as secretary of the C++ standards committee. With Thomas Plum, he wrote C++ Programming Guidelines. You can write to him at dsaks@wittenberg.edu.
