Use volatile judiciously

According to the syntax rules of C and C++, the keyword volatile can appear anywhere that the keyword const can. Whereas a const object is one whose value the program can't change, a volatile object is one whose value might change spontaneously. That is, a volatile object might change state even though no statements in the program actually change it. Typical examples of volatile objects are memory-mapped device registers and shared objects in a multithreaded program.

Using the volatile qualifier helps you write correct programs by preventing the compiler from optimizing code so aggressively that it no longer does what you expect. On the other hand, using volatile too generously can increase code size and degrade performance.

I'll begin with an example to show how the volatile qualifier reins in optimization.

Preventing overly aggressive optimization
The ARM Evaluator-7T single-board computer uses memory-mapped device registers, called special registers , to control and communicate with peripheral devices. The computer's memory is byte-addressable, but each special register is a four-byte word aligned to an address that's a multiple of four.

You could manipulate each special register as if it were an unsigned int. Some programmers prefer to use a type that specifies the physical size of the register overtly, such as uint32_t. (Types such as uint32_t are defined in the C99 header <stdint.h>.) When feasible, I prefer to use a symbolic type whose name conveys the meaning of the type rather than its physical extent, such as:

typedef uint32_t special_register;  

One of the Evaluator-7T's devices is a pushbutton. Pressing the pushbutton on the board sets bit 8 of a register called IOPDATA to the value one. Releasing the button sets the bit back to zero. You can define a mask for that bit as an enumeration constant:

enum { button = 0x100 };  

The IOPDATA register resides at location 0x03FF5008. You can declare IOPDATA so that it looks like an object by using a macro:

#define IOPDATA (*(special_register *)0x03FF5008)  

or in C++, by using a reference:

special_register &IOPDATA = *(special_register *)0x03FF5008;

With either definition, executing the following loop causes the program to wait until the button is pressed:

while ((IOPDATA & button) == 0)
    ;

This code will work just fine if the compiler does no optimizations. But if it optimizes, as many modern compilers do, it may deduce that nothing happens in this loop to change the value of IOPDATA. As far as the compiler can tell, the condition is either always true or always false, so it concludes that the program needs to test the condition only once.

The compiler can optimize the original code into:

if ((IOPDATA & button) == 0)
    for (;;)
        ;

Now, if the condition is true (once and for all), the program falls into the inner loop and never escapes. If the condition is false, the program goes on with its life. The optimized code is more efficient in that it executes each iteration of the loop in less time than the original code. Unfortunately, the program won't respond properly to the button.

The way to prevent this overly aggressive optimization is to use the volatile qualifier in declaring IOPDATA, as in:

#define IOPDATA (*(special_register volatile *)0x03FF5008)

or in:

special_register volatile &IOPDATA = *(special_register *)0x03FF5008;

Both declare IOPDATA so that it designates a volatile special_register. The volatile qualifier indicates that the special register may change even though the program didn't do anything explicit to change it. Therefore, the compiler can't “optimize away” references to IOPDATA; it must generate code that accesses IOPDATA every time the original source program says it should.

As with const, you can place volatile either before or after the type it modifies. That is:

volatile special_register &IOPDATA

and:

special_register volatile &IOPDATA

are equivalent. As I explained a while back (“const T vs. T const,” February 1999, p. 13), I prefer the latter style.

In the days before the volatile qualifier existed, programmers solved the overly aggressive optimization problem by placing device-driver code in a separate source file. You had to compile that one file with optimizations turned off, but then you could compile the rest of the program with optimizations turned on.

Some compilers offer pragmas that will turn off compiler optimizations for a portion of a source file. You can wrap the driver code inside a pair of pragmas, as in:

#pragma optimization = off
/* driver code goes here */
#pragma optimization = on

but you have to hope you turned optimizations off in just the right places. Using volatile eliminates the guesswork. It turns off optimizations only for the volatile-qualified objects and for nothing else.

A surprisingly brief delay
Although they aren't a precise way to keep time, delay loops still come in handy now and then. For example, the following function apparently wastes time for a specified number of ticks:

void delay(int ticks)
    {
    int t;
    for (t = 0; t < ticks; ++t)
        ;
    }

In fact, a compiler might optimize this function into nothing.

Local variable t is the counter for a loop that does nothing but increment t until it equals ticks. Thus, the optimizer can replace the loop with a single assignment that just sets t to its final value:

void delay(int ticks)
    {
    int t;
    t = ticks;
    }

Since the function never does anything with that final value, the compiler might even eliminate the assignment and thus generate no code at all for the function body. When that happens, the delay function doesn't cause much of a delay after all.

I'm not making this up. I've seen compilers do this optimization.

Of course, not every compiler optimizes so aggressively, but just in case yours does, you should declare the loop counter volatile, as in:

void delay(int ticks)
    {
    int volatile t;
    for (t = 0; t < ticks; ++t)
        ;
    }

Inadvertent inefficiency
Sometimes, when you're not looking, using volatile can make your code a little less efficient than it should be. Consider this example.

The Evaluator-7T has a single seven-segment display. (Picture a single digit on the display of a VCR, such as the one in Figure 1.)

Figure 1: Seven-segment display

There's one bit in the IOPDATA register to control each of the seven segments. Those bits are numbered 10 through 16, inclusive. For example, bit 10 of IOPDATA controls the top segment of the display. Storing a one in that bit lights the top segment. You can define a mask that covers all seven bits as:

enum { display = 0x1FC00 };  

Suppose b contains a bit pattern for a character that you wish to display on the seven-segment display. To keep things simple, let's assume that the bits in the pattern are already in bits 10 through 16 of b (no shifting necessary), and that all other bits in b are zero. You could put the new pattern on the display by simply storing b into IOPDATA, as in:

IOPDATA = b;  

However, IOPDATA has other bits that control other devices. This assignment clears those other bits, possibly changing the state of those devices.

You can preserve the values in the other bits by using |= instead of =, as in:

IOPDATA |= b;  

(Some devices are so finicky that even overwriting bits in the register with their current values causes the device to misbehave, but that's not a problem with this hardware.) Unfortunately, using |= doesn't clear the already-lit segments in the display, so the display may wind up showing more segments than just those specified by the bits in b. After just a few more |= operations, all seven segments will probably be lit.

Here's a function that displays bit pattern b to the seven-segment display:

void display_put(uint32_t b)
    {
    IOPDATA &= ~display;
    IOPDATA |= b;
    }

The &= expression clears the display by clearing all seven bits. The |= expression puts the new value onto the display.

Clearing the display and then storing the new value in distinct steps makes it easy to see that this function does what it's supposed to do. Unfortunately, it also makes the function a little less efficient than it has to be. In this case, the volatility of IOPDATA prevents the compiler from performing a possible optimization.

The statement:

IOPDATA &= ~display;  

is actually a shorthand for:

IOPDATA = IOPDATA & ~display;  

That is, it reads IOPDATA , performs a bitwise-and, and writes the result back into IOPDATA . Similarly:

IOPDATA |= b;  

reads IOPDATA , performs a bitwise-or, and writes the result back.

If IOPDATA weren't volatile, the compiler could keep the result of the bitwise-and in a CPU register, apply the bitwise-or to that register, and then write only the final result to IOPDATA . This would save one write to and one read from IOPDATA . However, IOPDATA is a volatile object, so the compiler can't do this optimization. The compiled program must access volatile data exactly as specified in the source program.

If you want to enable the optimization, then you must write the function so that it reads and writes IOPDATA exactly once each. Here's an overt way to do it:

void display_put(uint32_t b)
    {
    register uint32_t temp = IOPDATA;
    temp &= ~display;
    temp |= b;
    IOPDATA = temp;
    }

Here's a more concise alternative, which usually generates the same code:

void display_put(uint32_t b)
    {
    IOPDATA = (IOPDATA & ~display) | b;
    }

Here, the compiler will use an unnamed temporary object, probably in a CPU register, to hold the intermediate results. The resulting code reads and writes IOPDATA exactly once.

This example should not discourage you from using volatile. Just be aware that, while you should use const generously, you should use volatile much more judiciously.

Dan Saks is president of Saks & Associates, a C/C++ training and consulting company.
