Scope regions in C and C++ - Embedded.com

Scope regions in C and C++

Many programmers have difficulty distinguishing the concepts of scope, storage allocation, and linkage. Dan begins the process of sorting them out.

Like many other programming languages, C employs concepts such as scope and storage allocation. The C standard also employs the lesser-known concept of linkage. Some other programming languages employ this concept as well, but few aside from C++ use the same terminology as C.

Although most programmers understand scope and storage allocation well enough to cope with common programming situations, their understanding often breaks down when they confront anything out of the ordinary. They also seem to have a sense of what linkage is, but don't really understand how it's distinct from the other concepts.

Much of the confusion stems from the complex semantics of storage class specifiers such as extern and static. The keyword static is particularly inscrutable. Sometimes it affects the way a program allocates storage. It can also affect how the linker resolves names as it links object files together. In C++, it can even restrict the behavior of class member functions. Understanding these distinctions can help you implement your designs more effectively and avoid some maintenance headaches.

In this installment, I'll explain how the C standard defines the concept of scope. The C++ standard describes scope much as the C standard does, but with a few noteworthy differences. I'll focus initially on what the C standard says and point out the differences with C++ as appropriate. I'll try to be reasonably precise without swamping you with unnecessary details.

Translation units
As you well know, a C program can consist of numerous source files. A compiler processes one source file at a time. A source file usually contains #include directives that refer to headers. The compiler's preprocessor merges those headers with the source file to produce a transitory source file, which the standard calls a translation unit. Translation units are also known as compilation units.

Later phases of the compilation process transform each translation unit into an object file or object module. The linker combines object files and library components to produce an executable program.

As you'll see, you can't talk knowledgeably about scope and linkage without mentioning translation units.

Declarations and definitions
A declaration is a construct in the source code that introduces one or more names into a translation unit and associates attributes with those names. Alternatively, a declaration might simply redeclare a name introduced by a declaration that appeared earlier in the translation unit.

A declaration might also be a definition. Informally, a definition is a declaration that not only says “here's a name”, but also “here's all the information the compiler needs to create the code for that name.”

For functions and objects, a definition is a declaration that generates storage. It's easy to tell when a function declaration is also a definition: a function definition has a body, which generates storage in the code space. For objects, the distinction is not so simple: it depends on the object's scope, linkage, and initializer. We'll get there in due time.
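The function case can be sketched as follows. The names square and sum_of_squares are hypothetical, chosen only for illustration. The object case is left out here because it depends on concepts that come later.

```c
/* Declaration only: introduces the name and its type, but has no body. */
int square(int n);

/* Once the name is declared, a caller can use it. */
int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}

/* Definition: the same declaration, now with a body that
   generates code. */
int square(int n)
{
    return n * n;
}
```

Calling sum_of_squares(3, 4) compiles against the declaration alone; the definition is needed only to produce the final program.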

In C, a struct declaration never generates storage, so C doesn't distinguish struct definitions from other struct declarations. C++ does. In C++, a struct declaration is also a definition if it has a body, as in:

struct widget   // a definition
    {
    ...
    };

It's just a declaration if it lacks a body, as in:

struct widget;  // just a declaration

The C standard uses more complicated verbiage to distinguish these different forms of struct declarations. I prefer C++'s approach.
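Whatever the terminology, the practical difference between the two forms shows up in C as well. The following is a minimal sketch, assuming a hypothetical member id and two hypothetical helper functions:

```c
#include <stddef.h>

struct widget;                      /* just a declaration: no body */

/* Pointers to the not-yet-completed type are fine. */
int widget_is_null(const struct widget *w)
{
    return w == NULL;
}

struct widget                       /* what C++ calls a definition */
{
    int id;                         /* hypothetical member */
};

/* Members can be accessed only after the body has been seen. */
int widget_id(const struct widget *w)
{
    return w->id;
}
```

Moving widget_id above the body of struct widget would make the compiler reject the member access, because at that point it knows only the tag.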

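In C, typedef and enumeration constant declarations are also definitions. C++ treats enumeration constants the same way, but the C++ standard classifies a typedef declaration as a declaration that is not a definition.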

Scope regions in C
When the compiler encounters the declaration of a name, it stores that name and its attributes in a symbol table. When the compiler encounters a reference to a name, it looks up the name in the symbol table to find those attributes. Each declared name is visible (that is, can be found by lookup) only within a portion of the translation unit called its scope.

Some programming languages use dynamic scoping, in which name lookup is done at run time and may yield different results depending on the state of the running program. That is not the case with C and C++. Both languages use static scoping and do all name lookup at compile time.
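A small sketch may make static scoping concrete. The function names are hypothetical. Under dynamic scoping, read_x would see the caller's x; in C it always sees the file-scope x, because the lookup was settled at compile time.

```c
int x = 1;                 /* file-scope x */

int read_x(void)
{
    return x;              /* bound at compile time to the file-scope x */
}

int call_with_local_x(void)
{
    int x = 2;             /* block-scope x, visible only in this block */
    (void)x;               /* silence unused-variable warnings */
    return read_x();       /* still yields the file-scope x */
}
```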

C has four kinds of scope:

• A name has file scope if it's declared in the outermost scope of a translation unit, that is, outside of any function, structure, or union declaration. Its scope begins right after its declaration and runs to the end of the translation unit.

• A name (other than a statement label) has block scope if it's declared within a function definition (including that function's parameter list) or in a brace-enclosed block within that function. Its scope begins right after its declaration and runs to the end of the block immediately enclosing that declaration.

• A name has function prototype scope if it's declared in the function parameter list of a function declaration that is not also a definition. Its scope begins right after its declaration and runs to the end of the parameter list.

• Statement labels, and only statement labels, have function scope. A label can be defined only in the body of a function definition and is in scope everywhere in that body, even before the label has been defined.

For example, in Listing 1:

• Object k and functions f and g have file scope.

• Parameter i in function f and parameter n in function h have function prototype scope.

• Parameter i, objects j and k, and function h, all within function g, have block scope.

• Label done in function g has function scope.
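Listing 1 is not reproduced in this text. The following is a reconstruction that is merely consistent with the bullets above. The function bodies, the values, and the definitions of f and h are assumptions added so that the sketch compiles and runs; they are not the original listing.

```c
int f(int i);          /* f: file scope; i: function prototype scope */

int k;                 /* k: file scope */

int g(int i)           /* g: file scope; i: block scope */
{
    int j, k;          /* j and k: block scope */
    int h(int n);      /* h: block scope; n: function prototype scope */

    j = i + 1;
    if (j < 0)
        goto done;     /* done is in scope before its definition */
    k = h(j);          /* assigns the local k, not the file-scope k */
    return k;
done:                  /* done: function scope */
    return 0;
}

/* Definitions assumed here only so that the sketch links and runs. */
int f(int i) { return g(i); }
int h(int n) { return n * 2; }
```

After a call such as f(3), the file-scope k still holds whatever value it had before the call.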

Most programmers, not just C and C++ programmers, refer to names declared in an inner scope (a block scope) as local names, and to names declared at the outermost scope (file scope) as global names. The C++ standard uses the terms local and global in this sense, but the C standard rarely does.

The exact point at which a name's scope begins depends on the way that name is declared.

The scope of a name introduced by a declarator, such as an object or function name, begins just after the end of its declarator and before its initializer, if present. A declarator is the part of an object or function declaration consisting of a name being declared, possibly surrounded by operators such as *, [], and (). For example:

long int *p = NULL, x[N];

has two declarators, *p and x[N]. In this example, p's scope begins at the = (equal sign), and x's scope begins at the semicolon.
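One consequence is that a name is already visible inside its own initializer. The following sketch, with hypothetical function names, shows two harmless uses of that rule:

```c
#include <stddef.h>

size_t size_of_self(void)
{
    /* n is already in scope inside its own initializer, so
       sizeof n refers to the n being declared. */
    size_t n = sizeof n;
    return n;
}

int points_to_self(void)
{
    /* p's scope begins right after the declarator *p, so the
       initializer may take p's own address. */
    void *p = &p;
    return p == (void *)&p;
}
```

Reading the value of a name in its own initializer is a different matter: the name is in scope, but the object does not yet hold a meaningful value.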

The scope of a structure, union, or enumeration tag begins just after the appearance of the tag in the type specifier that first declares the tag. For example, the name s appearing in:

struct s
    {
    ...
    };

is a tag. The names of unions and enumerations are also tags. The scope of s in the above declaration begins at the opening brace immediately after s.
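Because the tag is in scope as soon as it appears, a structure can refer to itself within its own body. A minimal sketch, assuming a hypothetical linked-list node and helper function:

```c
#include <stddef.h>

struct node
{
    int value;
    struct node *next;   /* the tag node is already in scope here */
};

/* Counts the nodes reachable from head. */
int list_length(const struct node *head)
{
    int count = 0;
    for (; head != NULL; head = head->next)
        ++count;
    return count;
}
```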

Similarly, the general form of an enumeration definition is:

enum tag { enumerator, enumerator, ..., enumerator };

Each enumerator is an identifier that names a constant, optionally followed by an = and an expression that specifies the constant's value. The scope of an enumeration constant begins just after the appearance of its defining enumerator.

Consider:

enum color { red, green = 2, blue = 4 };

Here, red's scope begins at the first comma, green's begins at the second, and blue's begins at the closing brace.
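This rule is what lets a later enumerator use an earlier one in its defining expression. The following variation on the declaration above is a sketch written for illustration; it yields the same three values:

```c
enum color
{
    red,                 /* 0 by default */
    green = red + 2,     /* red is already in scope */
    blue = green * 2     /* green is already in scope */
};
```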

According to the C standard, a name in an inner scope can hide a name from an outer scope. For example, in Listing 1, the object k local to function g hides the global object k. The local k hides the global one in the sense that, when the compiler looks up k in the scope of the local k, it finds only the local k, never the global one. Thus, the assignment in g modifies the local k, not the global one.

The C++ standard explains the behavior of nested scopes differently, but the effect is pretty much the same.

Scope regions in C++
The scope regions of C++ are somewhat different from those in C. C++ identifies five kinds of scope: function, function prototype, local, namespace, and class. The first two, function scope and function prototype scope, are the same in C++ as they are in C. Local scope corresponds to C's block scope, namespace scope corresponds to C's file scope, and class scope is something new.

In C++, local scope extends the concept of block scope to account for some added features of C++, such as the ability to declare a variable in the initialization step of a for statement, as in:

for (int i = 0; i < N; ++i)
    ...
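C99 and later accept the same form, so the effect on scope can be sketched in C as well. In this hypothetical function, the loop's i is a separate name whose scope ends with the loop, leaving the outer i untouched:

```c
int sum_below(int n)
{
    int i = 100;                     /* outer i */
    int sum = 0;

    for (int i = 0; i < n; ++i)      /* the loop's own i hides the outer i */
        sum += i;

    /* The loop's i is out of scope here; i again names the outer object. */
    return i == 100 ? sum : -1;
}
```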

C++ provides a facility called namespaces for grouping names that would otherwise be crowded into file scope. C++ generalizes the rules for names declared at file scope to include names declared in namespaces as well. In C++, a name has namespace scope if it's declared either in a namespace of the form:

namespace identifier
    {
    ...
    }

or in what C calls file scope. Accordingly, the C++ standard shuns the term file scope in favor of global namespace scope, or just global scope.

C++ also introduces the concept of class scope for names declared within the brace-enclosed body of a class definition. (Classes in C++ include structures and unions, as well.) In C++, each class introduces a new scope, so the same name can be declared as a member in more than one class.

C doesn't quite have a corresponding notion of structure scope. Rather, the C standard says that each structure or union has a separate name space for its members. The C standard uses the term name space (two words) to mean something quite different from the namespace (one word) construct of C++. In C, a name space is a region of the compiler's symbol table. Despite the different verbiage in their respective standards, C and C++ look up structure and union members in much the same way.
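The separate name spaces of C can be seen in a short sketch. All of the names here are hypothetical. The same identifier serves as a member of two structures, as a tag, and as an ordinary object name without any conflict:

```c
struct point { int x; int y; };   /* x and y: members of point */
struct size  { int x; int y; };   /* same member names, separate name space */

int name_space_demo(void)
{
    struct point point = { 3, 4 };  /* tag point and object point coexist */
    struct size size = { 10, 20 };  /* likewise for size */
    int x = point.x + size.x;       /* ordinary x is distinct from both members */
    return x;
}
```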

Scope and linkage
The concept of scope is meaningful only within a single translation unit. Strictly speaking, a name declared in the global scope is simply a name declared at the outermost scope in a translation unit. That name isn't necessarily known in other translation units.

Despite what you may have learned, neither C nor C++ has any kind of scope that spans from one translation unit to another. Rather, a name declared in one translation unit can refer to a name defined in another translation unit through a property called external linkage. Stay tuned.
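The difference between scope and linkage can be glimpsed even within a single file. In the following sketch, which uses hypothetical names, the declaration inside read_counter has block scope, yet it refers to the object defined later at file scope. In a real program, that definition would more typically live in another translation unit.

```c
int read_counter(void)
{
    /* Block-scope declaration: the name counter is visible only inside
       this function, yet it refers to the object defined below. */
    extern int counter;
    return counter;
}

int counter = 42;   /* definition; the scope of this declaration starts here */
```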

Dan Saks is president of Saks & Associates, a C/C++ training and consulting company. For more information about Dan Saks, visit his website at www.dansaks.com. Dan also welcomes your feedback.

Dan is now writing an online-only extension of his Programming Pointers column that will appear on Embedded.com every other month.
