Demystifying constructors - Embedded.com

Demystifying constructors

Even the experienced C++ programmer can be confused about what exactly constructors do and when they get called.

One of the easiest ways to misuse a structure object in C is to fail to initialize it properly. In C++, a class can have special member functions, called constructors, that provide guaranteed initialization for objects of that class type. The guarantee isn't absolute; you can subvert it using a cast. Nonetheless, using constructors can reduce the incidence of uninitialized objects.

While most C++ programmers use constructors frequently, I keep running into C++ programmers, even experienced ones, who seem to misunderstand how constructors really work. They're surprised, and somewhat dismayed, when a seemingly simple statement generates a flurry of constructor calls that they didn't expect.

Initialization is rarely optional. When it doesn't get done, subsequent operations often fail. However, initialization can be a problem when it happens at unexpected times, especially when the affected code is time-critical.

This month, I'll start to take some of the mystery out of when constructors execute and what it is that they actually do. As I often do, I'll explain the behavior of C++ by showing equivalent code in C. If you're a C programmer who doesn't use C++, I think you'll still find these insights helpful. C code that mimics the discipline imposed by C++ is often better code.

Shallow parts vs. deep parts
I'll begin by introducing a little terminology that should simplify the remaining discussion.

Consider an abstract type that implements a ring buffer of characters. A ring buffer is a first-in-first-out data structure. Data can be inserted at the buffer's back end and removed from the front end. The C++ definition for a ring buffer class might look, in part, like:

class ring_buffer
{
    ~~~
private:
    char *base;
    size_t size;
    size_t head, tail;
};

Member base represents an array that holds the buffered characters. Member size represents the number of elements in that array. Members head and tail are the indices of the elements at the buffer's front and back ends, respectively.

As I explained in a prior column, a class without base classes and virtual functions, and with all data members having the same accessibility (all public or all private), has essentially the same storage layout as a structure containing the same data members in the same order.[1] Thus, for example, the ring_buffer class above has the same storage layout as a C structure defined as:

typedef struct ring_buffer ring_buffer;
struct ring_buffer
{
    char *base;
    size_t size;
    size_t head, tail;
};
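The layout claim can be checked at compile time. The following is a minimal sketch, assuming a C++11 compiler; the names ring_buffer_class and ring_buffer_struct are hypothetical, chosen only so that both definitions can live in one translation unit. Equal size and alignment is what typical compilers produce for two types like these, not a guarantee spelled out by the language.

```cpp
#include <cstddef>
#include <type_traits>

// The class form: all data members private, no bases, no virtual functions.
class ring_buffer_class
{
private:
    char *base;
    std::size_t size;
    std::size_t head, tail;
};

// The structure form: the same members in the same order, all public.
struct ring_buffer_struct
{
    char *base;
    std::size_t size;
    std::size_t head, tail;
};

// A class whose members all share one access level and that has no virtual
// functions or base classes is a standard-layout type.
static_assert(std::is_standard_layout<ring_buffer_class>::value,
    "the class is expected to be standard-layout");

// On typical compilers both types occupy the same amount of storage
// with the same alignment.
static_assert(sizeof(ring_buffer_class) == sizeof(ring_buffer_struct),
    "class and structure are expected to have the same size");
static_assert(alignof(ring_buffer_class) == alignof(ring_buffer_struct),
    "class and structure are expected to have the same alignment");
```

If either assumption were false for a given compiler, the translation unit would fail to compile rather than misbehave at run time.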

In truth, member base stores a pointer to the initial element of the array, not the array itself. The array is part of ring_buffer's implementation, but it's not one of the data members. The array occupies storage allocated separately from the ring_buffer object.

The ring_buffer is an example of a class with deep structure. A class has deep structure if it has at least one data member that refers to separately-allocated resources managed by the class. Classes with members that are pointers to dynamically-allocated memory are the most common classes with deep structure. However, a class with a member of any type that designates a separately-allocated managed resource, such as an integer designating a file, also has deep structure.

Obviously, not all classes have deep structure. For example, a class representing complex numbers typically has just two data members of some floating-point type, as in:

class complex
{
    ~~~
private:
    double real, imaginary;
};

Nothing in this class refers to resources beyond the data members. Such classes have shallow structure.

The shallow part of an object is the storage that contains the object's data members, as well as its base class sub-objects, vptr and padding, if any. (I briefly described base class sub-objects and vptrs in an earlier column.)[1] The sizeof operator applied to a class object (or the class itself) yields the number of bytes in the shallow part of the object (or class).
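The behavior of sizeof can be seen in a short experiment. This is only a sketch: it declares the ring_buffer members in a struct so the code can set them directly, and the function name shallow_size_is_fixed is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Same data members as ring_buffer, made public for brevity (an assumption
// of this sketch, not how the class would normally be written).
struct ring_buffer
{
    char *base;
    std::size_t size;
    std::size_t head, tail;
};

// Hypothetical helper: returns true if two ring_buffers whose arrays differ
// greatly in size still report the same sizeof.
bool shallow_size_is_fixed()
{
    ring_buffer small = { new char[8], 8, 0, 0 };
    ring_buffer large = { new char[4096], 4096, 0, 0 };

    // sizeof depends only on the type, so the separately allocated arrays
    // have no effect on the result.
    bool same = sizeof(small) == sizeof(large)
        && sizeof(small) == sizeof(ring_buffer);
    assert(same);

    delete [] small.base;
    delete [] large.base;
    return same;
}
```

Both objects carry one pointer and three size_t values, so sizeof reports the same number of bytes for each, whether the pointer designates 8 characters or 4096.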

The deep part of an object is any storage used to represent the object's state beyond the shallow part. Objects with shallow structure have no deep part.

Where the shallow parts come from
When you define an object in either C or C++, as in:

ring_buffer rb;

the compiler generates code to allocate the object's shallow part. The initialization of the ring_buffer, including the allocation of its deep part, won't happen unless you write additional code. In C, you have to invoke the initialization code explicitly every time you define a ring_buffer. In C++, you can provide constructors for ring_buffer, which the compiler will use to initialize each ring_buffer automatically.

Before I explain where the shallow parts come from, I want to dispel a common misconception: With modern C++ compilers, a constructor doesn't allocate the shallow part of the object it constructs. Rather, the program allocates the shallow part using one of the usual run-time mechanisms for storage allocation. The constructor then initializes the shallow part, and in so doing, may allocate and initialize a deep part as well.
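One way to watch this separation is a placement new-expression, which runs a constructor on storage the program has already set aside. The sketch below assumes C++11; the destructor, the deleted copy operations, the capacity member, and the function name constructor_reuses_given_storage are illustrative additions, not part of the class as presented in this column.

```cpp
#include <cassert>
#include <cstddef>
#include <new>      // declares the placement form of operator new

class ring_buffer
{
public:
    explicit ring_buffer(std::size_t n)
        : base(new char[n]), size(n), head(0), tail(0)  // acquires the deep part
    {
    }
    ~ring_buffer() { delete [] base; }                  // releases the deep part
    ring_buffer(const ring_buffer &) = delete;
    ring_buffer &operator=(const ring_buffer &) = delete;
    std::size_t capacity() const { return size; }
private:
    char *base;
    std::size_t size;
    std::size_t head, tail;
};

// Hypothetical helper: returns true if the constructed object occupies
// exactly the storage the caller provided.
bool constructor_reuses_given_storage()
{
    // The program, not the constructor, supplies the shallow part: here a
    // suitably sized and aligned automatic buffer.
    alignas(ring_buffer) unsigned char storage[sizeof(ring_buffer)];

    // Placement new allocates nothing for the shallow part; it only runs
    // the constructor on the storage it is given.
    ring_buffer *rb = new (storage) ring_buffer(32);
    assert(rb->capacity() == 32);

    bool in_place = static_cast<void *>(rb) == static_cast<void *>(storage);

    rb->~ring_buffer();     // explicit destructor call frees the array
    return in_place;
}
```

The constructor's only allocation is the array for the deep part. The shallow part was in place before the constructor began.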

(In some early C++ implementations, the constructor did memory allocation for the shallow part, but only for new-expressions. C++ has evolved so that such implementations are now extinct and can be found only in museums.)

Now, back to the allocation of the shallow parts. By the “usual run-time mechanisms” for storage allocation I mean whatever the compiler normally does depending on whether the object to be allocated has automatic, static, or dynamic storage duration.[2] These mechanisms are essentially the same in C++ as they are in C.

For an object with automatic storage duration (“automatic objects”), the shallow part will be allocated on the run-time stack. During optimization, the compiler may decide to place some automatic objects into CPU registers. It might even do this for a class object whose shallow part is small enough to fit into the available registers. However, it's easier to discuss automatic allocation if we don't belabor this detail and instead speak as if automatic objects are always placed in the stack.

If an automatic object is a function parameter, its storage will be allocated as the program evaluates function arguments prior to the call. If an automatic object is declared within a function body, its storage will be allocated upon entering the function.

For an object with static storage duration, the compiler, linker, and loader collaborate to place the object's shallow part in memory before the program starts running. From the running program's perspective, an object with static storage duration is always there. (In reality, the constructor doesn't run until run time. The new draft standard for C++ provides a new keyword constexpr, which will allow some constructors to “run” at compile time.)
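The constexpr keyword became part of C++11. As a rough illustration, assuming a C++11 compiler, a shallow class such as complex can be given a constexpr constructor. The accessor names re and im and the object name unit_i are hypothetical.

```cpp
class complex
{
public:
    // A constexpr constructor can be evaluated during compilation when its
    // arguments are constant expressions.
    constexpr complex(double r, double i) : real(r), imaginary(i) {}
    constexpr double re() const { return real; }
    constexpr double im() const { return imaginary; }
private:
    double real, imaginary;
};

// An object with static storage duration. Declaring it constexpr requires
// the compiler to compute its initial value at compile time.
constexpr complex unit_i(0.0, 1.0);

// The value is available to the compiler, so it can be checked before the
// program ever runs.
static_assert(unit_i.re() == 0.0, "real part is known at compile time");
static_assert(unit_i.im() == 1.0, "imaginary part is known at compile time");
```

Without constexpr, the same object would still have its storage in place before the program starts, but its constructor could run during program startup instead.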

In C++, objects with dynamic storage duration are those created by new-expressions. A new-expression allocates memory by calling a function named operator new or operator new[]. I've described the behavior of these functions in previous columns.[3, 4]

Constructors
In C++, a constructor is a special class member function that initializes objects of its class type. A constructor's function name is always the same as its class name. A class can have more than one constructor, each with a distinct parameter list, as in:

class ring_buffer
{
public:
    ring_buffer();
    ring_buffer(size_t n);
    ~~~
};

A constructor can't specify a return type. You don't write calls to constructors, so there's no opportunity to use the return value. Again, you write object definitions, and the compiler automatically generates constructor calls for you.

A constructor that requires no arguments is called a default constructor. Since the ring_buffer class defined above has a default constructor, you can write definitions for ring_buffer objects as just:

ring_buffer rb;

When this definition appears at block or namespace scope, it automatically calls ring_buffer's default constructor. When this definition appears elsewhere, such as at class scope, it might use a constructor other than the default. I'll explore this complication in a later column.

When a class has constructors but none of them is a default constructor, every definition for an object of that class must specify arguments for one of those constructors, as in:

ring_buffer rb (32);

This definition automatically calls the constructor with a parameter of type size_t .

The compiler will reject any object definition whose argument list doesn't match any constructor's parameter list, as in:

ring_buffer rb ("xyzzy");
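The way the compiler selects a constructor from the form of the definition can be sketched as follows. The default capacity of 16, the capacity member, the destructor, the deleted copy operations, and the helper function names are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t default_size = 16;    // assumed default capacity

class ring_buffer
{
public:
    ring_buffer();                  // default constructor
    ring_buffer(std::size_t n);     // constructor taking a buffer size
    ~ring_buffer() { delete [] base; }
    ring_buffer(const ring_buffer &) = delete;
    ring_buffer &operator=(const ring_buffer &) = delete;
    std::size_t capacity() const { return size; }
private:
    char *base;
    std::size_t size;
    std::size_t head, tail;
};

ring_buffer::ring_buffer()
    : base(new char[default_size]), size(default_size), head(0), tail(0)
{
}

ring_buffer::ring_buffer(std::size_t n)
    : base(new char[n]), size(n), head(0), tail(0)
{
}

std::size_t capacity_from_default_constructor()
{
    ring_buffer rb;         // no arguments: calls ring_buffer()
    return rb.capacity();
}

std::size_t capacity_from_size_constructor()
{
    ring_buffer rb(32);     // one integer argument: calls ring_buffer(std::size_t)
    return rb.capacity();
}

void check_constructor_selection()
{
    assert(capacity_from_default_constructor() == default_size);
    assert(capacity_from_size_constructor() == 32);
    // ring_buffer rb("xyzzy");  // would not compile: no constructor accepts
    //                           // a string literal
}
```

Each definition names no constructor explicitly. The argument list alone tells the compiler which one to call.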

A constructor is like every other ordinary (nonstatic) member function in that it has an implicitly-declared parameter named this, which points to an object of the constructor's class. Whenever the program calls a constructor, the constructor's this parameter points to storage for an uninitialized object: the shallow part allocated by one of the usual run-time mechanisms. The constructor's job is to place appropriate initial values into the shallow part and, if there is a deep part, acquire and initialize it, too.

For example, the ring_buffer(size_t) constructor might be defined as:

ring_buffer::ring_buffer(size_t n)
{
    base = new char [n];
    size = n;
    head = tail = 0;
}

The new-expression acquires the ring_buffer's deep part. By default, it throws an exception if it fails. The rest of the constructor initializes the shallow part.

A C function that does essentially the same job looks like:

#include <stdlib.h>     /* malloc */

void rb_construct(ring_buffer *this, size_t n)
{
    if ((this->base = (char *)malloc(n)) == NULL)
    {
        /* return or announce failure more overtly */
        return;
    }
    this->size = n;
    this->head = this->tail = 0;
}

In C, you should probably call this function as soon as possible after the definition or statement that allocates the shallow part, as in:

ring_buffer rb;
rb_construct(&rb, 32);
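The two-step pattern can also be written so that the caller learns whether construction succeeded. The following sketch is one possible arrangement, not the only one. It is written to compile as C++, so the parameter is named self because this is a keyword in C++. The bool return value, rb_destroy, and construct_then_use are hypothetical additions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct ring_buffer
{
    char *base;
    std::size_t size;
    std::size_t head, tail;
};

// C-style "constructor": returns false if the deep part can't be allocated,
// in which case the other members are left untouched.
bool rb_construct(ring_buffer *self, std::size_t n)
{
    self->base = static_cast<char *>(std::malloc(n));
    if (self->base == nullptr)
        return false;
    self->size = n;
    self->head = self->tail = 0;
    return true;
}

// Hypothetical counterpart that releases the deep part.
void rb_destroy(ring_buffer *self)
{
    std::free(self->base);
    self->base = nullptr;
}

bool construct_then_use()
{
    ring_buffer rb;                 // allocates the shallow part only
    if (!rb_construct(&rb, 32))     // initializes it and acquires the deep part
        return false;
    assert(rb.size == 32);
    bool ok = rb.base != nullptr && rb.head == 0 && rb.tail == 0;
    rb_destroy(&rb);
    return ok;
}
```

Nothing in the language forces the call to rb_construct to follow the definition of rb. That ordering is a discipline the programmer must maintain, which is exactly what a C++ constructor automates.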

Where constructors get called
Again, whenever your program defines an object with a class type, the compiler automatically plants a call to the object's constructor at the appropriate place in the generated code. If you learn to anticipate where those places are, you're less likely to be surprised by the code the compiler generates.

Among the places you're likely to see constructors called are:

  • Definitions for objects of class type, or for arrays with elements of class type.
  • New-expressions that create objects of class type, or arrays with elements of class type.
  • Return statements that return class objects by value.
  • Function calls with parameters of class type passed by value.
  • Explicit type conversions (cast expressions).
  • Any other expressions that create temporary objects of class type.
  • Throwing an exception of class type.
  • Catching an object of class type by value.
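A few of the cases in the list above can be observed by counting constructor calls. This is a minimal sketch: the widget class, its counters, and the function names are all hypothetical. It covers only cases whose counts don't depend on compiler optimizations such as copy elision.

```cpp
#include <cassert>

struct widget
{
    static int default_calls;   // number of default constructor calls
    static int copy_calls;      // number of copy constructor calls
    widget() { ++default_calls; }
    widget(const widget &) { ++copy_calls; }
};

int widget::default_calls = 0;
int widget::copy_calls = 0;

void take_by_value(widget) {}   // parameter of class type passed by value

void exercise_constructors()
{
    widget::default_calls = widget::copy_calls = 0;

    widget w;                   // object definition: 1 default constructor call
    widget a[3];                // array definition: 1 call per element
    widget *p = new widget;     // new-expression: 1 default constructor call
    take_by_value(w);           // pass by value: 1 copy constructor call

    delete p;
    (void)a;                    // silence unused-variable warnings

    assert(widget::default_calls == 5);
    assert(widget::copy_calls == 1);
}
```

Four statements produce six constructor calls, none of which is written explicitly in the source.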

I'll look at some of these in detail in my next column.

Dan Saks is president of Saks & Associates, a C/C++ training and consulting company. For more information about Dan Saks, visit his website at www.dansaks.com. Dan also welcomes your feedback by e-mail.

Endnotes:

  1. Saks, Dan. “Classes are structures, and then some,” Embedded.com, July 2009. www.eetimes.com/discussion/programming-pointers/4027034/Classes-are-structures-and-then-some
  2. Saks, Dan. “Storage class specifiers and storage duration,” Embedded Systems Design, January 2008, p. 9. www.eetimes.com/discussion/programming-pointers/4026823/Storage-class-specifiers-and-storage-duration
  3. Saks, Dan. “Allocating objects vs. allocating storage,” Embedded Systems Design, September 2008, p. 11. www.eetimes.com/discussion/programming-pointers/4026897/Allocating-objects-vs-allocating-storage
  4. Saks, Dan. “Allocating arrays,” Embedded Systems Design, January 2009, p. 9. www.eetimes.com/discussion/programming-pointers/4026953/Allocating-arrays
