Incomplete types as abstractions

November 12, 2003

Dan_Saks-November 12, 2003

Incomplete types eliminate some of the pitfalls associated with data abstraction in C, but they can also create other problems.

Last time, I explained why abstract types are useful and how you can approximate them in C by using structs and associated functions. I also showed why C structs don't do a good job of enforcing abstractions.

You may remember that I had promised to show you how classes in C++ overcome the deficiencies of using C structs as abstract types. I intend to do this soon, just not yet; Eric S. Johnson, one of my readers, brought to my attention that I neglected to consider an alternative approach using incomplete types. I'll remedy that oversight first.

Abstract types revisited
An abstract type is one that's packaged to separate its functional behavior from its implementation. For an example of an abstract type, let's use a ring buffer containing characters. A ring buffer is a first-in-first-out data structure. You insert data at the buffer's back end and remove it from the front.

Last month, I implemented a ring buffer in C as a struct named ring_buffer, along with various functions that performed fundamental ring_buffer operations. These functions included:

  • rb_empty(&rb), which returns true if and only if ring_buffer rb has no elements in it
  • rb_front(&rb), which returns the character at rb's front end
  • rb_pop_front(&rb), which removes the character from rb's front end

I defined the ring_buffer struct and functions in a header called ring_buffer.h, which is reprinted in Listing 1. For brevity, I didn't provide a complete set of functions. At the very least, the ring_buffer should also provide:

  • rb_push_back(&b), which appends a character to rb's back end.
Listing 1: The declarations for a ring buffer in C

// ring_buffer.h

#ifndef RING_BUFFER_INCLUDED
#define RING_BUFFER_INCLUDED

enum { rb_size = 32 };

struct ring_buffer
    {
    char array[rb_size];
    int head, tail;
    };
typedef struct ring_buffer ring_buffer;

inline
bool rb_empty(ring_buffer const *b)
    {
    return b->head == b->tail;
    }

inline
char rb_front(ring_buffer const *b)
    {
    return b->array[b->head];
    }

inline
void rb_pop_front(ring_buffer *b)
    {
    if (++b->head >= rb_size)
        b->head = 0;
    }

#endif

My implementation of the ring_buffer struct used three data fields:

  • array is a fixed-size array of characters
  • head is the index of the front element in the buffer
  • tail is the index of the array element just beyond the back element, or equivalently, the index in the array at which the next element will be inserted
The C implementation also used a separate symbolic constant:
  • rb_size is array's dimension
The problem with using structs as abstract types is that C compilers don't help catch mistakes that undermine the abstraction. For example, ring_buffer.h in Listing 1 doesn't provide a function to initialize a ring_buffer. Before you insert elements into a ring_buffer, you should initialize the head and tail to the same value, typically zero.

You can get away without initializing the head and tail if you declare the ring_buffer as a statically allocated object because C programs initialize statically allocated objects to zero by default. That is, when the declaration:

ring_buffer rb;

appears at file scope (as a global), the compiler generates data definitions or executable code to ensure that rb.head and rb.tail will be initialized to zero.

On the other hand, if rb's declaration appears at block scope (in a function body), this initialization doesn't happen. You must write the initialization explicitly, as in:

rb.head = rb.tail = 0;

Unfortunately, this assignment undermines the abstraction. Again, an abstract type should provide a sufficiently complete set of basic operations so that you can use that type without knowing its implementation. Ideally, the compiler shouldn't allow the user of an abstract ring_buffer type to write statements, such as this assignment, which depend on knowing how the type is implemented.

The appropriate way to correct this oversight in C is to declare a function in ring_buffer.h that will initialize a ring_buffer appropriately, as follows:

inline
void rb_start(ring_buffer *b)
    {
    b->head = b->tail = 0;
}

Admittedly, this is easy to do, and once it's done, you can initialize a ring_buffer rb by simply calling:

rb_start(&rb);

This preserves the abstractness of the ring_buffer type.

I'm not saying that you can't implement an abstract type using a struct. You can. Rather, my point is that if you make a mistake that violates the abstraction—which is easy to do—your C compiler won't issue any errors to tell you that you did.

Incomplete types
C offers another way to implement ring_buffer as an abstract type. This technique makes it much harder to violate the abstraction by putting the structure's data members completely out of reach. The trick is to declare ring_buffer in the ring_buffer.h header as just:

struct ring_buffer;

This is an incomplete type declaration. It declares ring_buffer as a structure tag without defining the structure's members.

As long as ring_buffer is an incomplete type, the compiler doesn't know how much storage to allocate for each ring_buffer object. Thus, it won't let you declare any ring_buffer objects. However, you can declare objects of type "pointer to ring_buffer" because the size of a pointer is independent of what it points to.

An incomplete type declaration such as:

struct ring_buffer;

declares ring_buffer as a tag, not a type. Thus, a subsequent declaration for a pointer p written as:

ring_buffer *p;

is an error, because ring_buffer is not a type. It will compile only after a typedef that defines ring_buffer as a type, such as:

typedef struct ring_buffer ring_buffer;

You need not write this typedef after the incomplete type declaration because it actually incorporates the incomplete type declaration. If ring_buffer has not yet been declared, then this declaration both declares ring_buffer as a tag designating an incomplete type and as a type name designating the same incomplete type. You can complete an incomplete type by specifying its members in a definition such as:

struct ring_buffer
    {
    char array[rb_size];
    int head, tail;
    };

Thereafter you can declare ring_buffer objects, as well as pointers to ring_buffers.

Listing 2: The declarations for a ring buffer in C, using an incomplete type

// ring_buffer.h

#ifndef RING_BUFFER_INCLUDED
#define RING_BUFFER_INCLUDED

typedef struct ring_buffer ring_buffer;

bool rb_empty(ring_buffer const *b);
char rb_front(ring_buffer const *b);
void rb_pop_front(ring_buffer *b);
void rb_start(ring_buffer *b);

#endif

Incomplete types as abstractions
Listing 2 shows an alternate version of the ring_buffer.h header, which declares ring_buffer as an incomplete type. Using this header, you can't declare a ring_buffer object, but you can declare a pointer to a ring_buffer, such as:

ring_buffer *tx; // UART transmit buffer

Before you do anything else with tx, you must initialize it to point to an actual ring_buffer object. You can't point it to a named ring_buffer object, as in:

tx = &rb;

because, once again, you can't declare rb as a ring_buffer as long as the type is incomplete. Initializing tx by dynamic allocation is also a problem. A statement such as:

tx = malloc(sizeof(ring_buffer));

won't compile because you can't apply the sizeof operator to an incomplete type.

Once again, an initialization function solves this problem. The ring_buffer.h header in Listing 2 declares a function named rb_start for initializing a pointer to a ring_buffer. Calling rb_start(&tx) dynamically allocates a ring_buffer, initializes it, and returns its address in tx.

Listing 3: The complete type and function definitions for the incomplete ring buffer type declared in Listing 2

// ring_buffer.c

#include "ring_buffer'h"

#include <stdlib.h>

enum { rb_size = 32 };

struct ring_buffer
    {
    char array[rb_size];
    int head, tail;
    };

bool rb_empty(ring_buffer const *b)
    {
    return b->head == b->tail;
    }

char rb_front(ring_buffer const *b)
    {
    return b->array[b->head];
    }

void rb_pop_front(ring_buffer *b)
    {
    if (++b->head >= rb_size)
        b->head = 0;
    }

void rb_start(ring_buffer **b)
    {
    *b = (ring_buffer *)malloc(sizeof(ring_buffer));
    b->head = b->tail = 0;
    }

#endif

The definition for rb_start appears in a separate source file, ring_buffer.c, along with the other ring_buffer function definitions, as shown in Listing 3. The complete ring_buffer type definition appears in that source file, so that the ring_buffer functions can access members array, head, and tail and obtain the size of a ring_buffer.

Other modules use the ring_buffer type by simply including the header. Since the header doesn't define the complete type, the compiler will reject any user code that attempts to access a ring_buffer's data members directly. For example:

ring_buffer *tx;
rb_start(&tx);    // ok
...
tx->head = 0;    // error

This is good; it's exactly the sort of error that a compiler should issue to preserve the integrity of the abstraction.

Still room for improvement
Using incomplete types as abstract types is an improvement over using structs in one important respect: with incomplete types, a C compiler can do a much better job of detecting violations of the abstractions. Unfortunately, using incomplete types still lets some subtle violations go undetected.

For example, calling rb_start(&tx) allocates storage for a ring_buffer by calling malloc. User code could call free(&tx) to deallocate that storage, but in doing so it compromises the abstraction and could lead to future maintenance bugs. The fact that rb_start(&tx) calls malloc is an implementation detail subject to change, which should be hidden behind the abstraction. User code that calls free(&tx) yanks that detail out into the open, thus weakening the abstraction. Unfortunately, the compiler won't complain about this call.

Of course, the ring_buffer type should provide an additional function, rb_stop. Calling rb_stop(&tx) releases any resources acquired by calling rb_start(&tx), without revealing how it works.

A possible loss of performance
The other, possibly more serious, problem with using incomplete types is that it may hurt run-time performance by precluding the use of inline functions. To see why this is so, let's consider calling rb_empty(&tx), which returns true if tx has no elements in it.

The rb_empty function is very short:

bool rb_empty(ring_buffer const *b)
    {
    return b->head == b->tail;
    }

The function body is so short that, on many architectures, a program will execute more instructions just to call and return from the function than to do the real work inside the function body. You can ask the compiler to eliminate the call/return overhead by declaring the function with the keyword inline, as in Listing 1.

When it expands a function call inline, the compiler generates code for a copy of the function at the point of the call. For example, the compiler translates:

if (rb_empty(&tx))
    ...

into:

if ((&tx)->head == (&tx)->tail)
    ...

The compiler's optimizer can simplify this further to:

if (tx.head == tx.tail)
    ...

which is as efficient as it gets.

In short, using abstract types encourages you to wrap operations inside functions. Those additional function calls can degrade performance compared to what you'd get without using abstraction. Using inline functions eliminates much of the performance degradation.

The inline keyword is available in C++ and C99. When using older C dialects, you can get essentially the same effect by using a macro, such as:

#define rb_empty(b) ((b)->head ==(b)->tail)

Be aware that the inline keyword is merely a request, which compilers may ignore. Most compilers will inline small functions, but may balk at inlining large functions, especially those employing loops or recursion.

The compiler can expand a function call inline only if the function definition is visible at the call point. Therefore, the ring_buffer inline function definitions must be in ring_buffer.h so that they'll be visible at the call points. This works just fine when ring_buffer is a complete type in the header, as in Listing 1.

On the other hand, when ring_buffer is incomplete in the header, the header can't define any ring_buffer functions. If it did, the function definitions wouldn't compile because they'd refer to undefined structure members. The best you can do is declare the functions without inline in the header (as in Listing 2) and define them in a separate source file (as in Listing 3). In short, using incomplete types precludes using inline functions.

C++ classes let you hide implementation details, yet still use inline functions. Stay tuned.

Dan Saks is president of Saks & Associates, a C/C++ training and consulting company. You can write to him at dsaks@wittenberg.edu.

Reader Response

Dan, Just wanted to say thanks for the great series of articles at Embedded.com. I've been developing embedded firmware for roughly 20 years. Starting out as a hardware design engineer, my firmware beginnings were strictly in assembly language and I kept to that for about the first 10 years. I started to venture into C and now use it almost exclusively.

Between reading your articles and some others, notably Jack Ganssle's, I've been adding more elements such as abstraction, encapsulation, and others into my code. Along the way, I always thought my code was pretty good - hey, it worked, right?! - but I've seen in the last couple years just how much better, i.e. more robust and maintainable, it needs to be.

With your recent articles, I may even venture into using some elements of C++ in my embedded code. Thanks for your well-written, clear descriptions not only of how to use features, but also why to use them.

Regards,

Erik Loose
GTRAN Inc.


Dear Dan,

Your column of December 2003 "incomplete types as abstractions" was a real eye-opener for me. You gave an elegant solution for a problem for which I knew only a more clumsy solution (involving type casts and macros).

Thank you for your very informative column.

Kind regards,
Rob Uiterlinden

Loading comments...