Modular Programming in C

John R. Hayes

November 30, 2001

Separating interface from implementation has many practical benefits. Here's a simple way to do just that, in ANSI-standard C code.

How do you organize medium-sized or larger C programs? Few C textbooks give any insight; they concentrate on exposition of C's features using small examples. The examples usually fit in a single source code file. Without some guiding principle of organization, larger C programs can become difficult to understand and impossible to maintain. Modular programming is one way of managing the complexity.

Modular programming groups related sets of functions together into a module. The module is divided into an interface and an implementation. The module exports the interface; client modules import the interface so that they can access the functions in the module. The implementation of the module is private and hidden from the view of clients. The division of programs into modules is a powerful organizing principle for designing non-trivial programs. Modules provide abstraction, encapsulation, and information-hiding, making the large-scale structure of a program easier to understand. Careful design of modules also promotes software reuse, even in embedded systems programming.

Unfortunately, C does not explicitly support modular programming. The archetypal modular programming language is Niklaus Wirth's Modula-2 (and its descendant, Modula-3). A modular language, such as Modula, has syntax for separating the implementation from the interface and for importing modules. Conveniently, some of C's features, both in the language and the preprocessor, can be co-opted into providing Modula-like capabilities. These features, in conjunction with a set of conventions, make modular programming in C a practical and effective technique.

Modular programming

Programmers work with abstractions every day. An abstraction highlights the essential features of something while ignoring its details. An abstraction could encompass a piece of hardware, a software component, and so on. For example, a serial port can be abstracted to a pair of functions for reading and writing bytes to the port; the details of the UART, its registers and addresses, are suppressed. For our purposes, the interface to a module is an abstraction of the module's functions. The interface defines the features of the module and how to use them.

The programmer using a module sees the module's interface definition. On the other side of this interface is the module's implementation. The programmer should not use anything in the implementation that is not defined in the interface. Ideally, the module should ensure that the programmer cannot access the internals of the implementation. Such information-hiding is the only way that an implementation can protect the integrity of its data and guarantee the module's correctness. Hiding the implementation also allows an improved implementation to be substituted without affecting anything outside the module.

The two most important elements of a module are the division of the module into an interface and an implementation and the ability to hide information in the implementation. A syntax for creating a module (similar to that of Modula) would be:

DEFINITION MODULE foo
    EXPORT list of functions and data
    declarations of exported functions
        and data
END foo
IMPLEMENTATION MODULE foo
    IMPORT list of modules used
    ... code ...
END foo

The interface definition and the implementation are separate. The interface definition has a list of functions and data structures to export, as well as the declarations of those functions and data. These are the only parts of a module that clients can use. The implementation lists its dependencies on other modules. It then fleshes out the exported functions; these can use hidden helper functions and data.

How can we apply these techniques with C? By default, every function defined in C is globally accessible. The closest thing C has to a module is a source (.c) file. We can encapsulate functions and data in a source file to form the implementation part of a module. A corresponding header (.h) file forms the interface to the module. Let's see how this works in detail.

Implementations

Listing 1 is the implementation of an example module named foo. First the module lists its dependencies on other modules; it imports the interfaces to the modules that it uses. This is done with the C preprocessor #include directive. The module foo uses the modules x and y. The implementation of foo also includes its own interface definition. This allows the compiler to verify that the module's implementation matches its advertised interface. We will examine the contents of the interface files later.

Listing 1 foo.c (implementation)

/* foo.c */
/* Import needed interfaces: */
#include "x.h"
#include "y.h"
/* Implements this interface: */
#include "foo.h"

int var1;
static int var2;

void Fun1(int *p) { ... }
static void Fun2(void) { ... }

To hide internal details, the implementation of a module must be able to create private data and functions. By default, C makes every function defined in a file callable (visible) from functions defined in any other file. Similarly, all data items other than those local to a function are accessible from other files. However, we can use C's static keyword to limit the scope of functions and data to a single file.

The static keyword seems peculiarly named and gave me pause when I was first learning C. static is a storage class specifier. By default, local variables in a function are automatically allocated, typically on the stack. Adding static to a local variable's definition causes it to be statically allocated, so that it retains its value between calls of the function. This makes perfect sense. Where things get confusing is in the definition of non-local data. Here, static affects the scope of the data, not its allocation; the data is statically allocated whether static is present or not. Non-local data that is declared static is limited in scope to the file where it is defined (technically, from the point of declaration to the end of the file; in the C standard's terms, the name has internal linkage). static can also be applied to functions to limit their scope to a single file.

We will use C's static keyword to hide information within a module's implementation, that is, within its source file. Returning to Listing 1, var1 is accessible from outside the module, while var2 is entirely internal to foo's implementation. Similarly, function Fun1 can be called from outside the module, but Fun2 can only be called from other functions within the foo module. Using static to hide information requires some self-discipline. Forgetting to put static on data that should be internal to a module allows the data to "leak" out of the module.

Listing 2 foo.h (interface)

/* foo.h */
#define var1 Foo_var1
#define Fun1 FooFun1

extern int var1;
extern void Fun1(int *);

Interfaces

Listing 2 shows the interface definition of our example module. Ignore the #defines at the top; we'll return to those shortly. The bulk of the interface declares the external, publicly accessible parts of the module. A client module can access the integer variable var1. Or, it can call function Fun1 with an argument that is a pointer to an integer.

Consider the simple C program:

#include <stdio.h>
main() { printf("e = %f\n", exp(1)); }

When run on my Sun workstation this program produces the result e = 0.000000, instead of e = 2.718282.

The program compiles without errors or warnings. So, what's wrong? The program has a simple mistake: it is missing #include <math.h>. With no declaration of exp in sight, the C compiler assumed that the function returned an integer and passed the argument 1 as an integer, although exp actually expects and returns a double. The addition of function prototypes to C made it possible to avoid mistakes like this. Function prototypes let the compiler check the types of arguments passed to functions, even when the function is defined in another file. However, C does not require you to declare or use prototypes. I maliciously left out the #include <math.h> to illustrate a point: if it is possible to deliberately omit the prototypes, it is possible to do so accidentally.

Assume that a module, shown in Listing 3, wants to use the services provided by our foo module. Unfortunately, the programmer forgot to import foo's interface definition; that is, there is no #include "foo.h". Further suppose that the programmer forgets that Fun1 takes a pointer to an integer and passes in an integer instead. Presumably, this error will go undetected, as was the case with exp. However, with our module technique, the error is detected. Although the programmer's erroneous module will compile, the linker will fail to locate Fun1 and an error message will result.

Listing 3 client.c (implementation)

/* client.c */
/* Import needed interfaces: */
#include "z.h"
/* Implements this interface: */
#include "client.h"

static void ClientFun(void)
{
    int z;
    ...
    Fun1(z);
    ...
}

How is this possible? What is different between this and the exp example? Look at the #defines in foo's interface definition that we skipped earlier. The #define for Fun1 will cause any use of Fun1 to be replaced with FooFun1. Note that the #define is strategically placed before the prototype for Fun1. Consequently, the prototype is for a function named FooFun1. Since the module interface gets included in the module's implementation source file, the actual function definition will be FooFun1. When foo.c is compiled, the function will be named FooFun1 in the object file. If a client module fails to import (#include) the interface definition for foo, its use of Fun1 will compile to Fun1. There is no way for the linker to connect Fun1 to FooFun1, so linking will fail. Once the programmer corrects the client module to import foo's interface, its use of Fun1 will become FooFun1. Now, the compiler can detect the type mismatch.

Note that the substitution of FooFun1 for Fun1 is token-based rather than text-based. In other words, the substitution does not occur like a search-and-replace command in a text editor. Therefore, there is no danger that OtherFun1 will become OtherFooFun1. This is because the C preprocessor understands C, at least at the lexical token level. Consequently, only the identifier Fun1 itself will be replaced with FooFun1. This works whether the function takes no arguments, a fixed number of arguments, or a variable number of arguments. It also works when creating a pointer to a function.

I define new names for all of the functions available in a module's interface. The new name consists of the function's name prefixed with the name of the module. For our purposes here, it would be sufficient to prefix some fixed string instead, M_ for example, regardless of the module's name. However, adding the module's name to all global functions increases the likelihood that the name will remain unique if the module is placed in a library. It is less important to define module-specific names for public data, since renaming the function is usually sufficient to detect failure to include the interface. However, adding the module's name to all public data, as for function names, reduces the chance of a name clash if the module is added to a library.

So far, our example interfaces have just had functions and data. What about data types? If a public function uses or returns aggregated data, such as a structure, we will usually need to declare that data structure in the interface as well. Declaring a data structure in the interface exposes its internal components to clients. This is frequently reasonable and necessary; for example, when a client wants to pass a large amount of data to a function, it can be clearer to put the data into a structure and pass a pointer to the structure, rather than pass all the data as individual arguments. In this case, the client has to know the names of the structure's components so that it can fill in their values.

But what if we want to hide the representation of a data structure from clients? Modula calls this an opaque export: the interface presents clients with the name of the data structure and functions to create and operate on instances of the data structure, but does not reveal anything about the data structure's components or layout. It turns out that we can do this in C too. Listing 4 shows the interface to a module that provides a simple priority queue. A priority queue client can create a queue, enqueue data, and dequeue data in priority order. The interface declares a Priority_queue data type that consists of a pointer to an actual priority queue structure. The three function prototypes use the Priority_queue type. However, nowhere in the interface is the actual structure of the priority queue declared. This is called an incomplete type in C. Somewhere, the type must be completed; this occurs in the implementation part of the priority queue. This achieves our goal of hiding the representation of a data structure from its clients.

Listing 4 priqueue.h (interface)

/* priqueue.h */
#define Enqueue     PriEnqueue
#define Dequeue     PriDequeue
#define CreateQueue PriCreateQueue

typedef struct priority_queue_struct * Priority_queue;

extern void Enqueue(Priority_queue, int priority, void *data);
extern void *Dequeue(Priority_queue);
extern Priority_queue CreateQueue(void);

The example interface definitions in this article have few comments. Real interface definitions should contain all the documentation that a programmer needs to use the module. However, this documentation should not reveal any clues about how the module is actually implemented. Keeping the documentation with the prototype declarations gives the programmer one-stop shopping to find out how to use the module. One can easily envision a tool that, given the name of a module, finds the corresponding interface definitions, extracts just the declarations and appropriate comments, and presents them to the programmer in a pretty format. This would be the analogue of the javadoc tool that is used by Java programmers.

Discussion

To recap, modular programming consists of separating implementation from interface and hiding information in the implementation. In C this is achieved by placing the interface definition in a header file and the implementation in a source file. Disciplined use of static hides implementation details. The interface definition forms the link between a module and its clients. The module includes its own interface definition to confirm that it implements the advertised interface; a client module includes the interface definition to verify that it is using the interface correctly.

The use of header files to export information from a C source file may seem strange at first. When I first learned C, I put constant definitions and data structure declarations in the header for inclusion by its corresponding C source file. Now, if a constant or data structure is only used by one source file, I place it in that source file. There is still a place for the other style of header file. If a constant or data type is used throughout an application, it belongs in a traditional header file.

I have used this module style for a while now. However, using the C preprocessor to rename functions to force clients to import the interface definitions is a recent addition. I actually retrofitted this into an existing application and found that I had indeed forgotten to import interface definitions in a few places. The only negative aspect of this technique that I have found so far is that debuggers use the modified function names. This can lead to some confusion, for example, when a stack trace-back shows FooFun1 instead of the expected Fun1.

Modular programming compels us to think more deeply about how we divide a program into its constituent pieces. Designing clear and simple interfaces helps clarify the overall structure of a program. Modular programming also encourages us to think about code reuse, both by formalizing a module's user interface and by explicitly listing its dependencies on other modules. Reducing the number of dependencies a module has increases its reuse potential. This reduction can often be achieved by a simple refactoring of the functionality of an application's modules. Eventually, an application becomes a collection of reusable pieces with a single custom main module that ties them all together.

I have included some references to additional reading on modular programming. Parnas' paper is the classic reference on the subject. Hanson's book gives numerous examples of interfaces and implementations in a rare blend of elegance and practicality. Finally, I recommend Wirth's book on Modula-2. Although I have never had the opportunity to program in Modula-2, I learned some valuable programming techniques from this book. I always find it worthwhile to explore alternative programming methods embodied in other languages.

John Hayes works at Johns Hopkins University's Applied Physics Laboratory. He has been developing embedded systems for spacecraft for close to two decades. He earned his BSEE at Virginia Polytechnic Institute and State University, and his MSCS at Johns Hopkins University. His e-mail address is john.hayes@jhuapl.edu.

References:

1. Parnas, D.L. "On the Criteria to be Used in Decomposing Systems into Modules," Communications of the ACM, 15(12), December 1972, pp. 1053-58.

2. Hanson, D.R. C Interfaces and Implementations: Techniques for Creating Reusable Software. Reading, MA: Addison-Wesley, 1997.

3. Wirth, N. Programming in Modula-2. Berlin: Springer-Verlag, 1982.
