Discriminated unions - Embedded.com

Discriminated unions

Discriminated unions can be useful in some applications, but they also provide insight into the advantages of using virtual functions.

Click here for more content from ESD March 2012.

Off and on for almost two years, I've been writing about techniques for representing and manipulating memory-mapped devices in C and C++. My more recent columns have been more about C++ than C, focusing on language features such as constructors and new-expressions, which C++ doesn't share with C.1, 2

Some readers have suggested that the C code I presented is preferable to the C++ code because the C structure implementations for devices are generally simpler than their corresponding C++ class implementations. Last month, I argued that the C++ implementations are actually better because they're easier to use correctly and harder to use incorrectly.

Other readers have claimed that my C++ implementations are flawed because they're too simple–they don't use inheritance and preclude the use of virtual functions.

Classes with virtual functions can be very useful, but they aren't the solution to every problem. Classes that represent memory-mapped devices, such as the ones I've presented, work with real hardware specifically because they don't use virtual functions. The new C++ Standard acknowledges the usefulness of such classes by defining categories such as standard layout classes , which avoid features such as virtual functions.

In the coming months, I'll explain what virtual functions are. I'll show why they can be useful in some applications, but undesirable in classes that represent memory-mapped device registers. I'll also show you how C++ typically implements virtual functions by showing how you can emulate them in C.

I'll begin this month by looking at the sort of problem that virtual functions are good at solving. I'll show a typical C solution using a construct called a discriminated union, and examine its limitations.

An illustrative problem
Suppose you have an application that employs two-dimensional geometric shapes, such as circles, rectangles, and triangles. At a minimum, each shape object contains some linear or angular distances sufficient to characterize the physical extent of the shape. For example, a circle has a radius, a rectangle has a height and a width, and a triangle has two sides and an angle. The shapes may have common attributes as well, such a position (planar coordinates), or outline and fill colors.

A fairly traditional C implementation for a shape is a structure with a nested union, such as:

typedef struct shape shape;struct shape {    coordinates position;    color outline, fill;    shape_kind kind;    union {        circle_part circle;        rectangle_part rectangle;        triangle_part triangle;    } u;};

The union member, u , is large enough to hold the largest of its members, but it can only store the value of one member at a time.

In this example, each union member has a different structure type. For a circle, you need to store only the radius, as in:

typedef struct circle_part circle_part;struct circle_part {    double radius;};

For a rectangle, you need the height and width:

typedef struct rectangle_part rectangle_part;struct rectangle_part {    double height, width;};

For a triangle, you need two sides and an adjacent angle:

typedef struct triangle_part triangle_part;struct triangle_part {    double side1, side2, angle;};

When the union members have such simple types, you might find it easier to dispense with the named structure types and simply define unnamed structures inside the union, as in:

    union {        struct {            double radius;        } circle;        struct {            double height, width;        } rectangle;        struct {            double side1, side2, angle;        } triangle;    } u;

The value of the shape's kind member indicates which union member is currently in use. The shape_kind type enumerates the possible values:

enum shape_kind {    sk_circle, sk_rectangle, sk_triangle};typedef enum shape_kind shape_kind;

For example, setting a shape's kind member to sk_rectangle indicates that it's now OK to access (write to or read from) the union's rectangle member and its height and width members.
A union paired with a discrete value that indicates the active member of the union is called a discriminated union or a tagged union . Some programming languages have a similar construct called a variant record . The discrete value is called a discriminator or tag . I prefer the terms discriminated union and discriminator because tag already has another meaning in C. The discriminator typically has an enumeration type, but could have an integral type.

You can write a small assortment of initialization functions to properly initialize each kind of shape. For example:

void rectangle_construct(shape *s, double h, double w) {    s->kind = sk_rectangle;    s->u.rectangle.height = h;    s->u.rectangle.width = w;}

initializes a shape as a rectangle with a particular height and width. You can write similar initialization functions for circle and triangles. Using these functions, you can create an array of assorted shapes as follows:

shape sa[4];~~~circle_construct(&sa[0], 2);triangle_construct(&sa[1], 5, 6, asin(0.8));rectangle_construct(&sa[2], 3, 4);circle_construct(&sa[3], 3);

Suppose your application needs to determine the shape with the largest area in a collection of shapes. You can do the job with the function named largest shown in Listing 1 .

Listing 1: A function that finds the largest shape in an array of shapes, using an overt switch statement.

shape const *largest(shape const s[], size_t n) {    shape const *p = NULL;    double max = -1;    size_t i;    for (i = 0; i < n; ++i) {        double area = -1;        switch (s[i].kind) {        case sk_circle:            area = PI * s[i].u.circle.radius                * s[i].u.circle.radius;            break;        case sk_rectangle:            area = s[i].u.rectangle.height                * s[i].u.rectangle.width;            break;        case sk_triangle:            area = sin(s[i].u.triangle.angle)                * s[i].u.triangle.side1                * s[i].u.triangle.side2 / 2;            break;        }        if (area > max) {            max = area;            p = &s[i];        }    }    return p;}

Most of the loop body in Listing 1 is a switch statement that computes the area of a shape. This is arguably a fundamental shape operation that would be better packaged as an operation all by itself:

double shape_area(shape const *s) {    switch (s->kind) {    case sk_circle:        return PI * s->u.circle.radius            * s->u.circle.radius;    case sk_rectangle:        return s->u.rectangle.height            * s->u.rectangle.width;    case sk_triangle:        return sin(s->u.triangle.angle)            * s->u.triangle.side1            * s->u.triangle.side2 / 2;    }    return -1;}

Using the shape_area function dramatically simplifies the largest function, as shown in Listing 2 .

Listing 2: A function that finds the largest shape in an array of shapes, using the shape_area function.

shape const *largest(shape const s[], size_t n) {    shape const *p = NULL;    double max = -1;    size_t i;    for (i = 0; i < n; ++i) {        double area = shape_area(&s[i]);        if (area > max) {            max = area;            p = &s[i];        }    }    return p;}

Correctness and maintenance problems
One of the problems with discriminated unions is that it requires a lot of discipline to ensure that you don't access a union member unless it's the currently active member as indicated by the discriminator. In most cases, this means you should be scrupulous about wrapping the accesses to union members inside if- or switch-statements that test the discriminator.

Unfortunately, nothing in C prevents accidents such as:

case sk_rectangle:    area = s->u.triangle.side1        * s->u.triangle.side2;    ~~~

This code will compile and then produce unpredictable, possibly erroneous, run-time results.

Some languages, such as Ada, have built-in support for discriminated unions intended to prevent such mishaps. Andrei Alexandescru showed how you can implement safe discriminated unions in C++ using template metaprogramming techniques.3, 4

Anonymous unions
C++ and more recent dialects of C support anonymous unions. That is, you can leave the union itself unnamed and access the union members as if they were members of the enclosing structure. If you define the shape structure as:

struct shape {    ~~~    union {        circle_part circle;        rectangle_part rectangle;        triangle_part triangle;    };  // union name omitted};

then you can simplify expressions such as:

s->u.rectangle.height = h;

to just:

s->rectangle.height = h;

My examples don’t use anonymous unions because I tested this code with older C compilers that don’t support them.

Another problem with discriminated unions is that they can easily lead to code that's hard to maintain. For example, in addition to the shape_area function, your application might include other shape operations such as shape_perimeter , shape_resize , or shape_put . Each function has a similar, if not identical, switch-statement structure as shape_area . If you add a new shape, or modify the attributes of an existing shape, you probably have to change every one of those functions.

Virtual functions provide an alternative to discriminated unions that are safe, efficient, and often more maintainable. I'll discuss virtual functions in my upcoming columns.

Dan Saks is president of Saks & Associates, a C/C++ training and consulting company. For more information about Dan Saks, visit his website at www.dansaks.com. Dan also welcomes your feedback: e-mail him at . .

Endnotes:

  1. Saks, Dan. “Calling constructors with placement new,” Embedded Systems Design , September 2011, p. 9. www.eetimes.com/4219506.
  2. Saks, Dan. “Using member new to map devices,” Embedded.com, November 2011. www.eetimes.com/4230743.
  3. Alexandrescu, Andrei. “Discriminated Unions (I),” Dr. Dobbs Journal , April 1, 2002. http://drdobbs.com/184403821.
  4. Alexandrescu, Andrei.”Discriminated Unions (II),” Dr. Dobbs Journal , June 1, 2002. http://drdobbs.com/184403828.

This content is provided courtesy of Embedded.com and Embedded Systems Design magazine.
See more content from Embedded Systems Design and Embedded Systems Programming magazines in the magazine archive.
This material was first printed in March 2012 Embedded Systems Design magazine.
Sign up for subscriptions and newsletters.
Copyright © 2012
UBM–All rights reserved.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.