Sizing and aligning device registers

Modeling memory-mapped device registers is easy in concept but can be tricky in practice.

In my last three columns, I discussed variations on a basic technique for accessing memory-mapped device registers using C and C++.1,2,3 My focus in those columns was on different ways to define pointers or references to the memory-mapped registers. However, several readers raised concerns about the data types that I used to declare the registers. This month, I'll address those concerns.

A brief recap
In my previous columns, I used an example from the ARM Evaluator-7T single-board computer. The board's documentation refers to the device registers as special registers, so I did, too. The Evaluator-7T's memory is byte-addressable, but each special register occupies a four-byte word. Special registers are also volatile, so I defined the type for special registers as:

typedef unsigned int volatile special_register;

The Evaluator-7T uses five special registers to control the two integrated timers, which I represented as a struct defined as:

typedef struct dual_timers dual_timers;
struct dual_timers
    {
    special_register TMOD;
    special_register TDATA0;
    special_register TDATA1;
    special_register TCNT0;
    special_register TCNT1;
    };

The timer registers on the Evaluator-7T reside at address 0x03FF6000. A program can access the timer registers via a pointer defined as a macro, as in:

#define timers ((dual_timers *)0x03FF6000)

or as a constant object, as in:

dual_timers *const timers = (dual_timers *)0x03FF6000;

The TMOD register contains bits that you can set to enable a timer and clear to disable a timer. You can define the masks for those bits as enumeration constants:

enum { TE0 = 0x01, TE1 = 0x08 };

Then, for example, you can disable both timers using:

timers->TMOD &= ~(TE0 | TE1);

In C++, you can use a reference instead of a pointer, as in:

dual_timers &timers = *(dual_timers *)(0x03FF6000);

Since a reference is automatically dereferenced when you use it in an expression, you don't use the -> operator with a reference as you do with a pointer. Rather, you use the . (dot) operator, as in:

timers.TMOD &= ~(TE0 | TE1);

I like the way that references make memory-mapped registers look like objects.

Gustav Hållberg pointed out that you can make timers act like a reference in C as well. The trick is to use a macro to define timers as a dereferenced pointer, as in:

#define timers (*(dual_timers *)0x03FF6000)

Then expressions such as:

timers.TMOD &= ~(TE0 | TE1);

work just as well in C.
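
One way to try these idioms without the board is to aim the same macro at an ordinary object. The following is only a host-side sketch, not code from the earlier columns: fake_timers, disable_both_timers, and enable_timer0 are hypothetical names, and on the real target the macro would expand to the address 0x03FF6000 as shown above.

```c
#include <assert.h>   /* static_assert (C11) */

typedef unsigned int volatile special_register;

typedef struct dual_timers dual_timers;
struct dual_timers
    {
    special_register TMOD;
    special_register TDATA0;
    special_register TDATA1;
    special_register TCNT0;
    special_register TCNT1;
    };

enum { TE0 = 0x01, TE1 = 0x08 };

/* The recap assumes the five registers are contiguous. */
static_assert(sizeof(dual_timers) == 5 * sizeof(special_register),
    "unexpected padding between the registers");

/* Assumption: an ordinary object stands in for the hardware so that
   the code can run on a desktop machine. */
static dual_timers fake_timers;

/* On the Evaluator-7T this would be (*(dual_timers *)0x03FF6000). */
#define timers (*(dual_timers *)&fake_timers)

/* Clear both timer-enable bits, leaving the other bits alone. */
void disable_both_timers(void)
    {
    timers.TMOD &= ~(TE0 | TE1);
    }

/* Set the enable bit for timer 0. */
void enable_timer0(void)
    {
    timers.TMOD |= TE0;
    }
```

Because only the macro knows where the registers live, the functions that use timers.TMOD need no change when the stand-in object is swapped for the real address.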

Size matters?
Philip Martel expressed mild concern about my definition for the special_register type. He wrote that:

[The way you wrote the special_register typedef] implies that unsigned int is four bytes. I'm not familiar with the ARM architecture, but an unsigned int is two bytes on many other microprocessors. I feel you should have mentioned this. [The following definition:]

typedef unsigned long volatile special_register;

might have been better.

Although unsigned int might not be the best type to use here, unsigned long is no better. The Standards (for both C and C++) don't guarantee that either type occupies exactly four bytes.

Standard C and C++ support four signed integer types: signed char, short int, int, and long int. C99 (the revised C Standard published in 1999) added a fifth type: long long int. Standard C++ doesn't support long long int yet, but some C++ compilers already do. For each signed integer type, there's a corresponding unsigned integer type.

The Standard (for either language) specifies that the size of a char (signed or unsigned) is always one. It doesn't specify the size for any integer type, but it imposes the following restrictions:

  • each unsigned type has the same storage size as its corresponding signed type
  • each type in the sequence signed char, short int, int, long int, long long int must occupy at least as much storage as the type preceding it in the list

For a given platform, all integer types may have the same size.

The Standard also guarantees that:

  • a char occupies at least 8 bits
  • a short int occupies at least 16 bits
  • a long int occupies at least 32 bits
  • a long long int occupies at least 64 bits

Any integer type may occupy more bits than the required minimum. A compiler for a 16-bit platform typically uses 16 bits for int and unsigned int and 32 bits for long int and unsigned long int. A compiler for a 32-bit platform typically uses 32 bits for all four types.
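
If you'd like the compiler to confirm these guarantees on a particular platform, something along the following lines may help. It's a sketch that assumes a C11 compiler (for static_assert); STORAGE_BITS is a hypothetical helper macro, not something the Standard provides.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */

/* Storage occupied by a type, in bits (hypothetical helper). */
#define STORAGE_BITS(type) (sizeof(type) * CHAR_BIT)

/* The guaranteed minimums. */
static_assert(CHAR_BIT >= 8, "char is narrower than 8 bits");
static_assert(STORAGE_BITS(short) >= 16, "short is narrower than 16 bits");
static_assert(STORAGE_BITS(long) >= 32, "long is narrower than 32 bits");
static_assert(STORAGE_BITS(long long) >= 64, "long long is narrower than 64 bits");

/* Each type is at least as large as the one before it. */
static_assert(sizeof(signed char) <= sizeof(short), "short smaller than char");
static_assert(sizeof(short) <= sizeof(int), "int smaller than short");
static_assert(sizeof(int) <= sizeof(long), "long smaller than int");
static_assert(sizeof(long) <= sizeof(long long), "long long smaller than long");

/* Each unsigned type matches its signed counterpart in size. */
static_assert(sizeof(unsigned int) == sizeof(int), "int size mismatch");
static_assert(sizeof(unsigned long) == sizeof(long), "long size mismatch");
```

None of these assertions says that any type is exactly four bytes, which is the heart of the problem with both unsigned int and unsigned long.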

My feeling is that using a symbolic type such as special_register makes the exact type that you use unimportant, as long as it has the right size and signedness. (Signedness is standardese for whether the type is signed or unsigned.)

Ashwin N suggested yet another way to define the special_register type:

If you want to use an unsigned four-byte word, shouldn't you be doing:

#include <stdint.h>
/* … */
typedef uint32_t volatile special_register;

This should work with all modern standard C compilers/libraries.

The typedef uint32_t is an alias for some unsigned integer type that occupies exactly 32 bits. It's one of many possible exact-width unsigned integer types with names of the form uintN_t, where N is a decimal integer representing the number of bits the type occupies. Other common exact-width unsigned types are uint8_t and uint16_t. For each type uintN_t, there's a corresponding type intN_t for a signed integer that occupies exactly N bits and has two's complement representation.

I have been reluctant to use <stdint.h>. It's available in C99, but not in earlier C dialects nor in Standard C++. However, it's becoming increasingly available in C++ compilers, and is likely to make it into the C++ Standard someday. Moreover, as Michael Barr observed, if the header isn't available with your compiler, you can implement it yourself without much fuss.4 I plan to start using these types more in my work.
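
To give a feel for what implementing it yourself might involve, here is a rough sketch of a homemade substitute for one of the exact-width types. It isn't taken from Barr's article. The name my_uint32_t is hypothetical, chosen so that it can't clash with a real <stdint.h>, and the selection logic assumes that <limits.h> describes the platform accurately.

```c
#include <assert.h>
#include <limits.h>

/* Pick an unsigned type whose maximum value is exactly 2^32 - 1. */
#if UINT_MAX == 0xFFFFFFFFuL
typedef unsigned int my_uint32_t;
#elif ULONG_MAX == 0xFFFFFFFFuL
typedef unsigned long my_uint32_t;
#else
#error "no 32-bit unsigned integer type found"
#endif

/* Compile-time check that works even before C99: the array size is -1,
   and the compiler rejects the typedef, unless the chosen type occupies
   exactly 32 bits of storage. */
typedef char my_uint32_t_is_32_bits
    [(sizeof(my_uint32_t) * CHAR_BIT == 32) ? 1 : -1];

/* Run-time sanity check: an exact 32-bit unsigned value wraps to zero. */
void check_my_uint32_t(void)
    {
    my_uint32_t x = 0xFFFFFFFFuL;
    ++x;
    assert(x == 0);
    }
```

A complete replacement header would repeat the same pattern for the 8- and 16-bit types and for the signed variants.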

Again, using a typedef such as special_register makes the exact choice of the integer type much less important. However, I'm starting to think that uint32_t is the best type to use in defining the special_register type.

Alignment and padding
Several readers, including my friend and colleague Bill Gatliff, expressed concern about alignment and padding of members within structures used to map memory. He wrote:

ARM registers are generally pretty forgiving, because they're almost always 32 bits. But other machines will have 8-, 16-, and 32-bit registers—sometimes within the same peripheral!

I love the syntax for structure pointers and references, but I have observed that in practice all the alignment and packing rules for structures really make for easily broken code.

How can I be sure that a structure declaration will result in a consistent mapping across compilers, compiler versions, and/or command line switches?

I chose an example from the ARM processor purposely because all the device registers are four-byte words aligned to a four-byte boundary with no padding whatsoever. I didn't want to get sidetracked by alignment and padding problems, which I managed to do until now.

First, just to be clear, I use the term alignment as it's defined in the C Standard: it's a “requirement that objects of a particular type be located on storage boundaries with addresses that are particular multiples of a byte address.” The Standard leaves it up to each platform to specify its alignment requirements. For example, a machine might insist that two-byte integers be aligned to a two-byte (even) boundary and that four-byte integers be aligned to a four-byte boundary. While some machines insist that eight-byte floating-point numbers be aligned to an eight-byte boundary, others might require that they be aligned only to a four-byte boundary. Character objects always have a size of one (by definition) and can reside at any boundary. Thus, they have no alignment requirement.

To meet its platform's alignment requirements, a compiler may insert unused bytes as padding inside a structure to ensure that the structure's members will be properly aligned. For example, suppose you define the device registers for a timer as:

struct timer
    {
    uint16_t MODE;
    uint32_t DATA;
    uint32_t COUNT;
    };

Here, MODE occupies two bytes. If you compiled this for a processor that aligns four-byte integers to a four-byte boundary, the compiler would insert two padding bytes between the MODE and DATA members, as if you had written:

struct timer
    {
    uint16_t MODE;
    uint8_t padding[2]; // or uint16_t padding;
    uint32_t DATA;
    uint32_t COUNT;
    };

The padding forces the DATA member to an offset within the structure that's a multiple of four. Unfortunately, as Bill observes, different platforms align data differently, and therefore this structure may be padded differently when compiled on a different platform.
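
You can ask the compiler where it actually put each member by using the standard offsetof macro. The sketch below assumes C11 and <stdint.h>; padding_after_mode is a hypothetical helper name. The value it returns depends on the platform, which is exactly Bill's point.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

struct timer
    {
    uint16_t MODE;
    uint32_t DATA;
    uint32_t COUNT;
    };

/* Whatever padding the compiler inserts, DATA must land on a boundary
   suitable for a uint32_t. */
static_assert(offsetof(struct timer, DATA) % _Alignof(uint32_t) == 0,
    "DATA is misaligned");

/* Number of padding bytes between the end of MODE and the start of DATA. */
size_t padding_after_mode(void)
    {
    return offsetof(struct timer, DATA)
        - (offsetof(struct timer, MODE) + sizeof(uint16_t));
    }
```

On a platform that aligns four-byte integers to a four-byte boundary, padding_after_mode should return 2. On one with looser alignment rules it could return 0.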

According to the C Standard, a program that attempts to access an improperly aligned object produces undefined behavior. This means that the program is in error, but the exact consequences of that error are platform dependent. With many processors, an instruction that attempts to access improperly aligned data issues a trap. Other processors may execute the instruction anyway, but use up more cycles to fetch the data than they would if the data were properly aligned.

But, you might ask, if the compiler automatically aligns objects for you, how could data possibly be misaligned? Well, it could happen if you use a cast to convert a char * into an int *, as in:

char buffer[100];
int *p = (int *)&buffer[1];
if (*p == 0)
    ...

Here *p refers to an object of type int residing at an odd byte boundary. Casts can lead to such misalignments. It's one of many reasons to avoid using casts.
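
For ordinary data in a byte buffer (not for device registers), one common way to sidestep the cast is to copy the bytes into a properly aligned object with memcpy. The following is a minimal sketch of that approach. The function names read_int_at and write_int_at are hypothetical.

```c
#include <assert.h>
#include <string.h>   /* memcpy */

/* Read an int from storage that might not be aligned for an int. */
int read_int_at(char const *p)
    {
    int value;
    assert(p != NULL);
    memcpy(&value, p, sizeof value);   /* a byte-wise copy has no alignment requirement */
    return value;
    }

/* Store an int into storage that might not be aligned for an int. */
void write_int_at(char *p, int value)
    {
    assert(p != NULL);
    memcpy(p, &value, sizeof value);
    }
```

With these, the test above becomes if (read_int_at(&buffer[1]) == 0), and no misaligned int object is ever accessed. This doesn't suit volatile device registers, where the width of each access usually matters.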

Some programmers avoid padding and alignment problems by abandoning structures and using pointer arithmetic instead. For example, you can declare a timer as just:

uint8_t volatile *timer0 = (uint8_t *)0x03FF6000;

Then *timer0 or timer0[0] refers to the MODE register and *(timer0 + 4) or timer0[4] refers to the DATA register. Ick.5

If you have a conscience, you'll at least define symbolic offsets for the registers, as in:

enum { TMODE = 0, TDATA = 4, TCOUNT = 8 };

so that you can write timer0[TMODE] and timer0[TDATA]. This looks better until you realize that each expression refers to only the first byte of a multi-byte object. You must use a cast to access the MODE register as a 16-bit value, as in:

uint16_t mode;
...
mode = *(uint16_t *)(timer0 + TMODE);

Moreover, you must use a different cast to access the DATA register as a 32-bit value. Yuck!6

Using pointers and offsets to represent device registers is less readable than using structures. Any programming style that relies heavily on casting is bound to be error prone. This cure is worse than the original disease.

If you are concerned that the compiler might not lay out the structure exactly as you'd like, you can just insert the padding yourself. To ensure that the compiler doesn't insert extra padding between the members, many compilers offer compiler switches or pragma directives that let you control the padding. For example, using GNU C/C++, you can write:

#pragma pack(1)
struct timer
    {
    ...
    };
#pragma pack()

The first pragma sets the maximum alignment for structure members to one. It effectively turns off the alignment requirements. The second pragma restores the alignment requirement to what it was at the start of the compilation.

The GNU compilers offer two other forms for the pack pragma. Using #pragma pack(push, n) saves the current alignment setting on a stack, and then sets the maximum alignment for structure members to n. Using #pragma pack(pop) restores the most recently saved alignment setting.
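
Here is a small sketch of the push and pop forms in use, with compile-time checks on the result. It assumes a compiler that accepts these pragmas (GNU C does) and that supports C11's static_assert. The structure names packed_timer and natural_timer are hypothetical.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

#pragma pack(push, 1)       /* save the current setting, then pack to 1 */
struct packed_timer
    {
    uint16_t MODE;
    uint32_t DATA;
    uint32_t COUNT;
    };
#pragma pack(pop)           /* restore the saved setting */

/* Declared after the pop, so it gets the platform's usual padding. */
struct natural_timer
    {
    uint16_t MODE;
    uint32_t DATA;
    uint32_t COUNT;
    };

/* With packing in effect, DATA immediately follows MODE. */
static_assert(offsetof(struct packed_timer, DATA) == 2,
    "packed DATA offset");
static_assert(sizeof(struct packed_timer) == 10,
    "packed structure size");
```

The push and pop forms have the advantage that a header can change the packing for its own declarations without disturbing whatever setting the including file was using.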

Unfortunately, neither the C nor C++ Standard sanctions these pragmas. Most compilers provide some form of these pragmas, but the details vary from compiler to compiler. In fact, any code that uses any pragma is nonstandard and likely to be less portable than you'd like it to be.

Although you can never be absolutely sure how your compiler will pad the members within a structure, the Standard guarantees there will be no padding before the first member. The Standard also mandates that each member in a structure must be allocated in the order in which it's declared.
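
Those two guarantees, combined with explicit padding and a few compile-time checks, go a long way toward pinning down a layout. The sketch below assumes C11's static_assert. The member name reserved is hypothetical, and the offsets 0, 4, and 8 are the ones assumed for the hypothetical timer earlier in this column.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

struct timer
    {
    uint16_t volatile MODE;
    uint16_t reserved;          /* explicit padding; never accessed */
    uint32_t volatile DATA;
    uint32_t volatile COUNT;
    };

/* Guaranteed by the Standard: no padding before the first member. */
static_assert(offsetof(struct timer, MODE) == 0, "MODE offset");

/* Not guaranteed, so check them: a compiler, compiler version, or
   command line switch that lays the structure out differently will
   fail to compile this code rather than silently misaddress a register. */
static_assert(offsetof(struct timer, DATA) == 4, "DATA offset");
static_assert(offsetof(struct timer, COUNT) == 8, "COUNT offset");
static_assert(sizeof(struct timer) == 12, "timer size");
```

The checks cost nothing at run time, and they turn a silent layout change into a compile error.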

Dan Saks is president of Saks & Associates, a C/C++ training and consulting company.

End notes

  1. Saks, Dan. “Mapping Memory,” Embedded Systems Programming, September 2004, p. 49.
  2. Saks, Dan. “Mapping Memory Efficiently,” Embedded Systems Programming, November 2004, p. 47.
  3. Saks, Dan. “More Ways to Map Memory,” Embedded Systems Programming, January 2005, p. 7.
  4. Barr, Michael. “Introduction to Fixed-Width Integers,” January 2004.
  5. A mild form of disgust.
  6. A stronger form of disgust.
