
Accessing memory-mapped classes directly

September 17, 2010

Dan Saks


Device drivers typically communicate with hardware devices through device registers. Many processors use memory-mapped I/O, which maps device registers to fixed addresses in the conventional memory space. A typical device employs a small collection of registers with closely-spaced memory addresses.

In my May column, I presented some common alternatives for representing and manipulating memory-mapped devices in C. I recommended using a structure to represent each device's collection of registers as a distinct type [1]. In my June column, I explained why C++ classes are even better than C structures for representing memory-mapped devices [2].

Readers posted numerous comments on Embedded.com about both columns. A few of those comments alleged that using a pointer to access a class object representing a memory-mapped device incurs a performance penalty by somehow adding unnecessary indirect addressing.

Interestingly, no one complained that using C structures to represent memory-mapped devices incurs a similar performance penalty. This left me wondering: is the allegation that using a C++ class is more expensive than using a C structure? Or is it that using pointers to access memory-mapped class objects is more costly than using some other means? My impression is that the readers were more concerned about the latter--the alleged cost of using pointers. However, evaluating the cost of using pointers involves comparing classes with structures, so I might as well consider both questions.

If using pointers or references is slow, then what might be faster? Last month, I described the available alternatives for placing objects into memory-mapped locations [3]. This month, I'll consider alternative implementations that eliminate the need to use pointers to access memory-mapped devices.

Classes vs. structures
As in my earlier columns on memory-mapped devices, I'll use as my example device a programmable timer that employs three device registers--TMOD, TDATA, and TCNT--in adjacent locations starting at 0xFFFF6000.

Each device register is a four-byte word aligned to an address that's a multiple of four, so you can manipulate each device register as a uint32_t. Device registers are volatile objects, so I recommend declaring each register with a type defined as:

typedef uint32_t volatile device_register;

The timer_registers C++ class encapsulates the entire collection of timer registers as a single abstract type. The class definition appears in Listing 1.
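
For readers without Listing 1 at hand, a class along the following lines would fit the description in this column. It is only a sketch: the member names come from the text, but the values of TICKS_PER_SEC and TE and the bodies of disable, enable, and get are assumptions made for illustration.

```cpp
#include <cstdint>

typedef std::uint32_t volatile device_register;

class timer_registers
    {
public:
    enum { TICKS_PER_SEC = 50000000 };  // assumed value, for illustration only
    typedef std::uint32_t count_type;

    // Assumed behavior: TE is taken to be the timer-enable bit in TMOD.
    void disable() { TMOD = TMOD & ~std::uint32_t(TE); }
    void enable() { TMOD = TMOD | std::uint32_t(TE); }

    // Load the reload value, then restart the count (as in the column's set).
    void set(count_type c) { TDATA = c; TCNT = 0; }

    // Assumed behavior: report the current count.
    count_type get() const { return TCNT; }
private:
    enum { TE = 0x01 };                 // assumed bit mask
    device_register TMOD;
    device_register TDATA;
    device_register TCNT;
    };
```

Only the three data members occupy storage, so an object of this class should be three words long on a typical target.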


As I showed in my June column, you can define a pointer whose value is the memory-mapped address of the actual device registers, initialized using a reinterpret_cast, as:

timer_registers *const the_timer
    = reinterpret_cast<timer_registers *>(0xFFFF6000);

You can then use the pointer to designate the timer object in member function calls such as:

the_timer->disable();
the_timer->set(timer_registers::TICKS_PER_SEC);
the_timer->enable();

In the class definition, the access specifiers (public and private), the enumeration constants (TICKS_PER_SEC and TE), the type name (count_type), and the member functions (disable, enable, set, and get) don't occupy any storage in a timer_registers class object. Only the data members (TMOD, TDATA, and TCNT) do. Thus, for a given target platform, a timer_registers class object in C++ has the same layout as a timer_registers structure defined in C as:

typedef struct timer_registers timer_registers;
struct timer_registers
    {
    device_register TMOD;
    device_register TDATA;
    device_register TCNT;
    };
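
One way to check the layout claim on a particular compiler is with a few compile-time assertions. The sketch below is an assumption-laden illustration, not part of the original listings: the namespaces c_style and cpp_style are hypothetical names used only to let both definitions coexist in one file, and the expected offsets assume 8-bit bytes and no padding between the registers.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

typedef std::uint32_t volatile device_register;

namespace c_style       // hypothetical namespace: the C structure
    {
    struct timer_registers
        {
        device_register TMOD;
        device_register TDATA;
        device_register TCNT;
        };
    }

namespace cpp_style     // hypothetical namespace: a cut-down C++ class
    {
    class timer_registers
        {
    public:
        typedef std::uint32_t count_type;
        void set(count_type c) { TDATA = c; TCNT = 0; }
    private:
        device_register TMOD;
        device_register TDATA;
        device_register TCNT;
        };
    }

// Member functions and type names add nothing to the object's size.
static_assert(std::is_standard_layout<cpp_style::timer_registers>::value,
    "class should be standard-layout");
static_assert(sizeof(cpp_style::timer_registers)
    == sizeof(c_style::timer_registers), "class and struct sizes differ");

// The registers sit at consecutive word offsets (assumes no padding).
static_assert(offsetof(c_style::timer_registers, TMOD) == 0, "TMOD offset");
static_assert(offsetof(c_style::timer_registers, TDATA) == 4, "TDATA offset");
static_assert(offsetof(c_style::timer_registers, TCNT) == 8, "TCNT offset");
```

If any assertion fails on your target, the structure doesn't match the hardware layout and the build stops before the code can touch the wrong addresses.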

All of the member functions in the timer_registers class are "ordinary"--neither static nor virtual. If we ignore access control and assume everything is public, every ordinary C++ class member function is conceptually equivalent to a C (nonmember) function with an additional parameter. That additional parameter is a pointer to the object (in this case, the device) upon which the member function acts.

For example, the timer_registers class member function:

void timer_registers::set(count_type c)
    {
    TDATA = c;
    TCNT = 0;
    }

translates into code that's nearly the same as, if not identical to, the code generated for a C function defined as:

void timer_registers_set
    (timer_registers *this, count_type c)
    {
    this->TDATA = c;
    this->TCNT = 0;
    }

With a given compiler and target, the instructions generated for each function might appear in a slightly different order or use different CPU registers, but both functions should produce the same results and execute in roughly the same time. You can verify claims like this experimentally--something I'll do in an upcoming column.

In C++, each nonstatic class member function has an implicitly-declared pointer parameter whose actual name is this. Thus, you can write the body of a member function using this exactly the same as in the body of its equivalent nonmember function.

A member function call such as:

the_timer->set(timer_registers::TICKS_PER_SEC);

should translate into code that's nearly the same as, if not identical to, the code generated for the nonmember function call:

timer_registers_set(the_timer, TICKS_PER_SEC);

Again, for a given target, the instruction ordering or register usage for each call might be slightly different, but the computed results should be identical and the timing darn close.
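
To see the equivalence side by side, the following sketch compiles both forms as C++ and runs them against ordinary objects rather than real device registers. Several things here are assumptions for illustration: the nonmember version uses a structure named timer_registers_c and a parameter named self (because this is a keyword in C++), and the data members are public, in keeping with the "ignore access control" simplification above.

```cpp
#include <cstdint>

typedef std::uint32_t volatile device_register;
typedef std::uint32_t count_type;

// C-style: a plain structure plus a function with an explicit pointer.
struct timer_registers_c        // hypothetical name, to avoid a clash
    {
    device_register TMOD;
    device_register TDATA;
    device_register TCNT;
    };

void timer_registers_set(timer_registers_c *self, count_type c)
    {
    self->TDATA = c;
    self->TCNT = 0;
    }

// C++-style: the pointer is the implicit this parameter.
struct timer_registers
    {
    void set(count_type c);
    device_register TMOD;
    device_register TDATA;
    device_register TCNT;
    };

void timer_registers::set(count_type c)
    {
    this->TDATA = c;    // writing this-> explicitly is optional
    this->TCNT = 0;
    }
```

Comparing the generated assembly for timer_registers_set and timer_registers::set on your own compiler is a quick way to check how close the two really are.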

Unnecessary indirection?
Using either a C++ class or a C structure to represent a memory-mapped device scales up easily on platforms with multiple instances of the device. For example, if the hardware supports two timers, then you can simply define one pointer to the base address of each device, as in:

timer_registers *const timer0
    = reinterpret_cast<timer_registers *>(0xFFFF6000);
timer_registers *const timer1
    = reinterpret_cast<timer_registers *>(0xFFFF6800);

Then a call such as:

timer0->disable();

disables one timer, while:

timer1->disable();

disables the other. Both expressions call the same function, but pass the address of a different device. Aside from the declaration of additional pointers, using even more timers adds nothing to the cost of using a timer. (Of course, you can't add more timers by just declaring more pointers. They have to be in the hardware.)
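
The same idea can be tried on a desktop machine by pointing timer0 and timer1 at ordinary memory. In this sketch, the array fake_hardware is a hypothetical stand-in for the two register blocks, the TE mask and the bodies of enable and disable are assumptions, and the data members are public only so the effect can be inspected.

```cpp
#include <cstdint>

typedef std::uint32_t volatile device_register;

struct timer_registers
    {
    enum { TE = 0x01 };     // assumed timer-enable bit
    void disable() { TMOD = TMOD & ~std::uint32_t(TE); }
    void enable() { TMOD = TMOD | std::uint32_t(TE); }
    device_register TMOD;
    device_register TDATA;
    device_register TCNT;
    };

// Hypothetical stand-in for two hardware timers. On a real target, the
// pointers would come from reinterpret_casts of the device addresses.
timer_registers fake_hardware[2];

timer_registers *const timer0 = &fake_hardware[0];
timer_registers *const timer1 = &fake_hardware[1];

void stop_first_timer_only()
    {
    timer0->enable();
    timer1->enable();
    timer0->disable();      // same function, different this pointer
    }
```

After stop_first_timer_only runs, only the first block's TMOD has its enable bit cleared; the single copy of disable served both devices.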

On the other hand, if your embedded system has only one timer, does this design (which allows for more than one timer) cost more than if it were written for only one timer? A few readers alleged that it does, and that an implementation using extern declarations would be faster. No one gave any specifics of what such an implementation might look like, but it's easy to conjure them up.

As I explained last month, both C and C++ will let you declare a memory-mapped object using a standard extern declaration such as:

extern timer_registers the_timer;

and then use linker command options or linker scripts to force the_timer into the desired address. If you use either a C++ class, or a C structure and functions that accept a pointer to that structure, then using extern declarations will support more than one timer just as well as using pointers does.

For example, if your hardware has two timers, you can declare:

extern timer_registers timer0;
extern timer_registers timer1;

and then disable timer1 by calling:

timer_registers_disable(&timer1);

in C, or by calling:

timer1.disable();

in C++. The code generated for these calls should be pretty close to what you get when you use pointers instead.

One possible advantage of using extern declarations instead of pointers is that, when you have only one instance of a particular device, you can write the functions to access the registers directly within the lone object and eliminate the pointer indirection. Let's see how this might work.

Eliminating indirection in C
Recall the pointer-based C implementation of the timer set function:

void timer_registers_set
    (timer_registers *this, count_type c)
    {
    this->TDATA = c;
    this->TCNT = 0;
    }

If you assume there's only one timer, you can remove the first parameter, this, from the function's parameter list and replace every reference to this-> with a reference to that one timer, as in:

extern timer_registers the_timer;

void timer_registers_set(count_type c)
    {
    the_timer.TDATA = c;
    the_timer.TCNT = 0;
    }

Because the_timer has a fixed address and structure member TDATA has a fixed offset, the compiler and linker can resolve the_timer.TDATA into an address known at build time. In contrast, the value of parameter this could be unknown at build time, so accessing this->TDATA requires a pointer-plus-offset computation at run time. This reasoning applies to accessing member TCNT as well.
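
The two variants can be placed in one file for comparison. This is a sketch under stated assumptions: the pointer-based function is renamed timer_registers_set_indirect (a hypothetical name) so both can coexist, the parameter is called self so the file also compiles as C++, and the_timer is defined right here as an ordinary object so the code links on a host. On a real target, the_timer would only be declared extern and the linker would place it at the device address.

```cpp
#include <cstdint>

typedef std::uint32_t volatile device_register;
typedef std::uint32_t count_type;

struct timer_registers
    {
    device_register TMOD;
    device_register TDATA;
    device_register TCNT;
    };

// Host stand-in. On target: extern timer_registers the_timer;
timer_registers the_timer;

// Pointer-based: the address arrives at run time in self.
void timer_registers_set_indirect(timer_registers *self, count_type c)
    {
    self->TDATA = c;
    self->TCNT = 0;
    }

// Direct: the address of the_timer.TDATA is fixed at build time.
void timer_registers_set(count_type c)
    {
    the_timer.TDATA = c;
    the_timer.TCNT = 0;
    }
```

Whether the direct form is actually faster depends on the compiler and target. When the compiler can inline the pointer-based call with a constant address, it may well produce the same instructions for both.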

Eliminating indirection in C++
The design I just described works in C++ as well as in C. However, you can implement it more elegantly in C++ by using a class with static members. Specifically, you can rewrite the timer_registers class so that every data member and every member function is declared static, as shown in Listing 2.



The static data members in a class exist independently of any objects of that class. As with other statically allocated objects, static data members have fixed addresses rather than offsets from the start of their class. Thus, static data members contribute nothing to the size of their class. All the data members in Listing 2 are static, so timer_registers objects will be empty. Interestingly, the C++ standard requires that the sizeof operator return something greater than zero, so sizeof(timer_registers) is at least one.

The declaration of a non-const static data member appearing within a class definition is not a definition--it doesn't allocate storage. In the case of the timer_registers class, this is good because we don't want the compiler deciding where to place static members TMOD, TDATA, and TCNT. These are memory-mapped registers that must be associated with specific memory addresses.
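
For readers without Listing 2 at hand, an all-static version of the class might look roughly like the sketch below. The values of TICKS_PER_SEC and TE and the bodies of disable, enable, and get are assumptions. The three out-of-class definitions are host-only stand-ins so the example links on a desktop compiler; on a real target you would leave them out and bind the names to the device addresses some other way.

```cpp
#include <cstdint>

typedef std::uint32_t volatile device_register;

class timer_registers
    {
public:
    enum { TICKS_PER_SEC = 50000000 };  // assumed value
    typedef std::uint32_t count_type;
    static void disable();
    static void enable();
    static void set(count_type c);
    static count_type get();
private:
    enum { TE = 0x01 };                 // assumed bit mask
    static device_register TMOD;        // declarations only: no storage
    static device_register TDATA;
    static device_register TCNT;
    };

// Host-only stand-ins: these definitions let the compiler allocate
// storage, which is exactly what you don't want on the real device.
device_register timer_registers::TMOD;
device_register timer_registers::TDATA;
device_register timer_registers::TCNT;

void timer_registers::disable() { TMOD = TMOD & ~std::uint32_t(TE); }
void timer_registers::enable() { TMOD = TMOD | std::uint32_t(TE); }
void timer_registers::set(count_type c) { TDATA = c; TCNT = 0; }
timer_registers::count_type timer_registers::get() { return TCNT; }

// The class has no nonstatic data, yet its size can't be zero.
static_assert(sizeof(timer_registers) >= 1, "sizeof is never zero");
```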

Static data members in C++ have external linkage. The linker sees their names. You should be able to use linker command options or scripting to bind static data members to specific memory addresses, much as you can with other objects declared extern. Alternatively, if your C++ compiler supports an extension that lets you place objects at a specific address, you should be able to apply it to static data members, as in:

class timer_registers
    {
    ~~~
    static device_register TMOD @ 0xFFFF6000;
    static device_register TDATA @ 0xFFFF6004;
    static device_register TCNT @ 0xFFFF6008;
    ~~~
    };

Specifying an address for each device register, whether by a language extension or by the linker, is tedious and error-prone. Using an intermediate structure, as in:

class timer_registers
    {
    ~~~
private:
    struct T
        {
        device_register MOD;
        device_register DATA;
        device_register CNT;
        };
    static T @ 0xFFFF6000;
    ~~~
    };

lets you specify just one address for the entire collection of registers. With some compilers, it may also facilitate further optimization, which I'll explain in an upcoming column.

Recall that the implicitly-declared this parameter in a nonstatic class member function points to an object of the class type. However, the static data members for a class aren't in any one object--they're in their own statically-allocated storage. A member function that accesses nothing but static data members doesn't need a this parameter, and passing a pointer as the value for this is wasted effort. Static member functions eliminate that waste. A static member function is simply a class member function that doesn't have an implicitly-declared this parameter.

The definition for a static member function looks just like the definition for a nonstatic member function. For example, when the timer_registers set function is a static member, the definition looks like:

void timer_registers::set(count_type c)
    {
    TDATA = c;
    TCNT = 0;
    }

which is exactly the same as when set is a nonstatic member. You don't supply the keyword static in the definition because the language doesn't allow it there. However, the class declares set as static, so the function really has only one parameter, c. No this.

Since a static member function doesn't have a this parameter, a call to that function need not use the . (dot) or -> (arrow) operators to provide a value for this. However, if you just use the member name as the function name, as in:

set(timer_registers::TICKS_PER_SEC);

the compiler won't know that set is from the timer_registers class, unless the call takes place within the scope of the timer_registers class. Instead, you use the :: operator to provide the full name of the static member function, as in:

timer_registers::set
   (timer_registers::TICKS_PER_SEC);

On the other hand, if you want to declare a timer_registers object and use it as the object in a static member function call, you can. For example, when set is a static member, you can still write calls such as:

the_timer.set(timer_registers::TICKS_PER_SEC);

In this case, the compiler looks at the expression to the left of the dot and finds the_timer, sees that the_timer is a timer_registers object, looks in the timer_registers class to find set, sees that set is a static member function, and then generates a call to set that ignores the_timer. The call generates the same code as when the function name appears as timer_registers::set. You can also call a static member function using a pointer and an ->, and the compiler will examine and ultimately discard the pointer in much the same way.

The ability to call a static member function using a dot or arrow as if it were a nonstatic member function is actually a nice feature: it lets you change the member functions from static to nonstatic and back without rewriting the calls to the member functions. This offers greater design flexibility down the road. I don't know any way to achieve the same flexibility in C.
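
The three call forms can be tried together in a small host-side sketch. As before, some details are assumptions for illustration: the value of TICKS_PER_SEC is made up, the static members are public only so the result can be inspected, their definitions are host stand-ins, and timer_ptr and start_three_ways are hypothetical names.

```cpp
#include <cstdint>

typedef std::uint32_t volatile device_register;

class timer_registers
    {
public:
    enum { TICKS_PER_SEC = 50000000 };  // assumed value
    typedef std::uint32_t count_type;
    static void set(count_type c);
    static device_register TDATA;   // public only for inspection here
    static device_register TCNT;
    };

// Host-only stand-ins for the memory-mapped registers.
device_register timer_registers::TDATA;
device_register timer_registers::TCNT;

void timer_registers::set(count_type c)    // no "static" in the definition
    {
    TDATA = c;
    TCNT = 0;
    }

timer_registers the_timer;                     // an empty object
timer_registers *const timer_ptr = &the_timer; // hypothetical pointer

void start_three_ways()
    {
    // All three statements call the same function with one argument.
    timer_registers::set(timer_registers::TICKS_PER_SEC);
    the_timer.set(timer_registers::TICKS_PER_SEC);
    timer_ptr->set(timer_registers::TICKS_PER_SEC);
    }
```

If set later becomes a nonstatic member, the second and third calls keep compiling unchanged; only the first form would need rewriting.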

More to come
The question still remains: does eliminating the pointers from the function calls and function bodies actually improve the run-time performance of code that manipulates memory-mapped devices? To answer this, I ran some timing tests using different C and C++ tool chains and made measurements, which I'll present next time. You may be surprised by some of the results.

Endnotes:
1. Saks, Dan. "Alternative models for memory-mapped devices," Embedded Systems Design, May 2010, p. 9. www.embedded.com/columns/224700534.
2. Saks, Dan. "Memory-mapped devices as C++ classes," Embedded.com, June 2010. www.eetimes.com/discussion/other/4200572/Memory-mapped-devices-as-C--classes.
3. Saks, Dan. "Compared to what?" Embedded.com, August 2010. www.eetimes.com/discussion/other/4205983/Compared-to-what.
