Modern C++ embedded systems – Part 2: Evaluating C++

February 17, 2015

Having discussed the implementation of the main C++ language features in Part 1 of this series, we can now evaluate C++ in terms of the machine code it generates. Embedded system programmers are particularly concerned about code and data size; we need to discuss C++ in these terms.

How big is a class?
In C++, most code is in class member functions and most data is in objects belonging to these classes. C++ classes tend to have many more member functions than a C programmer would expect to use. This is because well-designed classes are complete and contain member functions to do anything with objects belonging to the class that might legitimately be needed. For a well-conceptualized class, this number will be reasonably small, but nevertheless larger than what the C programmer is accustomed to.

When calculating code size, bear in mind that modern linkers can extract from object files only those functions that are actually called, not the entire object files. In essence, they treat each object file like a library. This means that unused non-virtual class member functions have no code size penalty. So a class that seems to have a lot of baggage in terms of unused member functions may be quite economical in practice.

Although class completeness need not cause code bloat for non-virtual functions, it is reasonable to assume that all virtual functions of all classes used in a system will be linked into the binary.

How big is an object?
The size of an object can be calculated by examining its class (and all its base classes). Ignore member functions and treat the data the same as for a struct, including any alignment padding. Then, if the class or any of its base classes has virtual functions, add the size of a pointer (typically a hidden pointer to the virtual function table, which may bring some padding of its own). You can confirm your result by using the sizeof operator. It will become apparent that the combined size of objects in a system need be no greater than the size of data in a C-based procedural model. This is because the same amount of state is needed to model a system regardless of whether it is organized into objects.
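The sizing rule can be checked at compile time. The following is a minimal sketch, not taken from the original article: the three sensor types are hypothetical, and the two checks hold on mainstream compilers and ABIs rather than being guaranteed by the C++ standard.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical types, used only to illustrate the sizing rule.
struct PlainSensor {                 // what a C programmer would write
    std::int32_t reading;
    std::int32_t offset;
};

class Sensor {                       // same data plus a non-virtual member function
public:
    std::int32_t corrected() const { return reading_ + offset_; }
private:
    std::int32_t reading_ = 0;
    std::int32_t offset_ = 0;
};

class VirtualSensor {                // same data plus virtual functions
public:
    virtual ~VirtualSensor() {}
    virtual std::int32_t corrected() const { return reading_ + offset_; }
private:
    std::int32_t reading_ = 0;
    std::int32_t offset_ = 0;
};

// Non-virtual member functions add nothing to the object.
static_assert(sizeof(Sensor) == sizeof(PlainSensor),
              "expected: member functions do not enlarge the object");

// Virtual functions cost one hidden pointer, possibly plus alignment padding.
static_assert(sizeof(VirtualSensor) >= sizeof(PlainSensor) + sizeof(void*),
              "expected: room for a vtable pointer");
```

If either assertion fails on a particular toolchain, that is itself useful information about how the compiler lays out objects.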

C++ and the heap
Heap usage is much more common in C++ than in C. This is because of encapsulation. In C, where a function requires an unknown amount of memory, it is common to externalize the memory as an input parameter and leave the caller with the problem. This is at least safer than mallocing an area and relying on the user to free it. But C++, with its encapsulation and destructors, gives class designers the possibility (and responsibility) of managing the memory used by objects of that class.

This difference in philosophy is evident in the difference between C strings and a C++ string class. In C, you get a char array. You have to decide in advance how long your string can be and you have to continuously make sure it doesn’t get any bigger. A C++ string class, however, uses ‘new’ and ‘delete’ to allow a string to be any size and to grow if necessary. It also makes sure that the heap is restored when the string is destroyed.

The consequence of all this is that you can scrape by in an embedded system written in C without using ‘malloc’ and ‘free’, but avoiding ‘new’ and ‘delete’ in C++ is a much bigger sacrifice.

The main reason for banning heap usage in an embedded application is the threat of heap fragmentation. As the software runs, memory allocations of different sizes are acquired and released. The situation can arise where many small allocations are scattered through the heap and, although a large fraction of the heap may be available for use, it is all in small fragments and it is not possible to provide an allocation bigger than the largest fragment.

In an embedded system that runs continuously for years, heap fragmentation may occur only under certain conditions a long time after deployment, and may have been missed in test coverage.

Heap fragmentation can be avoided by using a non-fragmenting allocator. One solution is to re-implement operator ‘new’ and operator ‘delete’ using a collection of fixed-size buffer pools. Operator ‘new’ returns the smallest available buffer that will satisfy the request. Since buffers are never split, fragmentation (or external fragmentation to be precise) does not occur. Disadvantages of this technique are that it uses more memory and that it must be configured to provide the right number of the right sized buffers.
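The buffer-pool idea can be sketched as follows. This is an illustration under stated assumptions, not the article's implementation: the names FixedPool and PoolHeap, the two pool sizes (32 and 128 bytes) and the buffer counts are all hypothetical, and the sketch is a standalone allocator rather than a replacement for the global operators.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>

// A pool of kCount buffers of kSize bytes each. Free buffers are chained
// through their own first bytes, so no extra bookkeeping memory is needed.
template <std::size_t kSize, std::size_t kCount>
class FixedPool {
    static_assert(kSize >= sizeof(void*), "a free buffer must hold one link");
    static_assert(kSize % alignof(std::max_align_t) == 0,
                  "buffer size must preserve alignment");
public:
    FixedPool() : freeList_(nullptr) {
        for (std::size_t i = 0; i < kCount; ++i) release(storage_ + i * kSize);
    }

    static constexpr std::size_t bufferSize() { return kSize; }

    // True if p points into this pool's storage.
    bool owns(const void* p) const {
        std::less<const unsigned char*> before;  // total order over pointers
        const unsigned char* c = static_cast<const unsigned char*>(p);
        return !before(c, storage_) && before(c, storage_ + kSize * kCount);
    }

    // Hands out a whole buffer, or nullptr when the pool is empty.
    void* acquire() {
        Node* n = freeList_;
        if (n) freeList_ = n->next;
        return n;
    }

    // Puts a buffer back at the head of the free list.
    void release(void* p) {
        Node* n = ::new (p) Node;
        n->next = freeList_;
        freeList_ = n;
    }

private:
    struct Node { Node* next; };
    alignas(std::max_align_t) unsigned char storage_[kSize * kCount];
    Node* freeList_;
};

// Two pools; a request gets the smallest available buffer that fits it.
class PoolHeap {
public:
    void* allocate(std::size_t size) {
        void* p = nullptr;
        if (size <= small_.bufferSize()) p = small_.acquire();
        if (!p && size <= large_.bufferSize()) p = large_.acquire();
        return p;  // nullptr: request too big, or all suitable buffers in use
    }

    void deallocate(void* p) {
        if (!p) return;
        if (small_.owns(p)) {
            small_.release(p);
        } else {
            assert(large_.owns(p));
            large_.release(p);
        }
    }

private:
    FixedPool<32, 8> small_;    // hypothetical configuration
    FixedPool<128, 4> large_;
};
```

A replacement operator new could forward to something like PoolHeap::allocate and throw std::bad_alloc when it returns nullptr. The cost mentioned above is visible here: a 33-byte request occupies a whole 128-byte buffer, and the pool sizes and counts have to be tuned for the application.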

Another alternative to banning heap usage outright is to allow it during initialization, but ban it once the system is running. This way, STL containers and other objects that use the heap can be modified and configured during initialization, but must not be modified once initialization is complete. With this policy, we know that if the system starts up, it won’t run out of memory no matter how long it runs. For some applications, this level of flexibility is enough.
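One way to enforce such a policy is to replace the global allocation function so that it refuses requests once start-up is over. The sketch below is an assumption about how this might be done, not part of the original article: the flag heapLocked and the function lockHeap() are hypothetical names, and throwing std::bad_alloc is just one possible reaction (a real system might log the event and reset instead).

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical flag: false during initialization, true once the system runs.
static bool heapLocked = false;

// Call once, at the end of start-up.
void lockHeap() { heapLocked = true; }

// Replacement global allocation function. The default array and nothrow
// forms of operator new forward to this one, so they are covered as well.
void* operator new(std::size_t size) {
    if (heapLocked) {
        throw std::bad_alloc();   // allocation after start-up is a policy violation
    }
    if (size == 0) size = 1;      // operator new must return a unique pointer
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc();
}

// Matching deallocation function. Releasing memory stays legal after the lock.
void operator delete(void* p) noexcept { std::free(p); }
```

With this in place, containers can be sized and filled during initialization, and any later attempt to grow them fails immediately instead of slowly fragmenting the heap.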

ROMable objects
Linkers for embedded systems allow const static data to be kept in ROM. For a system written in C, this means that all the non-varying data known at compile time can be specified by static initializers, compiled to be stored in ROM and left there.

In C++, we can do the same, but we tend not to. In well-designed C++ code, most data is encapsulated in objects. Objects belong to classes and most classes have constructors. The natural object-oriented equivalent to const initialized data is a const object. A const static object that has a constructor must be stored in RAM for its constructor to initialize it. So where in C a const static object occupies cheap and plentiful ROM, its natural heir in C++ occupies expensive and scarce RAM. Initialization is performed by start-up code that calls static constructors with parameters specified in declarations. This start-up code occupies more ROM than the static initializer would have.

So if a system includes a lot of data that can be kept in ROM, special attention to class design is needed to ensure that the relevant objects are ROMable. For an object to be ROMable, it must be capable of initialization by a static initializer like a C struct. Although the easy way to do this is to make it a simple C struct (without member functions), it is possible to make such a class a bit more object-oriented.

The criteria for a static initializer to be allowed for a class are:

  • The class must have no base classes.
  • It must have no constructor.
  • It must have no virtual functions.
  • It must have no private or protected members.
  • Any classes it contains must obey the same rules.

In addition, we should also require that all member functions of a ROMable class be const. A C struct meets these criteria, but so does a class that has member functions.
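As a small illustration, the following hypothetical class meets the criteria and still offers const member functions. The type traits used here are a convenient compile-time check chosen for this sketch; they are not identical to the list above. Whether the object actually ends up in ROM depends on the compiler and linker configuration.

```cpp
#include <type_traits>

// Hypothetical example: no base classes, no constructor, no virtual
// functions, no private or protected members. All member functions are const.
struct Limits {
    int low;
    int high;

    bool contains(int value) const { return low <= value && value <= high; }
    int span() const { return high - low; }
};

// Compile-time sanity check that the type is still as simple as a C struct.
static_assert(std::is_trivial<Limits>::value &&
              std::is_standard_layout<Limits>::value,
              "Limits must remain statically initializable");

// Static initializer, exactly as for a C struct.
static const Limits adcLimits = {0, 1023};
```

Code elsewhere can then call adcLimits.contains(reading) without any start-up code having run for the object.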

Although this solves the ROMability problem and enhances the C struct with member functions, it falls far short of the object-oriented ideal of a class that is easy to use correctly and difficult to use incorrectly. The unwary class user can, for example, declare and use a non-const, uninitialized instance of the class that is beyond the control of the class designer.

To let us sleep securely in our object-oriented beds, something more is needed. That something is ‘class nesting’. In C++, we can declare classes within classes. We can take our dubious class that is open to misuse and put it in the private section of another class. We can also make the const static instances of the dubious class private static members of the encapsulating class. This outer class is subject to none of the restrictions that the ROMable class is, so we can give it a properly restricted interface to our const static data.

To illustrate this discussion, let us consider a simplified example of a handheld electronic multi-language dictionary. To keep it simple, the translator translates from English to German or French and it has a vocabulary of two words, ‘yes’ and ‘no’. Obviously, these dictionaries must be held in ROM. A C solution would be something like Listing 20.

     /* A C ROMable dictionary */

     #include <stdio.h>

     typedef struct {
         const char* englishWord;
         const char* foreignWord;
     } DictEntry;

     const static DictEntry germanDict[] = {
         {"yes", "ja"},
         {"no", "nein"},
         {NULL, NULL}
     };

     const static DictEntry frenchDict[] = {
         {"yes", "oui"},
         {"no", "non"},
         {NULL, NULL}
     };

     const char* FromEnglish(const DictEntry* dict, const char* english);

     const char* ToEnglish(const DictEntry* dict, const char* foreign);

     /* ... */

     int main() {
         puts(FromEnglish(frenchDict, "yes"));
         return 0;
     }

Listing 20: A C ROMable dictionary

A Dict is an array of DictEntry. A DictEntry is a pair of const char* pointers, the first to the English word, the second to the foreign word. The end of a Dict is marked by a DictEntry containing a pair of NULL pointers. To complete the design, we add a pair of functions that perform translation from and to English using a dictionary. This is a simple design. The two dictionaries and the strings to which they point reside in ROM.

Let us now consider what happens if we produce a naïve object-oriented design in C++. Looking at Listing 20 through object-oriented glasses, we identify a class Dict with two member functions: const char* Dict::fromEnglish(const char*) and const char* Dict::toEnglish(const char*). We have a clean and simple interface. Unfortunately, Listing 21 won’t compile. The static initializers for frenchDict and germanDict try to access private members of the objects, and the class also declares a constructor, which by itself rules out a static initializer.

     // NOT a ROMable dictionary in C++

     #include <iostream>
     using namespace std;

     class Dict {
     public:
         Dict();
         const char* fromEnglish(const char* english) const;
         const char* toEnglish(const char* foreign) const;
     private:
         enum { DictSize = 3 };

         struct {
             const char* english;
             const char* foreign;
         } table[DictSize];
     };

     // *** Following won’t compile ***
     const static Dict germanDict = {
         {
             {"yes", "ja"  },
             {"no",  "nein"},
             {NULL, NULL}
         }
     };

     // *** Following won’t compile ***
     const static Dict frenchDict = {
         {
             {"yes", "oui" },
             {"no",  "non" },
             {NULL, NULL}
         }
     };

     // ...

     int main() {
         cout << germanDict.fromEnglish("yes");
         return 0;
     }

Listing 21: A C++ ROMable dictionary NOT!

If we make these members public and eliminate the constructor as in Listing 22, the class will meet the criteria for static initializers and the code will compile, but we’ve broken encapsulation. Users can see the internal implementation of the class and bypass the intended access functions. Even worse, they can create their own (non-const) instances of Dict whose internal state is outside our control.

     // A ROMable dictionary in C++, but with poor encapsulation

     #include <cstddef>    // for NULL
     #include <iostream>
     using namespace std;

     class Dict {
     public:
         const char* fromEnglish(const char* english) const;
         const char* toEnglish(const char* foreign) const;

     // PLEASE don’t access anything in the class below this comment.
     // PLEASE don’t create your own instances of this class.

         enum { DictSize = 3 };

         struct {
             const char* english;
             const char* foreign;
         } table[DictSize];
     };

     const static Dict germanDict = {
         {
             {"yes", "ja"},
             {"no", "nein"},
             {NULL, NULL}
         }
     };

     const static Dict frenchDict = {
         {
             {"yes", "oui"},
             {"no", "non"},
             {NULL, NULL}
         }
     };

     // ...

     int main() {
         cout << germanDict.fromEnglish("yes");
         return 0;
     }     


Listing 22: A C++ ROMable corruptible dictionary

Now, let’s do it right. In Listing 23, the class Dict in Listing 22 becomes Table, which is nested privately within the new class Dict. Class Dict also contains, as a static member, an array of Tables, which we can initialize statically. The function main() shows use of this class Dict, which has a clean interface.

     #include <cstddef>    // for NULL
     #include <iostream>
     using namespace std;
     class Dict {
     public:
         typedef enum {
             german,
             french
         } Language;

         Dict(Language lang);

         const char* fromEnglish(const char* english) const;

         const char* toEnglish(const char* foreign) const;

     private:
         class Table {
         public:
             const char* fromEnglish(const char* english) const;
             const char* toEnglish(const char* foreign) const;

             enum { DictSize = 3 };

             struct {
                 const char* english;
                 const char* foreign;
             } table[DictSize];
         };

         const static Table tables[];

         Language myLanguage;
     };

     const Dict::Table Dict::tables[]= {
         {
             {
                 {"yes", "ja"},
                 {"no", "nein"},
                 {NULL, NULL}
             }
         },
         {
             {
                 {"yes", "oui"},
                 {"no", "non"},
                 {NULL, NULL}
             }
         }
     };

     // ...

     int main() {
         Dict germanDict (Dict::german);
         cout << germanDict.fromEnglish("yes");
         return 0;
     }

Listing 23: A clean C++ ROMable dictionary
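Listing 23 elides the member function bodies. One possible way to fill them in is sketched below as a complete translation unit. The lookup logic, the linear search and the choice to return NULL for an unknown word are assumptions made for this sketch, not the author's code, and the nested struct is given the name Entry only for readability.

```cpp
#include <cstddef>    // NULL
#include <cstring>    // std::strcmp

class Dict {
public:
    enum Language { german, french };

    explicit Dict(Language lang) : myLanguage(lang) {}

    // The public interface simply forwards to the ROMable table.
    const char* fromEnglish(const char* english) const {
        return tables[myLanguage].fromEnglish(english);
    }
    const char* toEnglish(const char* foreign) const {
        return tables[myLanguage].toEnglish(foreign);
    }

private:
    // Statically initializable: no constructor, bases, virtuals or private data.
    class Table {
    public:
        const char* fromEnglish(const char* english) const;
        const char* toEnglish(const char* foreign) const;

        enum { DictSize = 3 };
        struct Entry {
            const char* english;
            const char* foreign;
        } table[DictSize];
    };

    static const Table tables[];
    Language myLanguage;
};

// Order must match the Language enumeration: german first, then french.
const Dict::Table Dict::tables[] = {
    { { {"yes", "ja"},  {"no", "nein"}, {NULL, NULL} } },
    { { {"yes", "oui"}, {"no", "non"},  {NULL, NULL} } }
};

// Linear search up to the NULL terminator. Returns NULL if the word is unknown.
const char* Dict::Table::fromEnglish(const char* english) const {
    for (int i = 0; i < DictSize && table[i].english != NULL; ++i) {
        if (std::strcmp(table[i].english, english) == 0) return table[i].foreign;
    }
    return NULL;
}

const char* Dict::Table::toEnglish(const char* foreign) const {
    for (int i = 0; i < DictSize && table[i].foreign != NULL; ++i) {
        if (std::strcmp(table[i].foreign, foreign) == 0) return table[i].english;
    }
    return NULL;
}
```

Because an unknown word yields NULL in this sketch, callers should check the result before passing it to cout or puts.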

So to make the best use of object-oriented design for data in ROM, special class design is needed.
