C++ template metaprogramming for AVR microcontrollers - Embedded.com

C++ template metaprogramming for AVR microcontrollers

The AVR 8-bit microcontroller's modified Harvard architecture was introduced in 1997. These Atmel microcontrollers are popular among a reasonably large number of developers because they provide decent performance coupled with low power consumption. It's also fair to say that a large part of the AVR's popularity comes from the fact that these microcontrollers are used in many Arduino systems.

The Arduino
Introduced in 2005, the Arduino is an open-source prototyping platform that was originally targeted at beginners, students, and non-engineers (including artists). The Arduino soon found itself at the forefront of the flourishing Maker Movement. The Arduino environment has evolved since its inception, and there now exists a wide variety of Arduino boards and supplementary devices. A simple-to-learn subset of the C programming language, coupled with a diversity of libraries written by enthusiasts from all over the world, makes it possible to create Arduino applications for virtually any problem. This helps both beginners and professionals use the Arduino to check ideas and create new device prototypes.

Having said all this, it is unlikely that one would use Arduino software for real, commercial projects. The main reason for this is the inefficiency of the resulting code [8]. The simplicity and universality of the Arduino tools limit their ability to take full advantage of the AVR microcontroller’s potential, its performance, and inherent parallelism.

Software development approaches
For the purposes of these discussions, software development approaches may be considered to fall into two main camps: old school and new school.

Old School: These programmers are true hardware and software experts. Their tools are assembly language and the C programming language. Their main goal is to squeeze everything they can out of every byte — to achieve ultimate code performance while minimizing power and memory consumption. However, code written by these programmers is not always easy to understand, which can make ongoing development and maintenance difficult.

New School: People brought up in the era of C++ and objects tend to see objects in everything. Classes are an excellent example of code reuse. The use of classes encourages developers to achieve better code structure and well-thought-out assignments of responsibilities among components. Properly written object-oriented code is easy to understand and support. However, using the object-oriented capabilities of C++ comes at a cost. One oft-cited drawback of using C++ is its perceived lack of efficiency. The automatic generation of class methods and the implicit creation of temporary objects can lead to a substantial reduction in code performance [7]. As a result, developing efficient C++ code is something of an art.

C++ templates
One of the strengths of C++ is its template mechanism. The main idea is the ability to create a generalized definition of code behavior without explicitly specifying the entities to use. An obvious example of template usage is the Standard Template Library (STL), which presents three main types of entities: containers, algorithms, and iterators.

Generalized containers can be specified with required data types at the point of use. Algorithms know nothing about containers, and cooperation between algorithms and containers is achieved via the iterators mechanism. This allows the STL to demonstrate an amazing flexibility and to provide solutions for an unlimited number of practical problems.

An obvious advantage of class templates is the fact that the compiler only instantiates those class members that are actually used in the code, while any remaining code receives only a syntax check. This eliminates unused code and reduces program memory consumption. The specialization mechanism allows fine-tuning of the code's behavior depending on template parameters and provides an excellent opportunity for code optimization. At the same time, the template syntax is not particularly convenient and the compiler is not friendly to template code. This makes development quite difficult and is probably the main disadvantage of using templates.

The easiest way to illustrate the concept of templates is by means of an example. Let's suppose we wish to create a min() function for use with integer values. An obvious solution for a C programmer would be to implement something like the following:

   int min (int a, int b)
   {
       return (a < b) ? a : b;
   }

If a similar capability is required for use with floating-point values, then an additional function needs to be defined as follows:

   float min (float a, float b)
   {
       return (a < b) ? a : b;
   }

That is, for every data type for which we wish to perform this task, we need to create a unique function. By comparison, the solution for a C++ programmer can be as simple as follows:

   template <typename T>
   T min (T a, T b)
   {
       return (a < b) ? a : b;
   }

In this case, the types of the values are not explicitly specified; instead, we use T notation that appears in the template definition with the typename keyword. For function (and class method) templates, the compiler is able to deduce the required parameter type based on the types provided by the user.

Not unreasonably, the compiler will report a problem if such a function is called with a pair of different data types. If this is done intentionally, however, then the solution is simple: all that is necessary is to explicitly specify the required type during the function call as illustrated below:

   float float_variable = 3.141;
   int   integer_variable = 3;
   int   result = min<int>(float_variable, integer_variable);

Or, depending on the programmer's requirement:

   float result = min<float>(float_variable, integer_variable);

When instantiated in this way, the function can work with arbitrary types of data; the only requirement is that the '<' (less than) operation is defined for the type used. Such behavior is very similar to that of dynamically typed languages; however, there is a fundamental difference. For languages like Python, a single instance of the function is sufficient. As a statically typed language, C++ requires a distinct instance for each type with which the function is used. Here, we completely rely on the compiler to perform all that necessary work for us. However, although this is convenient, this is also the sort of thing that can lead to code bloat.
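The two points above (explicit type selection and the '<' requirement) can be sketched as follows. This is only an illustration under stated assumptions: it assumes a C++11 compiler for constexpr and static_assert, and the names min_value and Millivolts are hypothetical, chosen so the sketch does not clash with any library min.

```cpp
// Illustrative sketch: min_value and Millivolts are hypothetical names.
template <typename T>
constexpr T min_value(T a, T b)
{
    return (a < b) ? a : b;
}

// A user-defined type works with min_value as soon as it provides operator<.
struct Millivolts
{
    int value;
};

constexpr bool operator<(Millivolts lhs, Millivolts rhs)
{
    return lhs.value < rhs.value;
}

// Mixed argument types: the instantiation type is given explicitly.
constexpr int   as_int   = min_value<int>(3.141f, 3);    // 3.141f is converted to int first
constexpr float as_float = min_value<float>(3.141f, 3);  // 3 is converted to float first

// Each distinct T produces a distinct instance: int, float, and Millivolts here.
static_assert(as_int == 3, "both arguments compare as 3");
static_assert(as_float == 3.0f, "3.0f is smaller than 3.141f");
static_assert(min_value(Millivolts{5}, Millivolts{2}).value == 2,
              "operator< is all the template requires");
```

Three instances of the function are generated in this sketch, one per type, which is the mechanism behind both the flexibility and the potential code growth.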

Of course, this particular function is an obvious candidate for inlining due to its small size, so this wouldn’t be a problem. However, a bunch of differently parameterized class template instances having bulky methods can lead to a significant increase in the size of the code. Todd Veldhuizen gives recommendations that can help avoid this [5].

Template metaprogramming
In 1994, during a meeting of the C++ standardization committee, Erwin Unruh first demonstrated the ability of templates to execute computations at compile time. The code he presented produced a series of diagnostic messages containing prime numbers. As further investigations revealed, this ability to perform compile-time calculations has computational completeness [6]; indeed, it is possible to perform arithmetic operations, create loops via recursion, and branch via specializations.

Also noticed was a similarity between templates and conventional runtime functions [3]. Template parameters are similar to ordinary function parameters, while nested types and enumeration constants mimic return values. The very same entities used as template parameters can be used as metafunction parameters and return values as follows:

  • Constant values of integral and enumerated types

  • Types

  • Pointers

Enumerated types: The simplest and clearest case is the use of enumerated types. For example, the following metafunction raises the BASE value to the power of PWR.

template <                // primary template
   unsigned PWR,
   unsigned BASE = 10     // template class parameters can have default values
>
struct power
{
   enum{value = BASE * power<PWR-1,BASE>::value};
};

template <unsigned BASE>  // template specialization
struct power<0,BASE>
{
   enum{value = 1};
};

As we see, in order to calculate a result, the power template invokes itself recursively with modified values of PWR. In order to prevent endless recursion, a specialization is required; in this case, for PWR having a value of zero.

Examples of use are as follows:

   unsigned KILO = power<3,10>::value;

   unsigned MEGA = power<6,10>::value;

   unsigned kBytes = power<10,2>::value;

In the above examples, KILO will be assigned a value of 1,000; MEGA will be assigned a value of 1,000,000; and kBytes will be assigned a value of 1,024.
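The results quoted above can be checked by the compiler itself. The following sketch repeats the power metafunction and verifies the three values with static_assert, which assumes a C++11 compiler; the io_buffer array is a hypothetical addition that shows the result being used where a constant expression is required.

```cpp
// Sketch of the power metafunction with compile-time checks (assumes C++11).
template <unsigned PWR, unsigned BASE = 10>
struct power
{
    enum { value = BASE * power<PWR - 1, BASE>::value };
};

template <unsigned BASE>
struct power<0, BASE>      // ends the recursion
{
    enum { value = 1 };
};

static_assert(power<3, 10>::value == 1000,    "KILO");
static_assert(power<6, 10>::value == 1000000, "MEGA");
static_assert(power<10, 2>::value == 1024,    "kBytes");

// The result is a constant expression, so it can size an array.
// io_buffer is a hypothetical name used only for this illustration.
unsigned char io_buffer[power<10, 2>::value];
```

No code is generated for the computation itself; only the resulting constants reach the object file.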

Computations on types: Suppose we need to pass a function an input parameter whose type is ValueType. We would like the optimal way of passing the parameter (by value or by const reference) to be determined automatically depending on the target platform and the size of the particular parameter type. Consider the following:

template <
   typename ValueType
>
struct PARAM
{
   typedef typename type_selector<
               (sizeof(ValueType*) < sizeof(ValueType)),
               const ValueType&,
               ValueType
               >::type type;
};

The type_selector template used inside the metafunction has been described by different authors [3, 4] and may look like the following:

template <                // primary template
   bool CONDITION,        // logical condition
   typename TYPE0,        // type chosen if CONDITION is true
   typename TYPE1         // type chosen if CONDITION is false
>
struct type_selector
{
   typedef TYPE0 type;
};

template <typename TYPE0, typename TYPE1> // specialization for CONDITION is false
struct type_selector<false,TYPE0,TYPE1>
{
   typedef TYPE1 type;
};

In this case, type_selector returns TYPE0 when CONDITION is true, or TYPE1 otherwise. As a condition we use the logical expression sizeof(ValueType*) < sizeof(ValueType). For example, if the parameter type is uint32_t, we can define the function as follows:

   void function (typename PARAM<uint32_t>::type value){…}

Here, the typename keyword marks the nested name as a type; the compiler requires it whenever such a nested type depends on a template parameter. Such a declaration/definition of a function may look cumbersome; nevertheless, the problem is solved: on a 32/64-bit platform the parameter will be passed by value, but it will be passed as a const reference when compiled for an AVR microcontroller, in which the size of a pointer is two bytes.
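The selection logic can be verified on a development host before it is used on the target. The sketch below assumes a C++11 host compiler with the standard type_traits header (which may not be available in an AVR toolchain); the Packet type is hypothetical, and the types were picked so that the outcome is the same on any platform.

```cpp
#include <stdint.h>
#include <type_traits>   // std::is_same, assumed available on the host

template <bool CONDITION, typename TYPE0, typename TYPE1>
struct type_selector
{
    typedef TYPE0 type;                  // chosen when CONDITION is true
};

template <typename TYPE0, typename TYPE1>
struct type_selector<false, TYPE0, TYPE1>
{
    typedef TYPE1 type;                  // chosen when CONDITION is false
};

template <typename ValueType>
struct PARAM
{
    typedef typename type_selector<
        (sizeof(ValueType*) < sizeof(ValueType)),
        const ValueType&,
        ValueType
    >::type type;
};

// Hypothetical 64-byte type: larger than a pointer on every common platform.
struct Packet
{
    uint8_t payload[64];
};

// A one-byte value is never larger than a pointer, so it is passed by value.
static_assert(std::is_same<PARAM<uint8_t>::type, uint8_t>::value,
              "small types are passed by value");
// The large structure is passed by const reference.
static_assert(std::is_same<PARAM<Packet>::type, const Packet&>::value,
              "large types are passed by const reference");
```

For types whose size lies between two and eight bytes, such as uint32_t, the answer depends on the pointer size of the target, which is exactly the platform dependence described above.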

Pointers as template parameters: Now suppose that inside some code we wish to call a function whose type is declared as follows:

typedef void (*CALLBACK_FUNCTION_TYPE)(); // Type of callback function

Then if we define our code as a template as follows…

// As a template parameter we use cb_func object
template <CALLBACK_FUNCTION_TYPE cb_func>
void some_code(…)
{
   cb_func(); // function call inside the code
}

…we can specify the required function at the point our code is executed:
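A minimal, self-contained sketch of such a call is given here. The callbacks toggle_led and count_tick, their counters, and run_example are hypothetical names introduced only for illustration, and some_code takes no arguments in this sketch because the original parameter list is elided.

```cpp
#include <cassert>

typedef void (*CALLBACK_FUNCTION_TYPE)();   // type of callback function

// Hypothetical state and callbacks, used only for this illustration.
static int led_toggles = 0;
static int ticks = 0;

void toggle_led() { ++led_toggles; }
void count_tick() { ++ticks; }

// The callback is a non-type template parameter, so it is fixed at compile time.
template <CALLBACK_FUNCTION_TYPE cb_func>
void some_code()
{
    cb_func();   // the callee is known statically, which makes it a candidate for inlining
}

void run_example()
{
    some_code<toggle_led>();   // one instance of some_code per callback
    some_code<count_tick>();
    some_code<count_tick>();

    assert(led_toggles == 1);
    assert(ticks == 2);
}
```

Each distinct callback produces its own instance of some_code, so no function pointer has to be stored or passed at run time.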


Since the pointer to the function is known at compile time, the compiler is able to effectively inline the function [5]. Function inlining significantly affects the size of code and speed of execution [7] (three whole chapters of that book are devoted to this topic). Todd Veldhuizen provides interesting metafunction examples in his article [5]. Loop unrolling for a dot product algorithm and trigonometric constant computations for Fast Fourier transforms via sequence summation are just a couple of examples. It is important to understand that these actions have zero cost at runtime because they are performed during the compilation stage.

When it comes to reusable code, the question of interface comes to the fore. The importance of a well-defined interface has been discussed on the web. The set of requirements traditionally imposed on a good interface includes good abstractions, the hiding of implementation details, minimalism and completeness, ease of use, the difficulty (or impossibility) of misuse, and others [9]. Unfortunately, when metaprogramming is used, some of these requirements are difficult to realize.

The ability to perform compile-time computations is a property of C++ that was discovered by accident. This is the reason for the awkward syntax, which does not make the development of metaprogramming code and the use of template-based interfaces any easier.

The inability to explicitly specify diagnostic messages during compilation makes misuse checks difficult. However, some changes in this direction have already been made; for example, static assertions in the boost library and modern language standards.
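As an illustration of such a misuse check, the following sketch uses the static_assert declaration introduced with C++11. RING_BUFFER and its power-of-two rule are hypothetical; they only demonstrate how a template can reject an unsuitable parameter with a readable message.

```cpp
// Hypothetical ring buffer whose size must be a power of two (assumes C++11).
template <unsigned SIZE>
struct RING_BUFFER
{
    static_assert(SIZE != 0 && (SIZE & (SIZE - 1)) == 0,
                  "RING_BUFFER: SIZE must be a non-zero power of two");

    static const unsigned mask = SIZE - 1;   // index wrap-around becomes a single AND
    unsigned char data[SIZE];
};

typedef RING_BUFFER<32> tx_buffer;   // accepted

// Instantiating RING_BUFFER<24>, for example by defining an object of that
// type, would stop the build with the message given above.
```

The check costs nothing at run time, and the diagnostic names the violated rule instead of pointing into the template internals.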

When developing hardware control software, it is necessary to provide the user with full control over all device components. At the same time, the interface minimalism requirement remains. A rational approach would be to properly order the interface parameters; those that are changed often should go first, while the remaining parameters should be assigned reasonable default values that meet the most typical use cases.

A convenient approach to interface building is a design that uses strategy (also known as policy) classes [1, 2]. The idea is quite simple. Part of the functionality to be implemented is delegated to external classes (strategies) that are used as template parameters. Now, changing a behavior simply requires choosing another strategy. This is very similar to ordinary runtime function parameters where we get different results when we pass in different values. A function with hardcoded parameter values would always return the same result, which makes little sense. Fully functional types (classes) can be used as template parameters. This makes it possible to adjust an algorithm at the point of use by specifying those strategies with the required behavior. In turn, this provides a new level of generalization and flexibility.

Let's take a look at the interface implementation example of a USART (Universal Synchronous/Asynchronous Receiver/Transmitter) that is provided in typical AVR microcontrollers.

enum USART_ID // Device id
{
   USART0,
   ...
};

enum BAUD_RATE // Baud rate
{
   BR_2400 = 2400,
   ...
   BR_921600 = 921600,
   ...
};

// Frame control strategy class
template <
   BAUD_RATE baud = BR_9600,           // Baud rate (enum)
   DATA_BITS data_bits = DATA_BITS_8,  // Data bits (enum)
   PARITY parity = NO_PARITY,          // Parity (enum)
   STOP_BITS stop_bits = STOP_1,       // Stop bits (enum)
   ...
>
struct FRAME_CONTROL { ... };

The strict type system of C++ requires us to specify the values that exactly correspond to declared data types as template parameters. The baud rate, for example, may only be specified by the values declared in the BAUD_RATE enumeration. If some special (non-standard) baud rate value is required, then we would use the BR_CUSTOM value. In this case, the desired baud rate value can be assigned to BR_CUSTOM via the definition of a CUSTOM_BAUD_RATE macro.

The USART class definition would be as follows:

template <
   // Device ID (enum)
   USART_ID id,
   // Exchange parameters (strategy)- FRAME_CONTROL struct
   class usart_ctrl       = FRAME_CONTROL<>,
   // Receiver parameters (strategy)- USART_RECEIVER struct
   class receiver         = USART_RECEIVER<>,
   // Transmitter parameters (strategy)- USART_TRANSMITTER struct
   class transmitter      = USART_TRANSMITTER<>
>
struct USART { ... };

typedef USART<...> usart_0; // USART0 device, 921600 baud, 8N1,
                            // receiver is not used, transmitter buffer is 32 bytes

typedef USART<...> usart_1; // USART1 device, 9600-7E2, receiver buffer is 16 bytes,
                            // transmitter buffer is 32 bytes

typedef TWI<400000> I2C; // TWI-interface on 400 kHz

So the USART struct uses four template parameters as follows:

  • USART_ID to choose the required device from the existing set of devices (Mega 256 only; for lesser chips use USART0).

  • The usart_ctrl strategy, which has a single implementation, FRAME_CONTROL, and which specifies the exchange parameters (see above).

  • The receiver strategy, which has two implementations: USART_RECEIVER to specify the required receiver parameters (buffer size, interrupt control) and RECEIVER_DISABLED to disable the receiver.

  • The transmitter strategy, which likewise has two implementations: USART_TRANSMITTER to specify the transmitter parameters and TRANSMITTER_DISABLED to disable the transmitter.

Such a set of strategies provides full control over the device and — thanks to the default values — simplifies class parametrization for typical use cases.
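How default strategies and their replacements combine can be shown with a reduced, host-compilable sketch. It is not the article's library code: the device ID and the transmitter strategy are omitted, FRAME_CONTROL takes only two parameters, and the members baud_rate, enabled, buffer_size, rx_ram, and can_receive as well as the typedef names are assumptions made for illustration (static_assert in the checks assumes C++11).

```cpp
// Reduced sketch of a strategy-based interface; all members are hypothetical.
enum BAUD_RATE { BR_9600 = 9600, BR_115200 = 115200 };

template <BAUD_RATE baud = BR_9600, unsigned data_bits = 8>
struct FRAME_CONTROL
{
    static const unsigned long baud_rate = baud;
    static const unsigned bits = data_bits;
};

template <unsigned BUFFER_SIZE = 16>
struct USART_RECEIVER
{
    static const bool enabled = true;
    static const unsigned buffer_size = BUFFER_SIZE;
};

struct RECEIVER_DISABLED
{
    static const bool enabled = false;
    static const unsigned buffer_size = 0;   // no RAM reserved
};

template <
    class usart_ctrl = FRAME_CONTROL<>,      // exchange parameters (strategy)
    class receiver   = USART_RECEIVER<>      // receiver parameters (strategy)
>
struct USART
{
    // Everything below is resolved at compile time from the chosen strategies.
    static const unsigned long baud_rate = usart_ctrl::baud_rate;
    static const bool can_receive = receiver::enabled;
    static const unsigned rx_ram = receiver::buffer_size;
};

typedef USART<> default_port;                                         // all defaults
typedef USART<FRAME_CONTROL<BR_115200>, RECEIVER_DISABLED> log_port;  // transmit only

static_assert(default_port::baud_rate == 9600 && default_port::rx_ram == 16,
              "defaults cover the typical case");
static_assert(log_port::baud_rate == 115200 && !log_port::can_receive,
              "replacing a strategy changes the configuration");
```

The typical case needs no arguments at all, while an unusual configuration only names the strategies that differ from the defaults.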

Once the required types have been declared, we can initialize the device as follows:



It should be noted that we are using a peculiar call syntax here. Instead of the usual '.' structure reference operator, the scope resolution '::' operator is used. This is because the USART class methods are defined as static and we are working with types, not with objects here. This eliminates any overhead cost for object construction and destruction; moreover, it explicitly expresses the singleton-like nature of the device. This doesn't mean that we refuse to use the usual objects in favor of this types/classes usage — presumably a lot of objects will exist in the code — but such an approach makes more sense when we are talking about hardware control structures.
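The call syntax described above can be sketched as follows. The sketch is hypothetical throughout: STATIC_DEVICE, its methods, and the fake_control_register variable (a stand-in for a memory-mapped hardware register) are assumptions so that the code can run on a host.

```cpp
#include <stdint.h>
#include <cassert>

// Hypothetical stand-in for a memory-mapped control register.
static uint8_t fake_control_register = 0;

// All methods are static: the type itself represents the single hardware device.
template <uint8_t ENABLE_MASK>
struct STATIC_DEVICE
{
    static void init()
    {
        fake_control_register |= ENABLE_MASK;
    }

    static void shutdown()
    {
        fake_control_register &= static_cast<uint8_t>(~ENABLE_MASK);
    }

    static bool is_enabled()
    {
        return (fake_control_register & ENABLE_MASK) != 0;
    }
};

typedef STATIC_DEVICE<0x08> transmitter;

void static_call_demo()
{
    transmitter::init();                  // '::' instead of '.': no object is constructed
    assert(transmitter::is_enabled());
    transmitter::shutdown();
    assert(!transmitter::is_enabled());
}
```

Because no instance exists, there is no constructor or destructor to run and no this pointer to pass, which matches the singleton-like nature of a hardware block.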

The assembly code generated for the Mega256 appears as follows:

000000ba <_Z10usart_initv>:
  ba:   10 92 c4 00   sts   0x00C4, r1
  be:   10 92 c5 00   sts   0x00C5, r1
  c2:   10 92 c0 00   sts   0x00C0, r1
  c6:   88 e2         ldi   r24, 0x28     ; 40
  c8:   80 93 c1 00   sts   0x00C1, r24
  cc:   86 e0         ldi   r24, 0x06     ; 6
  ce:   80 93 c2 00   sts   0x00C2, r24
  d2:   10 92 26 01   sts   0x0126, r1
  d6:   08 95         ret

000000d8 <_Z8twi_initv>:
  d8:   8c e0         ldi   r24, 0x0C     ; 12
  da:   80 93 b8 00   sts   0x00B8, r24
  de:   10 92 b9 00   sts   0x00B9, r1
  e2:   85 e4         ldi   r24, 0x45     ; 69
  e4:   80 93 bc 00   sts   0x00BC, r24
  e8:   10 92 03 01   sts   0x0103, r1
  ec:   08 95         ret

As we see, the constants necessary for device initialization are computed at compile time.
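How such constants might be derived at compile time is sketched below. The formulas are the usual AVR ones for asynchronous normal-mode USART operation and for a TWI prescaler of 1; the 16 MHz clock is an assumption (the article does not state it), and UBRR_VALUE, TWBR_VALUE, and F_CPU_HZ are hypothetical names rather than the article's code. The checks assume C++11.

```cpp
// Assumed CPU clock frequency; not given in the article.
const unsigned long F_CPU_HZ = 16000000UL;

// Baud rate register: UBRR = f_osc / (16 * baud) - 1, rounded to nearest.
template <unsigned long BAUD>
struct UBRR_VALUE
{
    enum { value = (F_CPU_HZ + 8UL * BAUD) / (16UL * BAUD) - 1 };
};

// TWI bit rate register: TWBR = (f_osc / f_scl - 16) / 2, prescaler of 1.
template <unsigned long SCL_HZ>
struct TWBR_VALUE
{
    enum { value = (F_CPU_HZ / SCL_HZ - 16) / 2 };
};

static_assert(UBRR_VALUE<9600>::value == 103,  "9600 baud at 16 MHz");
static_assert(UBRR_VALUE<921600>::value == 0,  "921600 baud at 16 MHz");
static_assert(TWBR_VALUE<400000>::value == 12, "400 kHz SCL at 16 MHz");
```

Under the 16 MHz assumption the sketch yields 0 for 921600 baud and 12 (0x0C) for a 400 kHz TWI clock, which appears consistent with the values stored in the listing above.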

Yet another example
Suppose we devise an exchange protocol whose strategy-based declaration might appear as follows:

template <
   class transport,
   class params = PROTO_PARAMETERS<...> // Some protocol parameters
>
...

The interesting point here is that the protocol transport is defined as a template parameter. This allows the protocol to be tuned at the point of use:


The same protocol can be used with a different device, SPI or TWI for example, that is:


No restrictions are imposed on strategy classes, such as a requirement to inherit from some base class. The only requirement for a type used as a strategy is the existence of methods with the proper signatures (send and receive, for example).
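The following sketch illustrates this requirement with two unrelated transports. All names are hypothetical (FAKE_SERIAL, FAKE_TWI, PROTOCOL_SKETCH, and the one-byte length header are not the article's code), and the transports only count bytes so that the example can run on a host.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <cassert>

// Two unrelated transports: no common base class, only the same static interface.
struct FAKE_SERIAL
{
    static size_t bytes_sent;
    static void send(const uint8_t*, size_t size) { bytes_sent += size; }
};
size_t FAKE_SERIAL::bytes_sent = 0;

struct FAKE_TWI
{
    static size_t bytes_sent;
    static void send(const uint8_t*, size_t size) { bytes_sent += size; }
};
size_t FAKE_TWI::bytes_sent = 0;

// Hypothetical protocol: sends a one-byte length header, then the payload.
template <class transport>
struct PROTOCOL_SKETCH
{
    static void send(const uint8_t* data, uint8_t size)
    {
        transport::send(&size, 1);     // header
        transport::send(data, size);   // payload
    }
};

typedef PROTOCOL_SKETCH<FAKE_SERIAL> proto_serial;
typedef PROTOCOL_SKETCH<FAKE_TWI>    proto_twi;

void protocol_demo()
{
    const uint8_t data[3] = { 1, 2, 3 };
    proto_serial::send(data, 3);       // 1 header byte + 3 payload bytes
    proto_twi::send(data, 3);          // 1 + 3
    proto_twi::send(data, 2);          // 1 + 2

    assert(FAKE_SERIAL::bytes_sent == 4);
    assert(FAKE_TWI::bytes_sent == 7);
}
```

A transport that lacks a suitable send method is rejected when the protocol is instantiated, so the interface is still checked, only at compile time rather than through a base class.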

Any required number of strategies may be defined. Every strategy should be responsible for a certain aspect of functionality, which ensures their orthogonality [2]. Each strategy in turn may have multiple implementations. As a result, the number of different behavior variations (the number of different combinations of strategies) may be quite large. This produces great code flexibility without introducing the typical performance problems caused by inheritance, and it is an excellent example of static polymorphism.

Once the required data types are defined, they are used in the following way:

   PROTO_SERIAL::send(data, size); //sending data block to usart_0 device

   PROTO_TWI::send(data, size);   //sending data block to TWI device

Needless to say, software debugging is not an easy task. Template code debugging is much more complex because of compiler unfriendliness. Any typo in the source causes a long diagnostic output that is additionally doubled due to double-pass compilation mode. These huge outputs should be read from the very beginning, and then a minimal amount of code modification should be performed before the next attempt to compile. Some parts of the messages, caused by propagated errors, will disappear after the original error is fixed.

A template specialization is not related to the primary template by any kinship. In fact, a specialization can be thought of as just an independent class that is substituted for the primary template when the parameter values match. So, to be reasonably sure that the template code works, every specialization should be instantiated at least once. All of this makes the template debugging process tedious and long drawn out.

Embedded code debugging, in turn, can become a nightmare for the engineer, notably in the absence of special equipment. In this case, the only way is the brute force method: the insertion of debugging messages.

Suppose we are debugging a DEVICE class whose interface is as follows:

template <
   class params = DEVICE_SETTINGS<...>, // some device settings
   class dbg = NO_DEBUG
>
struct DEVICE
{
   static uint8_t some_method(uint8_t parameter)
   {
       dbg::print("%s:%d\n", __FUNCTION__, parameter);
       ...
       dbg::print("retval:%d\n", retval);
       return retval;
   }
};
Here, the dbg template parameter, which defaults to NO_DEBUG, is of interest to us. Inside the method, dbg::print is called. In the application code, DEVICE might be declared as follows:

typedef DEVICE_SETTINGS<...> DEV_SETTINGS; // use typedef for short

typedef DEVICE<DEV_SETTINGS, AVR_DEBUG<usart_0> > device; // device type declaration

It can be seen here that the dbg parameter is initialized with an AVR_DEBUG template that is parametrized with the usart_0 type. If we have a look at the AVR_DEBUG definition, we see something like:


template <
   class SENDER
>
struct AVR_DEBUG
{
   static void print(const char* fmt, ...)
   {
       va_list ap;
       va_start(ap, fmt);
       SENDER::_vprintf(fmt, ap);
       va_end(ap);
   }
};
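The default, do-nothing strategy is not shown in the text. A plausible sketch, with every name hypothetical (NO_DEBUG_SKETCH, COUNTING_DEBUG, DEBUGGABLE_DEVICE, and the method twice), could look like this; COUNTING_DEBUG replaces the USART-based strategy so that the example can run on a host.

```cpp
#include <stdint.h>
#include <cassert>

// Hypothetical silent strategy: print() is empty, so an optimizing compiler
// can remove the calls entirely.
struct NO_DEBUG_SKETCH
{
    static void print(const char*, ...) {}
};

// Hypothetical strategy that only counts calls instead of sending text.
struct COUNTING_DEBUG
{
    static unsigned calls;
    static void print(const char*, ...) { ++calls; }
};
unsigned COUNTING_DEBUG::calls = 0;

template <class dbg = NO_DEBUG_SKETCH>
struct DEBUGGABLE_DEVICE
{
    static uint8_t twice(uint8_t parameter)
    {
        dbg::print("%s:%d\n", "twice", parameter);
        const uint8_t retval = static_cast<uint8_t>(parameter * 2);
        dbg::print("retval:%d\n", retval);
        return retval;
    }
};

void debug_demo()
{
    const uint8_t silent = DEBUGGABLE_DEVICE<>::twice(21);                // release flavor
    const uint8_t traced = DEBUGGABLE_DEVICE<COUNTING_DEBUG>::twice(4);   // debug flavor

    assert(silent == 42);
    assert(traced == 8);
    assert(COUNTING_DEBUG::calls == 2);   // only the traced flavor printed
}
```

Switching between the two flavors is a matter of changing one template argument; the device code itself stays untouched.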


Examples of the code that is used for actual projects are available for download. This code has been developed by using avr8-gnu-toolchain- and TUT (C++ Template Unit Test Framework). The code was created for our own needs and surely needs improvement, but we publish it in the hope that it will be helpful to someone else.

The use of the object-oriented features of C++ makes it possible to improve the structure, readability, and intelligibility of code. Classes are the perfect embodiment of the idea of code reuse.

An inherent flexibility of C++ templates allows us to produce highly generalized yet very efficient code. Independence of code on data types makes it possible to defer many design decisions to the final development stage or change the decisions without significant code rework.

The use of a strategy-based design approach provides diversity of code behavior without the use of inheritance and the related performance problems typical of dynamic polymorphism. Template specializations provide an engineer with excellent opportunities for optimization and fine tuning of code behavior.

The capability in C++ to perform calculations during the compilation stage was accidentally discovered and demonstrated in 1994 by Erwin Unruh. Although this feature was not part of the original purpose of the language creators, it provoked the strong interest of many developers. The ability to perform calculations during the compilation stage provides developers with a new level of code generalization and efficiency. Nowadays, this mechanism is well known to C++ programmers and implemented in such libraries as Blitz++ and boost::MPL.

Inside this single language, it is possible to control runtime code behavior as well as generation of the same code during compile time. Inside the single linguistic construction (template function) both statically bound entities (template parameters) and dynamically bound entities (function arguments) coexist. For this reason, Todd Veldhuizen calls C++ a two-level language.

The use of metaprogramming contributes to much better runtime execution speed, and sometimes to smaller generated code, due to decisions taken at compile time. The project parameters that are not changed during code execution (the compile-time constants) give a good opportunity for optimization through metaprogramming. Every value that can be computed at the compilation stage and every branch controlled by a constant condition is a great candidate for optimization. In other words, it is often possible to accelerate program execution at the expense of increased compilation time. The article [10] illustrates the results of a comparison of template metacode designed for the AVR chip against conventional code made with Atmel libraries.

Metaprogram code development is a difficult and lengthy process that is hardly suitable for one-off projects. However, when it comes to library development, the effort is justified by the expectation that such a long-term investment will pay off every time the code is reused [3].

Good portability means less work with a better result and is always an advantage. If we turn back to the protocol example, portability is the key. Creation of a separate protocol implementation for every side of interaction makes little sense. It is much better if you can provide your client with a protocol definition in form of relevant code when you deliver the product. That makes software development for the required platform much easier for them.

Judging by the limited number of publications, C++ templates, and metaprogramming in particular, are not very much in demand in embedded software, but this is the area where these techniques can bring significant benefits. These techniques allow the use of traditional object-oriented approaches while providing the efficiency found in hand-written C and ASM code.


  1. David Vandevoorde and Nicolai M. Josuttis. C++ Templates: The Complete Guide

  2. Andrei Alexandrescu. Modern C++ Design: Generic Programming and Design Patterns Applied

  3. David Abrahams and Aleksey Gurtovoy, C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond

  4. Davide Di Gennaro. Advanced C++ metaprogramming

  5. Todd Veldhuizen. Techniques for Scientific C++. Indiana University Computer Science Technical Report #542

  6. Todd L. Veldhuizen. C++ Templates are Turing Complete (2003).

  7. Dov Bulka and David Mayhew. Efficient C++. Performance Programming Techniques. Addison-Wesley 2000.

  8. Dale Wheat. Arduino Internals. Apress.

  9. Martin Reddy. API Design for C++. 2011 Morgan Kaufmann Publishers

  10. Christoph Steup, Michael Schulze, Jörg Kaiser. Exploiting Template-Metaprogramming for Highly Adaptable Device Drivers – a Case Study on CANARY an AVR CAN-Driver. Department for Distributed Systems, Universität Magdeburg

This article was originally posted in Russian on Geektimes.ru, and this edited version is published here with their kind permission.

Valery Ignatov completed applied mathematics at Leningrad Electrical Engineering Institute V.Ulianov (Lenin), currently known as Saint Petersburg Electrotechnical University “LETI”. Valery worked for about 20 years as an electronics engineer, repairing and maintaining computing equipment. About 10 years ago, he started working as a software engineer. For the past two years, he's been working in embedded software, where his primary target platform is 8-bit AVR MCUs.
