Speed optimizations in 8-bit MCUs
Many engineers assume that 8-bit devices have already been pushed to their limit and that the only way to improve performance further is to move to a more powerful device, such as a 32-bit MCU. However, moving to a 32-bit device can carry penalties, for instance in power consumption and ease of use. However efficient your 32-bit device is, an 8-bit device normally consumes much less static power. And although the price of 32-bit devices keeps falling, a 32-bit device still carries a significant price premium over an 8-bit one.
Performance improvements on an 8-bit device, both in size and speed, can be achieved by some simple means that are not obvious to all. This article gives you some basic tips and examples of how to get even more out of these devices.
Make full use of your compiler
The first and most obvious path to improving performance is to use the full set of functionality provided by your compiler. Today's compilers are highly advanced, and many of them offer optimizations that could previously be achieved only by writing clever code by hand. With the compiler doing that work, you can focus on writing understandable, readable and maintainable code.
All decent compilers today inline functions, unroll loops when applicable, eliminate common subexpressions and hoist loop-invariant code out of loops. Depending on the application and the available memory, compilers can optimize for speed at the expense of code size and vice versa. Make sure that you know your device and set the compiler options accordingly.
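To make the term hoisting concrete, here is a minimal sketch of the transformation. The function names and the bias calculation are hypothetical and chosen only for illustration. The first version is what you might write for readability; the second is roughly what an optimizing compiler turns it into, so there is normally no need to write it that way yourself.

```c
#include <stddef.h>
#include <stdint.h>

/* Readable version: (base + offset) is recomputed on every iteration,
   although it never changes inside the loop. */
void add_bias_naive(uint8_t *buf, size_t len, uint8_t base, uint8_t offset)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] = (uint8_t)(buf[i] + base + offset);
    }
}

/* Hand-hoisted version: the loop-invariant sum is computed once.
   The results are identical because everything wraps modulo 256. */
void add_bias_hoisted(uint8_t *buf, size_t len, uint8_t base, uint8_t offset)
{
    const uint8_t bias = (uint8_t)(base + offset);

    for (size_t i = 0; i < len; i++) {
        buf[i] = (uint8_t)(buf[i] + bias);
    }
}
```

Both functions produce the same buffer contents; whether your compiler actually performs the transformation depends on the optimization level, which you can check in the generated assembly listing.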
However advanced your compiler is, there are optimizations that it just cannot do for you. This is where you as a programmer need to carefully consider how you write your code. I will list some important considerations and pitfalls, assuming that you are running on an 8-bit device.
Structuring the code
It is important that you structure your program so that you can optimize different parts of the code in different ways. At this stage, knowledge of the final application and how it will be used is crucial. Consider for example a program where one specific part of the code is used frequently and which is very time critical, for instance, a wireless communication stack of an Internet of Things application. To avoid unnecessary lag in the communication channel, it is imperative that this section of the code is fast.
You must therefore separate the communication stack from the rest of the code so that it can be optimized for speed, while the rest of the code is optimized for other parameters such as code size. Any seasoned developer also knows the positive side effects of structuring the code: added maintainability and portability.
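As a rough sketch of what this separation can look like, the listing below marks one function for speed and another for size. Everything named here is an assumption for illustration: the macros OPT_FOR_SPEED and OPT_FOR_SIZE, the functions stack_crc8 and config_is_valid, and the choice of a CRC-8 routine as the time-critical part. The optimize attribute is a GCC extension that the GCC documentation describes as intended mainly for debugging; in a production build the more usual route is to keep the time-critical module in its own source file and give that file its own optimization flags. The attribute is used here only because it shows the idea in a single listing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper macros. On compilers without the GCC attribute
   they expand to nothing and the build flags decide. */
#if defined(__GNUC__) && !defined(__clang__)
#define OPT_FOR_SPEED __attribute__((optimize("O3")))
#define OPT_FOR_SIZE  __attribute__((optimize("Os")))
#else
#define OPT_FOR_SPEED
#define OPT_FOR_SIZE
#endif

/* Time critical: runs for every byte of every frame.
   Bitwise CRC-8, polynomial 0x07, initial value 0. */
OPT_FOR_SPEED
uint8_t stack_crc8(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (uint8_t bit = 0; bit < 8; bit++) {
            crc = (crc & 0x80u) ? (uint8_t)((crc << 1) ^ 0x07u)
                                : (uint8_t)(crc << 1);
        }
    }
    return crc;
}

/* Not time critical: runs once at start-up, so code size matters more.
   A configuration block is valid if its last byte is the CRC of the rest. */
OPT_FOR_SIZE
uint8_t config_is_valid(const uint8_t *cfg, size_t len)
{
    return (uint8_t)(len > 0 && stack_crc8(cfg, len - 1) == cfg[len - 1]);
}
```

Keeping the two functions in separate source files gives the same effect with any toolchain that supports per-file options.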
Use the most efficient data type
Different architectures have different natural data sizes. 32-bit operations, for instance, consume more resources on an 8-bit device than on a 32-bit device. On the other hand, 8-bit operations are less efficient on a 32-bit architecture. In a typical case, a simple addition of two chars requires 2 cycles on a 32-bit machine, whereas the same operation requires only 1 cycle on an 8-bit machine. If the same operation is performed on two ints, the 32-bit machine can do it in 1 cycle and the 8-bit machine requires 2 cycles. The exact cycle counts depend on the architecture and the compiler.
Data size is also important in memory-mapped I/O and communication protocols. The recommendation is therefore to use typedefs that define fixed-size types, such as uint8_t and uint16_t from the C99 header stdint.h.
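The following sketch shows how this plays out in ordinary code. The function names are hypothetical. The first function uses uint8_t for the loop counter, which matches the natural size of an 8-bit device; the second uses uint_fast8_t from stdint.h, which asks the compiler for the fastest type of at least 8 bits on whatever architecture the code is built for.

```c
#include <stdint.h>

/* Tuned for an 8-bit device: the counter fits in a single 8-bit register.
   count is at most 255, so the total (at most 255 * 255) fits in 16 bits. */
uint16_t sum_samples(const uint8_t *samples, uint8_t count)
{
    uint16_t total = 0;

    for (uint8_t i = 0; i < count; i++) {
        total = (uint16_t)(total + samples[i]);
    }
    return total;
}

/* Portable variant: uint_fast8_t lets the compiler pick the natural size,
   typically 8 bits on an 8-bit device and 32 bits on a 32-bit device.
   Callers should still pass at most 255 samples to avoid wrapping total. */
uint16_t sum_samples_portable(const uint8_t *samples, uint_fast8_t count)
{
    uint16_t total = 0;

    for (uint_fast8_t i = 0; i < count; i++) {
        total = (uint16_t)(total + samples[i]);
    }
    return total;
}
```

If the code will only ever run on one 8-bit device, the first form is enough; the second is worth considering when the same source is shared with a 32-bit target.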
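A minimal sketch of such typedefs, assuming a C99 or later toolchain that provides stdint.h. The short aliases u8, u16 and u32 and the helper put_u16_be are hypothetical names. The compile-time checks use _Static_assert, which is a C11 feature; on older compilers they can simply be dropped.

```c
#include <stdint.h>

/* Hypothetical project-wide aliases for fixed-size types. */
typedef uint8_t  u8;
typedef uint16_t u16;
typedef uint32_t u32;

/* Fail the build if a type does not have the size the protocol expects. */
_Static_assert(sizeof(u8)  == 1, "u8 must be 1 byte");
_Static_assert(sizeof(u16) == 2, "u16 must be 2 bytes");
_Static_assert(sizeof(u32) == 4, "u32 must be 4 bytes");

/* Write a 16-bit value into a protocol buffer, most significant byte first.
   With fixed-size types the layout is the same on every architecture. */
void put_u16_be(u8 *out, u16 value)
{
    out[0] = (u8)(value >> 8);
    out[1] = (u8)(value & 0xFFu);
}
```

For example, put_u16_be(buf, 0x1234) stores 0x12 in buf[0] and 0x34 in buf[1], regardless of the size of int on the target.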
Other recommendations on data types
In ANSI C, integer types default to signed: short, int and long declared without a signed/unsigned keyword are treated as signed. Plain char is the exception, since its signedness is left to the implementation. Signed types are more expensive than unsigned types, in particular because they prevent the compiler from replacing arithmetic operations with cheaper bit operations. A division by a power of two, for instance, becomes a single shift for an unsigned value but needs extra correction code for a signed one. So unless you specifically need to handle negative numbers, use unsigned types.
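The difference can be seen in a short sketch. The function names and RING_SIZE are hypothetical; the comments describe what a compiler is allowed to do, and the code actually generated depends on your compiler and its settings.

```c
#include <stdint.h>

#define RING_SIZE 16u /* hypothetical buffer length, a power of two */

/* Unsigned: dividing by 8 can become a plain right shift by 3. */
uint16_t scale_unsigned(uint16_t value)
{
    return (uint16_t)(value / 8u);
}

/* Signed: C division rounds toward zero, while an arithmetic shift
   rounds toward minus infinity, so negative inputs need a correction
   step in addition to the shift. */
int16_t scale_signed(int16_t value)
{
    return (int16_t)(value / 8);
}

/* Unsigned: the remainder of a division by 16 can become a single
   AND with 15, which makes wrapping a ring buffer index cheap. */
uint8_t ring_next(uint8_t index)
{
    return (uint8_t)((index + 1u) % RING_SIZE);
}
```

The rounding difference is visible for small negative inputs: scale_signed(-1) must return 0, whereas a bare shift of -1 would give -1 on most compilers.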
Floating-point numbers are expensive. As a rule of thumb, calculations with single-precision floating-point numbers generate code that is about three times larger and slower than the integer equivalent. Double-precision calculations are in turn about three times larger and slower than single-precision ones. There are obviously times when you really need to handle fractional numbers, but in most cases you do not need the precision that floating-point arithmetic provides. For instance, assume that you have a temperature sensor with a resolution of 1/100th of a degree. If you store the reading in a uint16_t that counts hundredths of a kelvin, you still cover a range from 0 to 655.35 K. If that is not enough, use a uint32_t.
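A minimal sketch of the temperature example using integer arithmetic only. The type name temp_ck_t and the helper functions are hypothetical; the only assumption taken from the text is that one count equals 1/100 K stored in a uint16_t.

```c
#include <stdint.h>

/* Temperature in hundredths of a kelvin: 0 to 65535, i.e. 0 to 655.35 K. */
typedef uint16_t temp_ck_t;

/* Build a reading from whole kelvins and hundredths.
   The caller must keep the result within 655.35 K. */
temp_ck_t temp_from_parts(uint16_t kelvin, uint8_t hundredths)
{
    return (temp_ck_t)(kelvin * 100u + hundredths);
}

/* Average of two readings, rounded down, without leaving 16 bits:
   halve each value first, then add back the carry lost when both are odd. */
temp_ck_t temp_average(temp_ck_t a, temp_ck_t b)
{
    return (temp_ck_t)(a / 2u + b / 2u + (a & b & 1u));
}

/* Split a reading into the parts needed for display. */
uint16_t temp_whole(temp_ck_t t)
{
    return (uint16_t)(t / 100u);
}

uint8_t temp_hundredths(temp_ck_t t)
{
    return (uint8_t)(t % 100u);
}
```

Averaging 293.15 K and 294.15 K this way gives 293.65 K, and no floating-point library is linked into the program.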
Even though you may think that you’ve already gotten the most out of your 8-bit device, there are some additional things you can do to get even more out of it. In this text, I have highlighted some methods for this:
- Make use of your compiler. Make sure that you understand the full set of features provided by your compiler and trust it so that you can focus on writing structured, understandable and maintainable code.
- Optimize subsections of the code individually. Structure your code so that you can optimize different sections of the code separately, depending on how the code is being executed.
- Use the most efficient data type for your architecture. The natural data type is normally more resource efficient.
- Unless you specifically need to handle negative numbers, use unsigned types.
Bearing these tips in mind during development will help you reach the full potential of your 8-bit device.