Lower MCU power consumption by using omniscient code generation - Embedded.com


The increasing emphasis on green technologies has focused more attention on low-power design. Microcontroller vendors are responding by increasing their offerings of ultra-low-power devices that consume as little as 350 µA/MHz and have sub-µA sleep modes.

Using a low-power MCU is a first step in lowering power consumption. But the next step should be to minimize the number of instructions the CPU must execute to get the job done, so it can spend as much time as possible in the deepest sleep mode available.

This can be accomplished with careful design of the application software structure, tight coding, and a compiler that reduces the number of required instructions.

Compilers can have a significant effect on the number of instructions required to execute the application and on the resulting power consumption. All too often the method of compilation wastes CPU cycles on interrupt routines and on locating addresses in devices with banked memory.

Keep interrupts small
Interrupt routines should always be small and fast. This is especially true when speed and/or power consumption are critical. Keeping interrupt routines small reduces interrupt overhead.

The compiler's contribution to interrupt overhead lies in the way it generates the context: the number of registers it saves in response to an interrupt. Ordinary compilers save every register that might be used by an interrupt because they have no way of knowing which registers will or will not be used by a given interrupt.

The problem is that the number of cycles used is a direct function of the number of registers that are saved and restored. Cycles spent saving and restoring the context consume power.

For example, one compiler for Microchip's PIC16 always saves 8 bytes of data for every interrupt, using a total of 42 instruction cycles (23 for the context save and 19 for the restore). This may not seem like much, but in an interrupt-intensive application, the CPU could spend thousands of extra cycles “awake,” unnecessarily consuming power.

Newer compilers are now available with omniscient code generation (OCG) technology that has the intelligence to save only those registers that are required for each particular interrupt. OCG works by collecting comprehensive data on register, stack, pointer, object and variable declarations from all program modules before compiling the code.

An OCG compiler combines all the program modules into one large program, which it loads into a call graph structure.

Based on the call graph, the OCG code generator creates a pointer reference graph (Figure 1 below) that shows each instance of a variable having its address taken, plus each instance of an assignment of a pointer value to a pointer (either directly, via function return, via function parameter passing, or indirectly via another pointer).

It then identifies all objects that can possibly be referenced by each pointer. This information is used to determine exactly how much memory space each pointer will be required to access.

Figure 1: The pointer reference graph shows each instance of a variable having its address taken, plus each instance of an assignment of a pointer value to a pointer.
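The pointer analysis described above can be pictured as a small fixed-point propagation. The following Python sketch is only an illustration under stated assumptions: the function names, the example variables, and the rule that a pointer confined to one bank can be 8 bits wide while others need 16 are all hypothetical, and nothing here is taken from an actual OCG implementation.

```python
# Hypothetical model of a pointer reference graph; not the real OCG data structures.

def resolve_targets(address_taken, pointer_copies):
    """Return, for each pointer, the set of objects it may reference.

    address_taken:  {pointer: set of objects whose address is assigned to it}
    pointer_copies: list of (dst, src) pairs, each meaning "dst = src"
    """
    pointers = set(address_taken)
    for dst, src in pointer_copies:
        pointers.update((dst, src))
    targets = {p: set(address_taken.get(p, ())) for p in pointers}

    # Propagate along pointer-to-pointer assignments until nothing changes.
    changed = True
    while changed:
        changed = False
        for dst, src in pointer_copies:
            before = len(targets[dst])
            targets[dst] |= targets[src]
            if len(targets[dst]) != before:
                changed = True
    return targets


def pointer_width_bits(target_set, bank_of):
    """Assumed sizing rule: 8 bits if all targets share one bank, else 16."""
    banks = {bank_of[obj] for obj in target_set}
    return 8 if len(banks) <= 1 else 16


# Made-up program: p = &rx_buf; q = &tx_buf; r = p; r = q; s = r;
address_taken = {"p": {"rx_buf"}, "q": {"tx_buf"}}
pointer_copies = [("r", "p"), ("r", "q"), ("s", "r")]
bank_of = {"rx_buf": 0, "tx_buf": 1}

targets = resolve_targets(address_taken, pointer_copies)
widths = {p: pointer_width_bits(t, bank_of) for p, t in targets.items()}
```

In this made-up program, `p` and `q` each reach a single object, while `r` and `s` can reach objects in two banks and would therefore need the wider representation.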

Since an OCG compiler knows exactly which functions call, and are called by, other functions, which variables and registers are required, and which pointers are pointing to which memory banks, it also knows exactly which registers will be used for every interrupt in the program. It can generate code accordingly, minimizing both the code size and the cycles required to save and restore the context.
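A rough way to see how whole-program knowledge shrinks the saved context is to walk the call graph from each interrupt routine and collect only the registers that can actually be touched. The sketch below assumes a hypothetical call graph and made-up per-function register usage; the function names are invented, and the traversal is an illustration rather than the compiler's actual algorithm.

```python
# Hypothetical call graph and register usage; only the traversal idea matters.

def registers_to_save(isr, calls, uses):
    """Union of registers used by `isr` and by every function it can reach."""
    seen, stack, regs = set(), [isr], set()
    while stack:
        fn = stack.pop()
        if fn in seen:
            continue  # already visited; also guards against recursion cycles
        seen.add(fn)
        regs |= uses.get(fn, set())
        stack.extend(calls.get(fn, ()))
    return regs


calls = {
    "timer_isr": ["tick"],
    "uart_isr": ["ring_put", "crc_update"],
    "crc_update": ["table_lookup"],
}
uses = {
    "timer_isr": {"W", "STATUS"},
    "tick": {"W"},
    "uart_isr": {"W", "STATUS", "FSR"},
    "ring_put": {"FSR"},
    "crc_update": {"W", "PCLATH"},
    "table_lookup": {"PCLATH"},
}

timer_context = registers_to_save("timer_isr", calls, uses)
uart_context = registers_to_save("uart_isr", calls, uses)
```

Here the timer interrupt would need only two registers saved, while the UART interrupt would need four, so each routine pays only for the context it actually disturbs.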

Substantial savings
The smallest context will require 17 cycles: 10 to save the context and seven to restore it. The worst case for the OCG compiler is only 25 cycles. Compared to a conventional compiler, an OCG compiler can reduce the number of interrupt-related instruction cycles by 40 percent to 60 percent.
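The 40 to 60 percent range follows from comparing these figures with the 42-cycle context quoted earlier for a conventional PIC16 compiler. A quick check in Python (the constant and function names are only illustrative):

```python
CONVENTIONAL_CYCLES = 42  # 23 to save + 19 to restore, from the earlier example
OCG_BEST_CYCLES = 17      # 10 to save + 7 to restore
OCG_WORST_CYCLES = 25


def reduction_percent(baseline, improved):
    """Percentage of baseline cycles eliminated."""
    return 100.0 * (baseline - improved) / baseline


best_case = reduction_percent(CONVENTIONAL_CYCLES, OCG_BEST_CYCLES)
worst_case = reduction_percent(CONVENTIONAL_CYCLES, OCG_WORST_CYCLES)
print(f"reduction: {worst_case:.1f}% to {best_case:.1f}%")
```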

Depending on the application, the cycle savings can be substantial. An interrupt-driven serial communication port with a baud rate of 480,600 bit/s generates 24,000 interrupts per second. Using a conventional compiler with 42 instruction cycles per interrupt (168 clock cycles per interrupt), saving and restoring the context will use up 4,032,000 clock cycles per second, or about 20 percent of the available cycles on a 20 MHz PIC16.

An OCG compiler, averaging 21 instruction cycles per interrupt (84 clock cycles per interrupt), can reduce that number to only 2,016,000 cycles, saving half of the clock cycles otherwise spent on saving and restoring contexts, and allowing the CPU to be put into sleep mode for 10 percent of its cycles.

Assuming 10 mA active and about 1 µA sleep-mode current consumption, an OCG compiler could reduce total MCU current draw by about 1 mA, roughly 10 percent. In an application with an 8 mA power budget, that extra milliamp could be a life saver (Figure 2 below).

Figure 2: An OCG compiler can reduce total MCU power consumption by about 10 percent.
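The arithmetic behind these figures can be reproduced in a few lines. The sketch below uses only the numbers given in the text (20 MHz clock, 24,000 interrupts per second, four clock cycles per instruction cycle, 10 mA active, 1 µA asleep); the simple two-state average-current model and all names are assumptions made for illustration.

```python
CLOCK_HZ = 20_000_000
INTERRUPTS_PER_SEC = 24_000
CLOCKS_PER_INSTRUCTION_CYCLE = 4  # 42 instruction cycles = 168 clock cycles

ACTIVE_MA = 10.0
SLEEP_MA = 0.001  # about 1 µA


def context_overhead(instruction_cycles):
    """Clock cycles per second spent on context save/restore, and their share."""
    clocks = INTERRUPTS_PER_SEC * instruction_cycles * CLOCKS_PER_INSTRUCTION_CYCLE
    return clocks, clocks / CLOCK_HZ


def average_current_ma(sleep_fraction):
    """Assumed model: the CPU is either fully active or fully asleep."""
    return ACTIVE_MA * (1.0 - sleep_fraction) + SLEEP_MA * sleep_fraction


conv_clocks, conv_share = context_overhead(42)
ocg_clocks, ocg_share = context_overhead(21)

# Cycles no longer spent on context handling are assumed to become sleep time.
extra_sleep = conv_share - ocg_share
saved_ma = average_current_ma(0.0) - average_current_ma(extra_sleep)

print(f"conventional: {conv_clocks:,} clocks/s ({conv_share:.1%})")
print(f"OCG:          {ocg_clocks:,} clocks/s ({ocg_share:.1%})")
print(f"current saved: {saved_ma:.3f} mA")
```

Under these assumptions the saving comes out at roughly 1 mA, which matches the 10 percent figure for a device drawing 10 mA when active.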

Bye-bye banked memory
Many 8-bit and 16-bit MCUs have banked memories that cannot be addressed simultaneously. Switching between the memory banks requires at least two bank selection instructions.

Thus, if data in one bank must be written to another bank, bank selection instructions are always necessary. Placing all the variables accessed by a function in the same memory bank will reduce the number of bank selection instructions and the total required cycles for the application.

However, conventional compilers have no way of knowing which functions access which variables and are unable to optimize their memory assignment. Nor do these compilers have any way of knowing whether a particular memory bank will already be selected at a given point in the code. As a result, these compilers automatically generate bank selection instructions for every memory access, whether or not that bank is already selected.

Some compilers have extensions to the C language that identify the address of the variable. Programmers may manually assign variables to memory banks using this non-standard, non-portable code. The bank qualifiers allow the compiler to see the exact bank an object resides in and reduce the number of bank selection instructions.

However, this approach does not guarantee that dependent variables will be placed in the same bank. Every time a variable in one memory bank needs to be written to another memory bank, bank selection instructions will still be required.

In addition, trying to track all the memory addresses across multiple code modules and ensuring that all pointers have the correct addresses is a time-consuming, tedious process that can introduce programming errors.

In contrast, an OCG compiler knows every register, stack, pointer, object and variable declaration from all program modules. It can optimize every variable, register allocation and the size and scope of every pointer and every object stored on the compiled stack. It optimizes memory allocation to minimize or eliminate bank selection instructions, without any intervention from the programmer.

By placing frequently accessed variables in unbanked memory and by placing any dependent variables in the same memory bank, an OCG compiler can radically reduce the number of cycles and power wasted on bank selection instructions in these MCU architectures.

Since the OCG compiler knows which bank is selected at any point in the code, it can also eliminate any unnecessary bank selection instructions when the bank is already selected.
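The effect of tracking the selected bank, and of grouping related variables, can be illustrated with a small counting model. The sketch below is hypothetical: the variable names, access sequence and bank layouts are invented, and the cost of two instructions per bank switch is taken from the "at least two" figure above.

```python
SELECT_COST = 2  # assumed: two bank selection instructions per switch


def naive_selects(accesses):
    """Conventional approach: emit bank selection before every access."""
    return SELECT_COST * len(accesses)


def tracked_selects(accesses, bank_of):
    """Emit bank selection only when the required bank differs from the current one."""
    current = None  # bank is unknown on entry
    total = 0
    for var in accesses:
        bank = bank_of[var]
        if bank != current:
            total += SELECT_COST
            current = bank
    return total


# Made-up access sequence for one function, with two possible layouts.
accesses = ["count", "count", "limit", "limit", "flag", "count"]
scattered = {"count": 0, "limit": 1, "flag": 2}  # variables spread over banks
grouped = {"count": 0, "limit": 0, "flag": 0}    # dependent variables together

naive = naive_selects(accesses)
tracked_scattered = tracked_selects(accesses, scattered)
tracked_grouped = tracked_selects(accesses, grouped)

print(naive, tracked_scattered, tracked_grouped)
```

In this toy example, tracking the current bank already removes some instructions with the scattered layout, and grouping the variables into one bank leaves a single bank selection for the whole sequence.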

Reducing the number of instructions reduces the number of CPU cycles by as much as 30 percent to 50 percent. Choosing a low-power device and exploiting the sleep mode capabilities of the MCU are important means of minimizing power consumption.

However, the way in which the compiler manages interrupts and memory usage can also have a significant impact on power consumption. Newer compilers with OCG technology can make a substantial contribution to saving cycles and power.

Clyde Stubbs is Founder and CEO of Hi-Tech Software. He can be contacted at clyde@htsoft.com.