Editor's Note: In this paper originally presented at Design East 2012, the author looks at issues and techniques for squeezing maximum energy from batteries in embedded systems in the following parts:
- In this part, the author reviews key methods for power reduction and addresses the nature of efficiency in embedded systems
- Part 2 looks at the energy cost of memory access and power-reduction methods for memory access
- Part 3 continues the discussion with an examination of the role of computational efficiency in extending battery life
I have long been of the opinion that battery life and the management of energy consumption are becoming defining problems in embedded system development. I first gave a talk on this subject at the ARM TechCon event in Santa Clara in 2009. Trawling the web produced some academic research from the previous ten years and a few articles. In researching for this updated session, one of the most striking observations was how much more interest there is in this subject. Not a week goes by without at least one or two papers published on power-efficient chip design, energy-efficient software design, battery technology and so on.
As portable embedded devices become ever more powerful and ever more capable, the need to be frugal with energy becomes ever more important. As well as all the other functionality, we now expect our smartphones to act as WiFi hotspots, portable data projectors, HD video players, high definition games consoles with stereo sound, and the list goes on. All of these consume precious energy, stored in our precious battery. And that battery isn’t getting any bigger. With the form factor of the mobile phone constrained by the envelope of the standard shirt pocket, there is no room for it to grow.
The chip and board design community have been working for years towards power-efficient design techniques and synthesis tools are evolving very quickly with things like architectural clock gating, state retention power gating, dynamic voltage and frequency scaling and the like. But all this comes to naught unless the software systems which run on these platforms take advantage of the facilities offered by the hardware.
Given the emphasis on battery life for portable devices, it would seem strange that there are very few software engineers who actually have energy reduction in their daily project accountabilities. I suspect that those who do give the subject some thought are likely to do it on a “commendation vs. court martial” basis. We are entering a period when this will have to change. As battery life and performance requirements continue to fight with each other we, as software engineers, need to spend a lot more time looking at how we can design and write our software in an energy-efficient way.
As engineers, we all love finding geeky solutions to the problems which we come across. It may come as a surprise to find that, in this particular area, there are none. Clever tricks may save some power, but the field is dominated by other, simpler considerations. There are several very large elephants in this room and we must be careful to hunt the elephants we can see, before spending significant effort chasing smaller mammals.
I guess most of us know where the power goes. Silicon systems consume two kinds of power, in general.
Dynamic power is consumed when the system is running. This is the power used in switching logic elements from one state to another, driving I/O circuits, searching cache arrays and so on. It is directly related to the power supply voltage and operating frequency; in fact, it is related to the square of the voltage, so the ability to reduce operating voltage is very useful indeed. Generally, the two go together: reducing operating frequency also allows a reduction in operating voltage, giving a double benefit when full processing power is not required.
So, that deals with the system when it is running. Almost all embedded systems support some kind of idle mode in which the CPU core is halted. In this state, there is nothing to do and it is simply waiting for something to happen to kick it into life. Typically, this will be an interrupt or timer event of some kind. In this state, dynamic power is essentially zero.
There may be some small logic circuits still active, looking for that all-important wake-up event, but otherwise the dynamic power consumption is eliminated. But the power draw does not drop to zero. This is because all silicon devices “leak”. Even when they are not running, electrons have a habit of leaking across the junctions. The rate at which this happens increases as the silicon process gets smaller, so a 20nm device will leak much more than a similar 65nm unit.
So, when the clock is stopped and the processor is doing nothing, leakage current is still drawn. Typically it is much smaller than dynamic current draw but it is still significant. This is what drives us to want to completely power down whole or partial circuit elements when they are not required. That way we can eliminate leakage current too.
We need to differentiate between energy consumption and power consumption. Most devices will have a budget for each. The power budget is usually to do with heating and the maximum rate at which heat can be dissipated before the device melts. It is the energy budget which determines battery life. Remember from Physics 101 when you were at school that energy is the product of power and time. So, to minimise energy consumption for a given computing task, we can either do it at a lower instantaneous power consumption or do it in a shorter time or, preferably, both.
The combination and interaction between these two is often quite complex to measure. Doing a task in a shorter time usually requires running the processor at a faster clock speed and therefore increasing its instantaneous power consumption. We get an overall energy saving if the time saving outweighs the power increase. Conversely, throttling back the processor to reduce its power consumption is only worth doing if the task does not then take so much longer that the overall energy usage ends up no smaller.
What is a complete given is that doing anything, absolutely anything, with a computing system requires the consumption of energy. We could reduce the energy usage of any system, and therefore increase its battery life indefinitely, by eliminating its functionality. A house brick has an infinite battery life. Unfortunately it doesn’t do anything useful either. Not in computing terms at least.
As we have mentioned, the chip design boys and girls are getting really good at this. For years, they have been evolving clever strategies for reducing the power consumption of individual circuit elements. Here are just a couple of examples.
This is a simple flip-flop or latch. If powered all the time, it does two things: it stands ready to change its state on a change of input conditions and it retains its current state. It will do this as long as it remains powered. Unfortunately, as long as it remains powered, it will consume energy – even if its state is not changing. It will consume dynamic power when changing state and leakage power at all other times.
Here, we insert a gate into the clock signal. This, surprisingly, is called “clock gating” – imaginative, these hardware boys. This allows us to temporarily disconnect this particular element from the system clock. Its power consumption will go down as it is no longer changing state. In this situation, it will retain its internal state and will restart normal operation as soon as the clock is reconnected.
We can take this further and introduce networks of clock gates into the design so that entire blocks can selectively be disconnected from the clock, thus reducing power when these circuits are not required. Being powered, they will retain state and power consumption will reduce to leakage current only.
Another technique is called state-retention power gating.
This logic element at the top can be powered on and off as we have placed a switch in its connection to the power rail. This is effective and certainly reduces power consumption when this element is not required. However, switching it off will cause it to lose its state. When reactivating it, we will have to expend energy and time restoring it to its previous or initial state before we can bring it back into use. The energy required to do this reduces the effectiveness of power gating and limits the circumstances in which we can use it.
So, we do something clever and introduce a second power rail.
This one may be at a significantly lower voltage than the main power supply and its purpose is simply to allow the circuit to retain its internal state when the main power supply is withdrawn. It is not sufficient to allow the circuit to operate normally but is sufficient for state retention. Clearly, this introduces complexity into the hardware design but makes it much easier and more useful to be able to switch individual circuits on and off at short notice.
So this is what the chip design people can do for us, either manually or by inserting these things automatically at synthesis-time. Both techniques reduce leakage current when the system is idle. But, the software must allow the system to enter some kind of low power state to make use of this. Our goal as software developers is to ensure that we do this as often as possible for as long as possible. Most systems provide a range of low power modes.
This shows the range of power saving modes provided by the Cortex-M0. They range from zero power consumption in power off state, through a variety of sleep modes, to fully powered operation in active state. We may have a spectrum of options in active state as well involving voltage and frequency scaling. Simply put, this is as far as the chip designer can get you. The goal of the software engineer is to design and implement the software system so as to spend as much time as possible as far to the left as possible on this spectrum. We’ll come back to the details of this later.
But if the software runs at full whack in active state all the time, it doesn’t matter how clever the chip designer is: he’s wasted his time.
Two types of efficiency
All computing machines carry out two essential functions. And they are both essential – without both no meaningful tasks can be accomplished.
Computation – or data manipulation – is an obvious one. All programs perform computations, be they comparisons, analyses, calculations or manipulations. Typically, computation is carried out by arithmetic processing units of some kind on values held in machine registers. These could be integer operations, floating point arithmetic, vector processing and so on.
Clearly, computational tasks should be carried out as efficiently as possible. In general terms, this equates to executing the smallest possible number of instructions in the shortest possible time. Most importantly, efficient computation allows one of two things: either we can finish earlier and go to sleep; or we can turn the clock speed down and still complete within the allotted time. This balancing act is one we shall return to later.
What is often neglected is the aspect of communication. By that I simply mean the business of moving data from one place to another. For many systems, moving data from place to place is their raison d’etre. However, it’s more fundamental than that.
In the majority of architectures, and ARM, as a load-store architecture, is no exception, data movement is quite simply essential and unavoidable. You cannot really process any information without moving it from one place to another and then very often back again. Values in memory, for instance, need to be moved into core registers for processing and then results need to be written back. You might think this is straightforward, cheap and easy but it isn’t. Firstly, memory systems are not straightforward. Listen to John von Neumann, speaking in 1945.
“Ideally one would desire an infinitely large memory capacity such that any particular word would be immediately available. We are forced to recognise the possibility of constructing a hierarchy of memories, each of which has greater capacity than the preceding but which is less quickly accessible.”
As ever, von Neumann shows an amazing ability to pinpoint one of the most fundamental aspects of all today’s computer architectures. We can easily substitute “costs more energy to access” for “less quickly accessible” in his final sentence and we have a general principle which will guide much of our discussion today. Hierarchical memory systems and how you manage them are a huge part of managing energy consumption in software systems.
But which of these areas consumes greater energy? Which is going to bring us the greatest payback?
The diagram, essentially received wisdom, reflects the fact that the memory accesses associated with a program are made up of approximately 60% instruction fetches and 40% data accesses.
This diagram shows the results of some research we conducted. This was based on a Cortex-R4 system but there is independent research using an ARM7TDMI system which comes to broadly the same conclusions.
If we benchmark the cost of fetching and executing an instruction as 1 unit of energy, then the incremental cost of accessing a variable held in TCM is roughly 1/25, the cost of an L1 cache access around 1/6 and an L2 cache access around 1. The cost of an external RAM access is a whopping 7 times the cost of an instruction execution.
These figures are simplistic. For instance, they assume minimal PCB loading, ignore pin overhead and locality effects in DRAM. But for our purposes these can safely be ignored. If you care to spend a few minutes with the search engine of your choice, you can find figures all over the net which will back up the relative energy costs here.
Think of this another way: for each external RAM access, we can execute 7 instructions, carry out 40 cache accesses or access TCM around 170 times for the same energy cost.
So we are brought to a conclusion which is not a pleasant one. Computation is cheap, while communication is expensive. Simply put, moving data around costs a lot more energy than processing it.
Chris Shore is passionate about ARM technology and, as well as teaching ARM’s customers, regularly presents papers and workshops at engineering conferences. Starting out as a software engineer in 1986, his career has included software project management, consultancy, engineering management and marketing. Chris holds an MA in Computer Science from Cambridge University.