Energy efficient C code for ARM devices
Our industry moves incredibly quickly. The hot topic last year is very rarely so important this year – either we will have solved it or some other issue will have risen to even greater prominence. The issue of efficiency, however, has been a relatively constant theme through my time in the industry, now some 24 years but efficiency has had many faces over that time.
In the early days, I can recall developing local area networking software on small 8-bit microcontrollers where RAM was the scarce resource. We worked long and hard to come up with cleverer and cleverer ways of storing, compressing, reusing and encoding our necessary data structures to make them fit.
Later on, I can recall a controller for a touch screen radio operator station in which the scarce commodity was code space. Limited to 32k and writing in C, the first version or two didn’t present any problems.
But by the third and fourth versions, with the customer requesting added functionality all the time, we were banging hard on the ceiling of the ROM. Again, we put in the nights and weekends working out better ways of reusing common code sequences, re-coding some functions in assembler by hand to make them smaller, repartitioning the C library to remove even more unused sections.
Then there were the systems which needed to respond so quickly. There is nothing like the demand of hitting the frame rate of a time-division multiplexed cellular phone system to focus the mind on interrupt latency and processing deadlines.
These days, it seems that we very often have RAM and ROM coming out of our ears and we have processing power to burn. So what is the pressing issue of today? So often, it is power-efficiency or, perhaps more exactly, energy-efficiency.
A huge proportion of today’s electronics is portable and battery-powered. We have got used to our cell phones running for a week between charges and our expectations of what they will do on that limited amount of energy goes up year on year.
Hardware engineers have been in the business of saving energy for longer than we software engineers but, increasingly, it is our job to utilize hardware features to the maximum and then realize even greater savings by writing efficient software.
A double-sided coin
One of the fundamental changes in our field in recent years has been the increasing use of platform operating systems in the embedded world. The majority of ARM-based systems these days, large and surprisingly small, will use either one of the many proprietary real-time operating systems available or will use a home-brewed scheduler of some kind.
Except at the small end, the people who develop the applications are generally not the same people as those who develop the operating systems under which they run. Both groups of engineers, though, need increasingly to focus on being energy-efficient.
There is some common ground between the two – indeed, much of what can be said about writing efficient applications applies equally to operating system development – but each contains specialist areas and focuses of its own (Table 1, below).
Table 1. The two sides of efficiency
Some basic stuff
There are some things which are so basic I hope I don’t have to say them. But, just in case, I will get them out of the way early on!
We have a very nice training course on optimizing embedded software on ARM platforms. In a series of optimization steps over 3 days, we take a standard piece of video codec software and progressively apply varying optimizations to it. Some are more effective than others but the cumulative effect is to improve performance by between 200% and 250%.
We then note that all of this would be completely wasted work without correctly enabling and configuring the data cache as that simple omission reduces performance by approximately 5000%. Modern ARM cores are specifically designed and optimized to run from caches as much as possible. Neglecting this simple fact is catastrophic. We shall have more to say about cache-friendly programming later on but, at this point, it is simply sufficient to say “Turn it on!”
Memory use is key
The fact that cache is so important is a direct result of the fact that memory access in general is costly – both in terms of time and energy. Almost all systems have some kind of memory hierarchy. It might be as simple as some on-chip scratchpad RAM and some external off-chip RAM; it might be as complex as two or three levels of cache between the core and the external memory system (Figure 1 below).
Figure 1. Typical System Memory Architecture
A rough rule of thumb might be that if an L1 cache access takes one time period, an L2 cache access might take around 10 and an external memory access of the order of 100. There is an increasing energy cost with this as well. The increasing expense of memory accesses the further they are removed from the core is something which you need to bear in mind at all times when looking at writing efficient code.
Figure 2 below shows the measured energy cost of memory accesses , benchmarked against the cost of executing an instruction.
Figure 2. Estimated cost of memory accesses.
An external memory access is typically going to take 100 times longer than accessing cache and cost 50-60 times the energy.
So, message number two, following on from simply turning on the cache, is to make sure that you use it as much as possible.