Back to the Basics – Practical Embedded Coding Tips: Part 6

In Part 5 of this series, I discussed how a compiler works and gave some examples of how to code better. In this last part of the series, we will look at some more concrete tips (12 of them) on how to write compiler-friendly code.

Tip #1: Use the Right Data Size
The semantics of C state that all calculations should have the same result as if all operands were cast to int and the operation performed on int (or unsigned int or long int if values cannot fit in an int).

If the result is to be stored in a variable of a smaller type like char, the result is then (conceptually at least) cast down. On any decent 8-bit micro compiler, this process is short-circuited where appropriate, and the entire expression is calculated using char or short.

Thus, the size of a data item to be processed should be appropriate for the CPU used. If an unnatural size is chosen, the generated code might get much worse. For example, on an 8-bit micro, accessing and calculating 8-bit data is very efficient.

Working with 32-bit values will generate much bigger code and run more slowly, and should only be considered when the data being manipulated needs all 32 bits. Using big values also increases the demand for registers during register allocation, since a 32-bit value requires four 8-bit registers to be stored.

On a 32-bit processor, working with smaller data might be inefficient, since the registers are 32 bits. The results of calculations will need to be cast down if the storing variable type is smaller than 32 bits, which introduces shift, mask, and sign-extend operations in the code (depending on how smaller types are represented).

On such machines, 32-bit integers should be used for as many variables as possible. chars and shorts should only be used when a precise number of bits is needed (like when doing I/O), or when big types would use too much memory (for example, an array with a large number of elements).
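As a rough sketch of this tip, the routines below sum a small buffer with types picked for the target. The names checksum8 and checksum_fast are made up for illustration. The second version assumes a C99 compiler that provides uint_fast8_t in <stdint.h>, which is one portable way to ask for "at least 8 bits, whatever is fastest on this CPU".

```c
#include <stdint.h>

/* 8-bit friendly: index and accumulator each fit in one 8-bit register.
   Assumes len <= 255; the sum deliberately wraps modulo 256. */
uint8_t checksum8(const uint8_t *buf, uint8_t len)
{
    uint8_t sum = 0;
    for (uint8_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + buf[i]);
    }
    return sum;
}

/* Portable variant: uint_fast8_t is at least 8 bits wide, so a 32-bit CPU
   may use a full register and skip the masking inside the loop. */
uint8_t checksum_fast(const uint8_t *buf, uint_fast8_t len)
{
    uint_fast8_t sum = 0;
    for (uint_fast8_t i = 0; i < len; i++) {
        sum += buf[i];
    }
    return (uint8_t)sum;   /* narrow once, at the end */
}
```

Both functions return the same value for the same input; which one compiles better depends on the register width of the target.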

Tip #2: Use the Best Pointer Types
A typical embedded micro has several different pointer types, allowing access to memory in a variety of ways, from small zero-page pointers to software-emulated generic pointers. It is obvious that using smaller pointer types is better than using larger pointer types, since both the data space required to store them and the code that manipulates them are smaller for smaller pointers.

However, there may be several pointers of the same size but with different properties, for example two banked 24-bit pointers huge and far, with the sole difference that huge allows objects to cross bank boundaries. This difference makes the code to manipulate huge pointers much bigger, since each increment or decrement must check for a bank boundary. Unless you really require very large objects, using the smaller pointer variant will save a lot of code space.

For machines with many disjoint memory spaces (like Microchip PIC and Intel 8051), there might be “generic” pointers that can point to all memory spaces. These pointers might be tempting to use, since they are very convenient, but they carry a cost in that special code is needed at each pointer access to check which memory a pointer points to and perform the appropriate actions.

Also note that using generic pointers typically brings in some library functions. In summary: use the smallest pointers you can, and avoid any form of generic pointers unless necessary. Remember to check the compiler default pointer type (used for unqualified pointers, and determined by the data memory model used). In many cases it is a rather large pointer type.

Tip #3: Structures and Padding
A C struct is guaranteed to be laid out in memory with the fields in the order of the declaration. However, on processors with alignment restrictions on loads and stores, the compiler will probably insert padding between structure members, in order to align each member efficiently. This will make the struct larger than the sum of the sizes of the types of the members, and could break code written under the assumption that structs are laid out contiguously in memory.

Alignment requirements are rare on 8- and 16-bit CPUs, but quite common on 32-bit CPUs. Some CPUs (like the Motorola ColdFire and NEC V850) will generate errors for misaligned loads, while others will only lose performance (Intel x86).

Padding will be inserted at the end of a structure if necessary, to align the size of the structure with the biggest alignment requirement of the machine (typically 4 bytes for a 32-bit machine). This is because every element in an array of structures must start at an aligned boundary in memory.

The sizeof() operator will reveal the total size of a struct, including padding at the end. Incrementing a pointer to a structure will move the pointer sizeof() bytes forward in memory, thus reflecting end padding. When a struct contains another struct, the padding of the member structure is maintained.

In some cases, the compiler offers the ability to pack structures in memory (by #pragma, special keywords, or command-line options), removing the padding. This will save data space, but might cost code size, since the code to load misaligned members is potentially much bigger and more complex than the code required to load aligned members.

To make better use of memory, sort the members of the struct in order of decreasing size: 32-bit values first, then 16-bit values, and, finally, 8-bit values. This will make internal padding unnecessary, since each member will be naturally aligned (there will still be padding at the end of the struct if the size of the struct is not an even multiple of the machine word size).
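The effect of member ordering can be checked with sizeof(). The two structs below are hypothetical and hold the same four members. On a typical target with 4-byte alignment for 32-bit values, the first would be expected to take 12 bytes and the second 8, but the exact numbers depend on the compiler and CPU, so this is only a sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Mixed order: an aligning compiler may pad after 'flag' and after 'mode'. */
struct unsorted {
    uint8_t  flag;
    uint32_t counter;
    uint8_t  mode;
    uint16_t limit;
};

/* Same members, largest first: each member is naturally aligned. */
struct sorted {
    uint32_t counter;
    uint16_t limit;
    uint8_t  flag;
    uint8_t  mode;
};

/* Padding in bytes = total size minus the sum of the member sizes. */
size_t padding_unsorted(void)
{
    return sizeof(struct unsorted)
         - (sizeof(uint8_t) + sizeof(uint32_t) + sizeof(uint8_t) + sizeof(uint16_t));
}

size_t padding_sorted(void)
{
    return sizeof(struct sorted)
         - (sizeof(uint32_t) + sizeof(uint16_t) + sizeof(uint8_t) + sizeof(uint8_t));
}
```

Printing or asserting on these two values during a port is a cheap way to see what the compiler actually did.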

Note that the compiler's padding can break code that uses structs to decode information received over a network or to address memory-mapped I/O areas. This is especially dangerous when code is ported from an architecture without alignment requirements to one with them.
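One way to sidestep the problem, sketched below, is to decode received data field by field instead of overlaying a struct on the buffer. The wire format here (a 1-byte type followed by a 4-byte big-endian length) and the names msg_header and decode_header are assumptions made for the example.

```c
#include <stdint.h>

/* In-memory form; the compiler is free to pad between 'type' and 'length'. */
struct msg_header {
    uint8_t  type;
    uint32_t length;
};

/* Hypothetical wire format: 1-byte type, then a 4-byte big-endian length.
   Reading byte by byte means neither struct padding nor the CPU's byte
   order affects the result. */
struct msg_header decode_header(const uint8_t *wire)
{
    struct msg_header h;
    h.type   = wire[0];
    h.length = ((uint32_t)wire[1] << 24)
             | ((uint32_t)wire[2] << 16)
             | ((uint32_t)wire[3] << 8)
             |  (uint32_t)wire[4];
    return h;
}
```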

Tip #4: Use Function Prototypes
Function prototypes were introduced in ANSI C as a way to improve type checking. The old style of calling functions without first declaring them was considered unsafe, and is also a hindrance to efficient function calls.

If a function is not properly prototyped, the compiler has to fall back on the language rules dictating that all arguments should be promoted to int (or double, for floating-point arguments). This means that the function call will be much less efficient, since type casts will have to be inserted to convert the arguments.

For a desktop machine, the effect is not very noticeable (most things are the size of int or double already), but for small embedded systems, the effect is potentially great.

Problems include ruining register parameter passing (larger values use more registers) and lots of unnecessary type conversion code. In many cases, the compiler will give you a warning when a function without a prototype is called. Make sure that no such warnings are present when you compile!

The old way to declare a function before calling it (Kernighan & Ritchie or “K&R” style) was to leave the parameter list empty, like “extern void foo().” This is not a proper ANSI prototype and will not help code generation. Unfortunately, few compilers warn about this by default.

The register to parameter assignment for a function can always be inferred from the type of the function, i.e., the complete list of parameter types (as given in the prototype). This means that all calls to a function will use the same registers to store parameters, which is necessary in order to generate correct code. The code in a function does not in any way affect the assignment of registers to parameters.
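The difference between the two declaration styles can be sketched as follows. The function clamp_add is hypothetical. The K&R-style declaration appears only inside a comment, since it is the form to avoid.

```c
/* K&R-style declaration (for contrast only): the empty list tells the
   compiler nothing about the parameters, so arguments get promoted to int.

       extern unsigned char clamp_add();
*/

/* ANSI prototype: both arguments are known to be 8-bit values, so the
   compiler can pass them as such and check every call. */
unsigned char clamp_add(unsigned char a, unsigned char b);

/* Adds two 8-bit values, saturating at 255 instead of wrapping. */
unsigned char clamp_add(unsigned char a, unsigned char b)
{
    unsigned int sum = (unsigned int)a + b;
    return (unsigned char)((sum > 255u) ? 255u : sum);
}
```

In a real project the prototype would normally live in a header that is included by both the caller and the file defining the function.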

Tip #5: Use Parameters
As discussed above, register allocation has a hard time with global variables. If you want to improve register allocation, use parameters to pass information to a called function and not shared global variables. Parameters will often be allocated to registers both in the calling and called function, leading to very efficient calls.

Note that the calling conventions of some architectures and compilers limit the number of available registers for parameters, which makes it a good idea to keep the number of parameters down for code that needs to be portable and efficient across a wide range of platforms. It might pay off to split a very complex function into several smaller ones, or to reconsider the data being passed into a function.
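A minimal sketch of the two styles follows. The names g_value, g_gain, scale_global, and scale_param are invented for the example, and whether the parameters really end up in registers depends on the calling convention of the target.

```c
#include <stdint.h>

/* Global-based version: the caller stores both values to memory before
   the call and the callee loads them again. */
static uint8_t g_value;
static uint8_t g_gain;

uint8_t scale_global(void)
{
    return (uint8_t)(g_value * g_gain);
}

/* Parameter-based version: value and gain can travel in registers. */
uint8_t scale_param(uint8_t value, uint8_t gain)
{
    return (uint8_t)(value * gain);
}
```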

Tip #6: Do Not Take Addresses
If you take the address of a local variable (the “&var” construction), it is not likely to be allocated to a register, since it has to have an address and, thus, a place in memory (usually on the stack).

It also has to be written back to memory before each function call, just like a global variable, since some other function might have gotten hold of the address and is expecting the latest value. Taking the address of a global variable does not hurt as much, since it has to have a memory address anyway.

Thus, you should only take the address of a local variable if you really must (it is very seldom necessary). If the taking of addresses is used to receive return values from called functions [from scanf(), for example], introduce a temporary variable to receive the result, and then copy the value from the temporary to the real variable.

This should allow the real variable to be register allocated. (Note that in C++, reference parameters [“foo(int &)”] can introduce pointers to variables in a calling function without the syntax of the call showing that the address of a variable is taken.)
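The temporary-variable pattern might look like the sketch below. read_sample is a hypothetical stand-in for any function that returns its result through a pointer, and the fake data table exists only to make the example self-contained.

```c
/* Hypothetical driver call that returns its result through a pointer. */
static void read_sample(int index, int *out)
{
    static const int fake[4] = {3, 9, 4, 7};   /* stand-in for hardware */
    *out = fake[index & 3];
}

/* Returns the largest of the first n samples (0 if n <= 0). */
int peak_sample(int n)
{
    int peak = 0;               /* hot variable: its address is never taken */
    for (int i = 0; i < n; i++) {
        int tmp;                /* only this temporary has its address taken */
        read_sample(i, &tmp);
        if (tmp > peak) {
            peak = tmp;         /* copy from the temporary to the real variable */
        }
    }
    return peak;
}
```

Had the code passed &peak directly, peak would need a memory location for the whole function.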

Making a global variable static is a good idea (unless it is referred to in another file), since this allows the compiler to know all places where the address is taken, potentially leading to better code.

A typical case where the address-of operator should be avoided is accessing the high byte of a variable through its address, which forces the variable onto the stack. The better way is to use shifts to access parts of values.
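The contrast can be sketched as follows, with both function names invented for the example. The address-based version is shown only for comparison and assumes a little-endian target.

```c
#include <stdint.h>

/* Address-based access: taking &value forces 'value' into memory, and the
   byte index assumes a little-endian target. Shown for contrast only. */
uint8_t high_byte_by_address(uint16_t value)
{
    const uint8_t *bytes = (const uint8_t *)&value;
    return bytes[1];
}

/* Shift-based access: no address is taken, and the result does not depend
   on the byte order of the CPU. */
uint8_t high_byte_by_shift(uint16_t value)
{
    return (uint8_t)(value >> 8);
}
```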

Tip #7: Do Not Use Inline Assembly Language
Using inline assembly is a very efficient way of hampering the compiler's optimizer. Since there is a block of code that the compiler knows nothing about, it cannot optimize across that block. In many cases, variables will be forced to memory and most optimizations turned off.

The output of a function containing inline assembly should be inspected after each compilation run to make sure that the assembly code still works as intended. In addition, the portability of inline assembly is very poor, both across machines (obviously) and across different compilers for the same target.

If you need to use assembler, the best solution is to split it out into assembly source files, or at least into functions containing only inline assembly. Do not mix C code and assembly code in the same function!

Tip #8: Do Not Write Clever Code
Some C programmers believe that writing fewer source code characters and making clever use of C constructions will make the code smaller or faster. The result is code that is harder to read, and that is also harder to compile.

Writing things in a straightforward way helps both humans and compilers understand your code, giving you better results. For example, conditional expressions gain from being clearly expressed as conditions.

Consider two ways to set the lowest bit of variable b if the lower 21 bits of another (32-bit) variable are nonzero. The clever code uses the ! operator in C, which returns zero if the argument is nonzero (“true” in C is any value except zero), and one if the argument is zero.

The straightforward solution is easy to compile into a conditional followed by a set bit instruction, since the bit-setting operation is obvious and the masking is likely to be more efficient than the shift. Ideally, the two solutions should generate the same code. The clever code, however, may result in more code since it performs two ! operations, each of which may be compiled into a conditional.
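The two variants discussed above could look like the sketch below. It is a reconstruction from the description, not necessarily the code originally printed with the article, and the function names are made up.

```c
#include <stdint.h>

/* "Clever": shift away the upper 11 bits, then apply ! twice to squash
   the result to 0 or 1. */
uint8_t set_flag_clever(uint32_t a, uint8_t b)
{
    b |= !!(a << 11);
    return b;
}

/* Straightforward: mask the lower 21 bits and test them explicitly. */
uint8_t set_flag_plain(uint32_t a, uint8_t b)
{
    if ((a & 0x1FFFFFu) != 0) {
        b |= 0x01;
    }
    return b;
}
```

Both functions compute the same result; the difference lies in how easily a reader, or a compiler, can see what is intended.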

Another example is the use of conditional values in calculations, such as adding the result of a comparison to a pointer str. The “clever” code will result in larger machine code, since the generated code will contain the same test as the straightforward code, and adds a temporary variable to hold the one or zero to add to str. The straightforward code can use a simple increment operation rather than a full addition, and does not require the generation of any intermediate results.
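A sketch of this second case is given below. Only the name str comes from the text; foo, bar_clever, and bar_plain are hypothetical, and foo simply returns the first character so that the example can be checked.

```c
/* Hypothetical consumer of the string; returns its first character. */
static int foo(const char *str)
{
    return *str;
}

/* "Clever": adds the 0-or-1 result of a comparison to the pointer. */
int bar_clever(const char *str)
{
    return foo(str + (*str == '+'));
}

/* Straightforward: skip a leading '+' with an explicit test and increment. */
int bar_plain(const char *str)
{
    if (*str == '+') {
        str++;
    }
    return foo(str);
}
```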

Since clever code almost never compiles better than straightforward code, why write clever code? From a maintenance standpoint, writing simpler and more understandable code is definitely the method of choice.

Tip #9: Use Switch for Jump Tables
If you want a jump table, see if you can use a switch statement to achieve the same effect. It is quite likely that the compiler will generate better and smaller code for the switch rather than a series of indirect function calls through a table.

Also, using the switch makes the program flow explicit, helping the compiler optimize the surrounding code better. It is very likely that the compiler will generate a jump table, at least for a small dense switch (where all or most values are used).

Using a switch is also more reliable across machines; the layout that is optimal on one CPU may not be optimal on another, but the compiler for each will know how to make the best possible jump table for its own target. The switch statement was put into the C language to facilitate multiway jumps: use it!
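The two approaches can be put side by side as in the sketch below. The command codes and handler functions are invented for the example.

```c
/* Hypothetical command codes: a small, dense range suits a jump table. */
enum command { CMD_STOP = 0, CMD_START = 1, CMD_PAUSE = 2, CMD_RESUME = 3 };

static int do_stop(void)   { return 10; }
static int do_start(void)  { return 20; }
static int do_pause(void)  { return 30; }
static int do_resume(void) { return 40; }

/* Hand-written jump table: indirect calls through function pointers. */
typedef int (*handler_fn)(void);
static const handler_fn handlers[] = { do_stop, do_start, do_pause, do_resume };

int dispatch_table(unsigned int cmd)
{
    if (cmd >= sizeof handlers / sizeof handlers[0]) {
        return -1;                      /* unknown command */
    }
    return handlers[cmd]();
}

/* Switch version: the control flow is visible to the compiler, which is
   free to build a jump table, a compare chain, or inline the handlers. */
int dispatch_switch(unsigned int cmd)
{
    switch (cmd) {
    case CMD_STOP:   return do_stop();
    case CMD_START:  return do_start();
    case CMD_PAUSE:  return do_pause();
    case CMD_RESUME: return do_resume();
    default:         return -1;         /* unknown command */
    }
}
```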

Tip #10: Investigate Bit Fields Before Using Them
Bit fields offer a very readable way to address small groups of bits as integers, but the bit layout is implementation defined, which creates problems for portable code.

The quality of the code generated for bit fields varies widely, since not all compilers consider them very important. Some compilers will generate incredibly poor code since they do not consider bit fields worth optimizing, while others will optimize the operations so that the code is as efficient as manual masking and shifting.

The advice is to test a few bit field variables and check that the bit layout is as expected, and that the operations are efficiently implemented. If several compilers are being used, check that they all have the same bit layout. In general, using explicit masks and shifts will generate more reliable code across more targets and compilers.
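As an illustration of explicit masks and shifts, the sketch below handles a hypothetical 8-bit status register. The layout (bits 0-2 mode, bit 3 enable, bits 4-7 channel) and all names are assumptions. The bit-field struct is included only for comparison, since where each field lands is up to the compiler.

```c
#include <stdint.h>

/* Hypothetical register layout, assumed for this sketch:
   bits 0-2 = mode, bit 3 = enable, bits 4-7 = channel. */
#define MODE_MASK     0x07u
#define ENABLE_MASK   0x08u
#define CHANNEL_MASK  0xF0u
#define CHANNEL_SHIFT 4

/* Bit-field version: readable, but the placement of each field within
   the storage unit is implementation defined. */
struct status_bits {
    unsigned int mode    : 3;
    unsigned int enable  : 1;
    unsigned int channel : 4;
};

/* Mask-and-shift version: the layout is spelled out in the source. */
uint8_t get_mode(uint8_t reg)    { return (uint8_t)(reg & MODE_MASK); }
uint8_t get_enable(uint8_t reg)  { return (uint8_t)((reg & ENABLE_MASK) != 0); }
uint8_t get_channel(uint8_t reg) { return (uint8_t)((reg & CHANNEL_MASK) >> CHANNEL_SHIFT); }

/* Returns reg with the channel field replaced; other bits are preserved. */
uint8_t set_channel(uint8_t reg, uint8_t channel)
{
    return (uint8_t)((reg & ~CHANNEL_MASK)
                   | (((unsigned int)channel << CHANNEL_SHIFT) & CHANNEL_MASK));
}
```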

Tip #11: Watch Out for Library Functions
As discussed above, the linker has to bring in all library functions used by a program with the program. This is obvious for C standard library functions like printf() and strcat(), but there are also large parts of the library that are brought in implicitly when certain types of arithmetic are needed, most notably floating point.

Due to the way in which C performs implicit type conversions inside expressions, it is quite easy to inadvertently bring in floating point, even if no floating point variables are being used. For example, an expression that uses a constant such as ImportantRatio will bring in floating point if the constant is of floating point type, even if its value would be 1.95*20==39 and all variables are integers.
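A sketch of such a case is shown below. Only the name ImportantRatio and the values 1.95 and 20 come from the text; Other, CONSTANT, ImportantRatioInt, and the two functions are assumptions added for illustration.

```c
#define Other          20
#define ImportantRatio (1.95 * Other)   /* type double, value 39 */
#define CONSTANT       4                /* hypothetical integer constant */

/* The double-typed constant turns the whole expression into floating
   point, which on a small target pulls in software floating-point
   library routines. */
int mix_float(int a, int b)
{
    return (int)(a * b + CONSTANT * ImportantRatio);
}

/* Integer-only alternative: express 1.95 as 195/100 so that the constant
   is computed entirely with integer arithmetic. */
#define ImportantRatioInt ((195 * Other) / 100)   /* 39 */

int mix_int(int a, int b)
{
    return a * b + CONSTANT * ImportantRatioInt;
}
```

Comparing the linker map file for the two versions is a quick way to see how much library code the floating point variant brings in.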

If a small change to a program causes a big change in program size, look at the library functions included after linking. Floating point and 32-bit integer libraries in particular can be insidious, and creep in due to C implicit casts.

Another way to shrink the code of your program is to use limited versions of standard functions. For instance, the standard printf() is a very big function. Unless you really need the full functionality, you should use a limited version that only handles basic formatting or ignores floating point.

Note that this should be done at link time: the source code is the same, but a simpler version is linked. Because the first argument to printf() is a string, and can be provided as a variable, it is not possible for the compiler to automatically figure out which parts of the function your program needs.

Tip #12: Use Extra Hints
Some compilers allow the programmer to specify useful information that the compiler cannot deduce itself to help optimize the code. For example, DSP compilers often allow users to specify that two pointer or array arguments are unaliased, which helps the compiler optimize code accessing these two arrays simultaneously.
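Where a C99 compiler is available, the standard restrict qualifier expresses the same kind of no-aliasing promise in a portable way. The sketch below uses it in a hypothetical add_arrays function. Vendor-specific keywords on DSP compilers may look different, so treat this as one possible form of the hint.

```c
#include <stddef.h>

/* 'restrict' (standard since C99) is a promise from the programmer that
   dst and src do not overlap, so the compiler may load several src
   elements before storing to dst. Breaking the promise is undefined
   behavior. */
void add_arrays(int *restrict dst, const int *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] += src[i];
    }
}
```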

Other examples are the specification of functions as pure (without side effects) or tasks (will loop forever, thus no need to save registers on entry). A common example is inline, which might be considered a hint or an order by the compiler. This information is usually introduced using nonportable keywords and should be put in tuned header files (if possible). It might give great benefits in code efficiency, however.

Final Notes
Part 5 in this series and the above tips have tried to give you an idea of how a modern C compiler works, and to give you some concrete hints on how you can get smaller code by using the compiler wisely.

A compiler is a very complex system with highly nonlinear behavior, where a seemingly small change in the source code can have big effects on the assembly code generated. The basis for the compilation process is that the compiler should be able to understand what your code is supposed to do, in order to perform the operations in the best possible way for a given target.

As a general rule, code that is easy to understand for a fellow human programmer (and thus easy to maintain and port) is also easier to compile efficiently. Note that unless you let your compiler use its higher optimization levels, you have wasted a lot of your investment.

What you pay for when you buy a compiler is mostly the work put into developing and tuning optimizations for a certain target, and if you do not use these optimizations, you are not using your compiler to its best effect.

Choose your compiler wisely: different compilers for the same chip can be very different. Some are better at generating fast code, others at generating small code, and some may be no good at all. The best way to evaluate a compiler is to use a demo version to compile small portions of your own “typical” code.

Some chip vendors also provide benchmark tests of various compilers for their chips, usually targeted toward the intended application area for their chips. The compiler vendor's own benchmarks should be taken with some skepticism: it is (almost) always possible to find a program where a certain compiler performs better than the competition.

For more tips on efficient C programming for embedded systems, you should check out the classes presented at embedded systems trade shows (around the world). The web sites of the companies making compilers often contain some technical notes or white papers on particular tricks for their compilers (but make sure to watch out for those that are pure marketing material!).

To read Part 1, go to Reentrancy, atomic variables and recursion.
To read Part 2, go to Asynchronous Hardware/Firmware
To read Part 3, go to Metastable States
To read Part 4, go to Dealing With Interrupt Latency
To read Part 5, go to Using your C-compiler to minimize code size

Jakob Engblom is technical marketing manager at Virtutech. He has an MSc in computer science and a PhD in Computer Systems from Uppsala University, and has worked with programming tools and simulation tools for embedded and real-time systems since 1997.
He was a contributor of material to “The Firmware Handbook,” edited by Jack Ganssle, upon which this series of articles was based and printed with permission from Newnes, a division of Elsevier. Copyright 2008.
