The replacement of a function call by a copy of the code of the function can be an effective optimization, particularly if execution speed is the priority. This article takes a look at how inlining works, when it can be effective, and how it may happen automatically in C and C++.
To a first approximation, all desktop computers are the same. It is straightforward to write acceptable applications that will run on anyone’s machine, and the broad expectations of users are the same. But embedded systems are all different – the hardware and software environment varies widely and the expectations of users are just as diverse. In many ways, this is what is particularly interesting about embedded software development.
An embedded compiler is likely to have a great many options to control optimization. Sometimes that fine-grain control is vital; on other occasions, it can come down to a simple choice between optimization for speed or size. This choice is curious, but it is simply an empirical observation that small code is often slower and fast code tends to need more memory.
An obvious example is function inlining. A small function can be optimized so that its actual code is placed in line at each call site. This executes faster because the call/return sequence is eliminated. Also stack usage may be reduced. But this method has the potential to use more memory, as there may be multiple copies of identical code. Sometimes you can get lucky and an optimization which yields faster code is also light on memory, but this is quite unusual.
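As a rough illustration of what inlining does at a call site, the sketch below shows a small function, the code as a programmer would write it, and a hand-written equivalent of what a compiler might produce after inlining. The names clamp_percent, sum_clamped_with_call and sum_clamped_inlined are hypothetical, invented for this example; a real compiler performs the transformation internally rather than on the source text.

```cpp
// Hypothetical helper: small enough to be a likely inlining candidate.
int clamp_percent(int value)
{
    if (value < 0)   return 0;
    if (value > 100) return 100;
    return value;
}

// As written by the programmer: one call/return sequence per element.
int sum_clamped_with_call(const int *data, int count)
{
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += clamp_percent(data[i]);
    return total;
}

// Hand-inlined equivalent: the body of clamp_percent() is placed in line,
// so there is no call/return overhead, at the cost of duplicating the
// code at every place where it is used.
int sum_clamped_inlined(const int *data, int count)
{
    int total = 0;
    for (int i = 0; i < count; ++i) {
        int value = data[i];
        if (value < 0)        value = 0;
        else if (value > 100) value = 100;
        total += value;
    }
    return total;
}
```

Both versions compute the same result; only the generated code size and execution time would be expected to differ.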
It is reasonable to expect a good C or C++ compiler, when told to compile for speed, to perform inlining automatically. There are two situations when this may occur: small functions and static functions.
Small functions
If a function is small, the size overhead in replacing calls with code may be very small and the speed benefit useful. In some cases, the code may be as small as, or smaller than, a call/return sequence, so there is a win-win – smaller and faster code.
Static functions
If a function is declared as static, the compiler knows that the function cannot be used outside of the module in hand, so it can make some smart decisions. Specifically, if there is only a single call to the function, the code should be inlined automatically without further ado, as the result will always be faster and smaller – a win-win situation. It may be argued that doing this automatically is wrong, because the programmer explicitly specified a call. However, I would counter with the assertion that a compiler's job is not to convert code in C to assembly language; it is required to translate an algorithm, expressed in C, to assembly language code with identical functionality.
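The single-call static case might look like the following minimal sketch. The function names and the checksum arithmetic are hypothetical and chosen only for illustration; whether the helper is actually inlined depends on the compiler and its optimization settings.

```cpp
// Hypothetical module-private helper. 'static' gives it internal linkage,
// so the compiler can see every call to it within this module.
static unsigned checksum_step(unsigned sum, unsigned char byte)
{
    return (sum << 1) ^ byte;
}

// The only caller of checksum_step(). With a single call site, a compiler
// optimizing for speed or size could place the helper's body here and
// emit no separate function at all.
unsigned checksum(const unsigned char *buf, unsigned len)
{
    unsigned sum = 0;
    for (unsigned i = 0; i < len; ++i)
        sum = checksum_step(sum, buf[i]);
    return sum;
}
```

Splitting the step into its own function keeps the source readable, while the generated code need not pay for the call.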
Although modern C compilers support inlining, it initially appeared as a language feature in C++. There are two ways to tell the compiler that inlining a function might be desirable: the inline keyword and class member functions defined within the class.
Inline keyword
C++ has a number of keywords additional to those in C. Among those is 'inline'. This is a request to the compiler that the function be inlined. The compiler is at liberty to ignore the request if the function is too big or if inlining conflicts with the optimization settings (for example, if switches request small code instead of fast code). The inline keyword has also been part of standard C since C99, and many older C compilers offered it as a language extension; the intent is the same, although the linkage rules differ in detail from those of C++.
Although this keyword seems straightforward, it has a hidden danger – it may result in excessive memory usage:
- If multiple calls are made to an inlined function, there will be multiple copies of the code. Depending on the size of the function and the number of calls, this can add up. However, this should not be a surprise.
- If a function is declared inline, but not also static, the compiler can inline the function in the current module but needs to generate an out-of-line copy too, in case it is called from another module. If no such call exists, that copy is wasted memory, and it is down to the linker to sort out the problem. All that is necessary is for the linker to be able to detect “orphan” functions (i.e., functions that are not called from anywhere), and there are tools that do just that.
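The second point can be sketched as follows. The function names are hypothetical, and whether an out-of-line copy is really emitted, or later discarded by the linker, is an assumption that varies between toolchains; the linker map file is the place to check.

```cpp
// Plain 'inline': the function has external linkage, so a toolchain may
// keep an out-of-line copy in case another module refers to it.
inline int scale_a(int x) { return x * 3; }

// 'static inline': internal linkage. If every call in this module is
// inlined, the compiler has no reason to keep a separate copy.
static inline int scale_b(int x) { return x * 3; }

// Both calls are candidates for inlining here.
int use_both(int x)
{
    return scale_a(x) + scale_b(x);
}
```

Combining static with inline is one way to avoid relying on the linker to remove an unused copy.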
C++ class member functions
In C++, a class (and a structure, actually) may include functions as well as data. There are two ways to define member functions. The code may be included in the class definition or the member function may be simply declared and defined outside, thus:
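A minimal sketch of the two forms, assuming the class names T1 and T2 and the member function foo() referred to in the text; the data member m, the constructors and the function bodies are hypothetical.

```cpp
class T1
{
    int m;                        // hypothetical data member
public:
    T1() : m(0) {}
    int foo() { return m + 1; }   // defined inside the class: implicitly inline
};

class T2
{
    int m;                        // hypothetical data member
public:
    T2() : m(0) {}
    int foo();                    // declared only
};

// Defined outside the class: not inline unless the inline keyword is added.
int T2::foo() { return m + 1; }
```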
Here the function foo() is defined in each of the possible ways. There is an interesting difference between these two techniques: a definition within the class intrinsically advises the compiler to inline the function; the external definition would need an inline modifier to achieve the same result. In this example T1::foo() is inlined (or might be); T2::foo() is not (or probably will not be). This notation is quite natural, as it would seem logical to include the code of only small functions within the class definition.
The concept of inlining a function seems straightforward and, at the conceptual level, it is just that. However, choosing to use the optimization requires a little care, and the way that embedded software development tools – primarily the compiler and linker – handle the result of using the inline keyword should be carefully verified.
Colin Walls has over thirty years' experience in the electronics industry, largely dedicated to embedded software. A frequent presenter at conferences and seminars and author of numerous technical articles and two books on embedded software, Colin is an embedded software technologist with the Mentor Graphics Embedded Software Division, and is based in the UK. His regular blog is located at: mentor.com/colinwalls. You can also follow Colin on Google+.