Useful language extensions for embedded applications

Most people would agree that standardizing programming languages is a good thing, as it leads to portability of software and programming skills. But what if the resulting standard lacks features required for certain types of applications?

One possible strategy is to define a language specification that is so large that it encompasses every possible eventuality. In the late 1960s, IBM created the “ultimate” programming language, which had capabilities that accommodated numerous types of applications – it was called PL/1.

Although it was widely used on IBM mainframes for some years, it had a fundamental problem: because the language was so large, each programmer would tend to learn and use just a subset of the available facilities, so they would not necessarily be able to understand one another’s code. In effect, each group of programmers was using a different language, with just a shared syntax. This rather defeats the point of a standard. Years later, the Ada language was defined with the same aims and suffered the same problems, albeit to a lesser degree.

Another approach is to keep the specification quite lean, but complete enough for most applications, and accept that language extensions may be necessary for specific specialized needs.

There has never really been a programming language designed specifically for embedded programming that has found its way into widespread use. Attempts include PL/M and its variants, which were confined to the Intel world, and Forth, which, though once quite widely used, has an idiosyncratic style that deters many users.

The most widely used languages for embedded applications are C and C++, neither of which was designed for embedded.

Adapting a language
The lack of a few key capabilities in C/C++ can be addressed in three ways:

  • inline compiler directives – #pragma
  • linker capabilities
  • language extensions – extra keywords

Each of these options has its pros and cons:

Although compiler directives can be flexible, they result in non-portable code, as they are specific to a particular compiler. This may not be an issue if you intend to continue to use the same range of compilers across a number of projects. Ideally, any #pragma statements should be grouped together and maybe stored in a header file to make porting the code easier.

This, however, works only for directives with a global effect, which typically set compiler options or defaults; these are probably better set on the command line (or by using an IDE or makefile). Directives that must be placed at specific points in the code to take effect are more problematic and should be avoided if possible.
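Where a location-specific directive cannot be avoided, one option is to hide it behind a macro defined in the shared header, using the C99 `_Pragma` operator (or MSVC's `__pragma`), so that no raw `#pragma` appears in application code. The sketch below assumes GCC, Clang or MSVC, all of which accept `pack(push, 1)` and `pack(pop)`; the header name `compiler_port.h`, the `PACK_BEGIN`/`PACK_END` macros and the `sensor_frame` structure are hypothetical.

```c
#include <stdint.h>

/* compiler_port.h (hypothetical name): every compiler-specific directive
   lives here, so porting to a new compiler means editing this one file. */
#if defined(__GNUC__)                         /* GCC and Clang */
#define PACK_BEGIN _Pragma("pack(push, 1)")
#define PACK_END   _Pragma("pack(pop)")
#elif defined(_MSC_VER)
#define PACK_BEGIN __pragma(pack(push, 1))
#define PACK_END   __pragma(pack(pop))
#else
#error "Add packing directives for this compiler to compiler_port.h"
#endif

/* Application code uses only the macros, never a raw #pragma. */
PACK_BEGIN
struct sensor_frame {        /* hypothetical wire-format record */
    uint8_t  id;
    uint16_t value;          /* no padding byte before this member */
};
PACK_END
```

The directive still has to sit next to the structure it affects, but its compiler-specific spelling is confined to one header.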

Using the linker to address language shortcomings is a creative solution. Linkers are critical tools for embedded developers and typically provide a wide range of options to control the memory utilization of the code. This flexibility is valued by developers.

Adding a few specific keywords does not compromise code portability too much, as they are supported by a wide range of embedded compilers.

Adding Keywords
The three significant areas of functionality that need special attention for embedded applications are:

  • insertion of assembly language code – keyword is usually asm
  • interrupt service routines (ISRs) – keyword is usually interrupt
  • packing of data structures into memory – keywords are usually packed and unpacked

Assembly language inserts
There are certain processor functions – enabling and disabling interrupts, for example – which are not accommodated in C/C++. One way to access such functionality would be to write tiny assembly language routines and call them from C. The downside of doing this is a small overhead, in both code size and execution speed, from the call/return sequence.

A better solution is the ability to insert a small number of assembly language statements directly into the C code, and this is what the asm keyword facilitates. In C++ asm is a reserved word, and the C standard lists it as a common extension; in both cases its implementation, and hence the details of its syntax, are compiler dependent.
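As an illustration, here is how disabling and enabling interrupts might look using GCC/Clang extended asm syntax, assuming an Arm Cortex-M target, where the `cpsid i` and `cpsie i` instructions mask and unmask interrupts. Other compilers spell inline assembly differently. The helper names (`irq_disable`, `irq_enable`, `add_ticks`, `read_ticks`) and the `tick_count` variable are hypothetical, and the host fallback exists only so the sketch compiles and runs off-target.

```c
#include <stdint.h>

#if defined(__GNUC__) && defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
/* Cortex-M: CPSID/CPSIE set and clear PRIMASK, masking configurable interrupts.
   The "memory" clobber stops the compiler moving memory accesses across them. */
static inline void irq_disable(void) { __asm__ volatile ("cpsid i" ::: "memory"); }
static inline void irq_enable(void)  { __asm__ volatile ("cpsie i" ::: "memory"); }
#elif defined(__GNUC__)
/* Host build: nothing to mask; an empty asm with a "memory" clobber acts only
   as a compiler barrier, so the sketch still compiles and runs. */
static inline void irq_disable(void) { __asm__ volatile ("" ::: "memory"); }
static inline void irq_enable(void)  { __asm__ volatile ("" ::: "memory"); }
#else
#error "Provide irq_disable()/irq_enable() for this compiler"
#endif

static volatile uint32_t tick_count;   /* hypothetical counter shared with an ISR */

void add_ticks(uint32_t n)
{
    irq_disable();      /* the read-modify-write below must not be interrupted */
    tick_count += n;
    irq_enable();
}

uint32_t read_ticks(void)
{
    /* Assumes an aligned 32-bit load is a single access on the target. */
    return tick_count;
}
```

Because the helpers are `static inline`, the compiler can place the single instruction directly in `add_ticks`, avoiding the call/return overhead of a separate assembly routine. A production version would usually save and restore the previous interrupt state (on Cortex-M, the PRIMASK register) rather than unconditionally re-enabling interrupts, so that critical sections can nest safely.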

Sometimes inserting a few lines of assembly language will maximize performance. However, all use of the asm keyword should be reviewed carefully, because its use will affect code portability.

Interrupt service routines
Historically, it was common practice to write ISRs in assembly language, mainly to maximize execution performance. Modern C/C++ compilers generate code of good enough quality that these languages are quite adequate for writing ISRs. However, although an ISR looks much like a C function (i.e. a subroutine), there are three differences:

  1. At the start of the ISR code the context needs to be saved; the data in any registers that will be used in the ISR needs to be preserved.
  2. At the end of the ISR the context needs to be restored.
  3. Normally, there is a special “return from interrupt” instruction at the end of an ISR instead of the usual “return from subroutine”.

All of these requirements could be accomplished (although with some difficulty) using assembly language inserts. However, many compilers provide an interrupt keyword, which is used as a qualifier to the function definition. The compiler then takes care of these matters.

Usually, an interrupt vector table must also be set up – a list of ISR addresses, one for each interrupt. This can normally be achieved by creating a C array of pointers to functions (the ISRs) and placing it at the required address using the linker.
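A minimal sketch of such a table, assuming GCC or Clang on an ELF target: the array goes into a named section so that a linker script can locate it where the hardware expects it. The section name `.isr_vector`, the handler names and the slot assignments are all hypothetical; on a real device the processor dictates the layout (on Cortex-M parts, for example, entry 0 holds the initial stack pointer value rather than an ISR address). `simulate_interrupt()` is a host-side stand-in for the hardware, included only so the sketch can be exercised.

```c
#include <stddef.h>

typedef void (*isr_t)(void);

/* Hypothetical handlers. On a real target each would normally also carry the
   compiler's interrupt qualifier (keyword or attribute) where one is needed. */
static volatile unsigned timer_count, uart_count;
static void default_isr(void) { /* unexpected interrupt: ignore */ }
static void timer_isr(void)   { timer_count++; }
static void uart_isr(void)    { uart_count++; }

/* The named section lets a linker script place the table at the vector
   address; ".isr_vector" is an assumed name that must match that script.
   "used" stops the table being discarded as unreferenced. */
#if defined(__GNUC__) && defined(__ELF__)
#define VECTOR_SECTION __attribute__((section(".isr_vector"), used))
#else
#define VECTOR_SECTION
#endif

/* Hypothetical layout: index n holds the ISR for interrupt n. */
VECTOR_SECTION const isr_t vector_table[] = {
    default_isr,   /* 0 */
    timer_isr,     /* 1: assumed timer interrupt */
    uart_isr,      /* 2: assumed UART interrupt */
};

#define VECTOR_COUNT (sizeof vector_table / sizeof vector_table[0])

/* Host-side stand-in for the hardware: look up the handler and call it,
   falling back to slot 0 for out-of-range interrupt numbers. */
void simulate_interrupt(size_t n)
{
    vector_table[n < VECTOR_COUNT ? n : 0]();
}
```

In a GNU ld linker script, wrapping the input section as `KEEP(*(.isr_vector))` at the start of the output section that sits at the vector address is a common way to pin the table in place; the exact script depends on the toolchain and device.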

Packed data structures
As every embedded application is different, each project may necessitate different priorities for code generation. This is why embedded compilers have very fine control over code generation and optimization. The same control is needed to govern data storage and access.

Broadly speaking, data can be stored in two ways:

  1. It can be aligned to word boundaries, which facilitates fast access, but wastes memory on “padding” bytes – this is “unpacked” data.
  2. Data can be located in memory without regard to word boundaries; this uses memory space efficiently, but access code is typically bigger and slower.

Figure 1 illustrates the two options for a structure like this:

  struct
  {
    short big;
    char little;
  } stuff[4];

Figure 1: Packed and unpacked data

The layout of the unpacked data here assumes a processor that can efficiently access 8-bit data in a bytewise fashion. In the worst case, the little elements would also need to be word aligned, which would mean many more wasted (padding) bytes.

Each of these options may be a wise choice for a given embedded application and can normally be selected by means of a compiler option.

There are circumstances when, regardless of whether packed or unpacked data has been chosen, a specific data structure may need to be allocated to memory using the other method. This may be because, even though packed data was chosen, fast access to a specific data item is required. Or perhaps unpacked data was chosen for speed, but a particular data structure is very large and needs to be compressed. Many compilers offer the packed and unpacked keywords for these eventualities, and they override the prevailing option setting. These are used as qualifiers on variable definitions and declarations.
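The spelling of these qualifiers varies between compilers: GCC and Clang, for example, express a per-structure packed qualifier as `__attribute__((packed))` rather than a `packed` keyword. The sketch below applies it to the structure from Figure 1 while the default (unpacked) layout stays in force for everything else; it assumes one of those two compilers, and the type and array names are illustrative.

```c
#if !defined(__GNUC__)
#error "This sketch uses the GCC/Clang packed attribute"
#endif

/* Default (unpacked) layout: a padding byte follows 'little' so that 'big'
   in the next array element stays aligned. */
struct item {
    short big;
    char  little;
};

/* Per-structure override: no padding is inserted, so 'big' in consecutive
   array elements may be misaligned and slower to access. */
struct __attribute__((packed)) packed_item {
    short big;
    char  little;
};

struct item        unpacked_stuff[4];
struct packed_item packed_stuff[4];
```

On a typical target where a short is 2 bytes and 2-byte aligned, each unpacked element occupies 4 bytes and each packed one 3, so the two arrays take 16 and 12 bytes respectively. Note that taking the address of a packed member, such as `&packed_stuff[1].big`, can yield a misaligned pointer; recent GCC versions warn about this (`-Waddress-of-packed-member`).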

It is unlikely that both keywords would be used in one piece of software, but there is no downside to doing so if it enhances the readability of the code. In fact, if every data structure were qualified using these keywords, the compiler option might be redundant.

Although language standardization is undoubtedly a good thing, embedded software developers are a minority of the programming community, and no dedicated embedded language has seen widespread deployment. As a result, small extensions to existing languages such as C and C++ are a logical way to produce readable, maintainable and portable code.

Colin Walls has over thirty years’ experience in the electronics industry, largely dedicated to embedded software. A frequent presenter at conferences and seminars and author of numerous technical articles and two books on embedded software, Colin is an embedded software technologist with Mentor Embedded (the Mentor Graphics Embedded Software Division), and is based in the UK. His regular blog is located at blogs.mentor.com/colinwalls. He may be reached by email at .

4 thoughts on “Useful language extensions for embedded applications”

  1. Another language extension should be created to specify bit packing, as in assigning bits LSB to MSB or vice versa. Historically, all of these extensions were motivated by hardware architectural differences. Are new extensions needed for multicore?

  2. Probably the most powerful mechanism to extend the C/C++ language is the preprocessor. If thoughtful standardized extensions were made to the preprocessor then language extensions could be “define”d using macros and compiled using C/C++. After all C defi

  3. GCC & clang/LLVM are open source and you can add whatever you need if you are keen – so no need to wait for someone else to come up with the standard. I went the whole hog and wrote myself a C++ parser so I could handle some parallel extensions –

    http://p

  4. C++11 has “generalized attributes”, which (partially) fill the need for platform-specific keywords. Generalized attributes make it easier to encapsulate platform-dependent details and optimizations. They obey normal C++ scoping, unlike pragmas. They do

