Seventeen steps to safer C code

17 tips for writing safety-critical C code using methods adapted from C++ and Ada.

In embedded systems design, many of us tend to write our software in C the way our “grandfathers” did, which was appropriate before we had to worry about ubiquitous connectivity and its security implications. Today, the programming methods of the past must be adapted to a world in which safety-critical design is required not only in military/aerospace applications but in ordinary commercial applications as well. The C language is definitely not type safe, and only by applying many good practices and self-imposed rules can it be made a viable choice for safety-critical software development.

I learned these rules and best practices working for companies that were moving to programming paradigms more amenable to safety-critical software development. For example, at one company we were developing Internet banking and chip-card terminal applications, using C++ on Windows PCs. At the time, we believed we were doing object-oriented programming, but I now believe that what we were writing was really C with C++ syntax. Having seen it often enough, I theorize that embedded systems developers naturally fall into this type of hybrid coding when migrating from procedural C programming to the object-oriented C++ paradigm.

Later I moved on to what was, at the time, a new domain of software engineering: safety-critical embedded systems design. My first project required me to learn Ada. At the end of the project, I understood that new hardware in the embedded systems area also means new software and firmware, and I learned from Ada what type safety really means.

What is type safety
I generally define type safety as Wikipedia defines it: “In computer science, type safety is the extent to which a programming language discourages or prevents type errors. A type error is erroneous or undesirable program behaviour caused by a discrepancy between differing data types.”* Defining a type means nothing more than saying that a certain number of bits represent a predefined data type. For instance, uint32_t number_of_bytes defines 32 bits, which hold an unsigned scalar value ranging from 0 to 4,294,967,295. If you assign a negative value to the number_of_bytes variable, a type-safe language reports an error at compile time or raises an exception at run time. The Ada language does this, but C does not: C silently converts the negative value to a large positive one. In C, you can also assign a floating-point value like 3.456 to the variable. Some compilers warn about this at compile time, but the code still compiles, and at run time the fractional part is silently discarded (the behavior is undefined only if the truncated value does not fit into the type).
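To make the C side of this concrete, here is a minimal sketch. The two helper functions are invented for illustration. Both rely on the implicit conversions that standard C performs on assignment: out-of-range values are reduced modulo 2^32 for an unsigned 32-bit type, and floating-point values are truncated toward zero.

```c
#include <stdint.h>

/* Hypothetical helper: stores a signed value in a uint32_t.
 * C accepts this and reduces the value modulo 2^32. */
uint32_t store_signed_value(int32_t value)
{
    uint32_t number_of_bytes = value; /* implicit conversion */
    return number_of_bytes;
}

/* Hypothetical helper: stores a floating-point value in a uint32_t.
 * The fractional part is discarded. The result is only defined if the
 * truncated value fits into the range of uint32_t. */
uint32_t store_floating_point_value(double value)
{
    uint32_t number_of_bytes = value; /* implicit conversion */
    return number_of_bytes;
}
```

Calling store_signed_value(-1) yields 4294967295, and store_floating_point_value(3.456) yields 3. Neither call is rejected, which is exactly the gap in type safety described above.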

*Wikipedia entry on type safety, from April 20, 2011. http://en.wikipedia.org/wiki/Type_safety

From this and other experiences, I've come up with a set of 17 tips summarizing the lessons I've learned as a software engineer in the embedded systems environment, particularly as they relate to C programming, as a way to help others avoid the same potholes I encountered. In the process, I had a lot of help, particularly with the many tips on safe C in articles at EmbeddedGurus.com and Embedded.com.

Tip #1—Follow the rules you've read a hundred times

There are three things you must do each time you start writing your code. You've read these rules many times before and resolved to do them the next time you started code development. This time, do them—they will help you to avoid many long hours of debugging:

  • Initialize variables before use.
  • Do not ignore compiler warnings.
  • Check return values.

Accessing objects before they have a defined state can lead to strange effects. Avoiding these effects requires more than setting all your ints and floats to a defined state; you also have to make sure that objects of complex types, such as typedef'd structs, are initialized before first use.

For instance, declare an object like the one in Listing 1 in the header. In Listing 1, the object has the typical init flag and two function pointers for reading and updating data.


Then, in the C module, initialize the object to a level for first usage. In this case, the init flag was set to false. The variable is static here, since I have only one instance of it:

static symbol_model_t g_symbol_model =
{
    false, /* is initialized */
    NULL,  /* update function */
    NULL   /* read function */
};

Later on, during construction of the object, you can check whether it has already been initialized, as shown in Listing 2.
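As a rough sketch of how these pieces could fit together, the following code declares the type, the static instance, and a construction function that checks the init flag. The function pointer signatures and the names symbol_model_init and symbol_model_read are assumptions made for this example, not the original listings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed callback signatures; the real ones depend on the module. */
typedef bool (*update_function_t)(const uint8_t *data, size_t number_of_bytes);
typedef bool (*read_function_t)(uint8_t *buffer, size_t number_of_bytes);

/* Object with an init flag and two function pointers. */
typedef struct
{
    bool              is_initialized;
    update_function_t update;
    read_function_t   read;
} symbol_model_t;

/* The single instance starts in a defined, uninitialized state. */
static symbol_model_t g_symbol_model =
{
    false, /* is initialized */
    NULL,  /* update function */
    NULL   /* read function */
};

/* Constructs the instance. Rejects NULL callbacks and a second
 * construction attempt. */
bool symbol_model_init(update_function_t update, read_function_t read)
{
    if (g_symbol_model.is_initialized)
    {
        return false; /* already constructed */
    }
    if ((update == NULL) || (read == NULL))
    {
        return false; /* invalid input */
    }
    g_symbol_model.update = update;
    g_symbol_model.read = read;
    g_symbol_model.is_initialized = true;
    return true;
}

/* Every later operation checks the flag before using the pointers. */
bool symbol_model_read(uint8_t *buffer, size_t number_of_bytes)
{
    if (!g_symbol_model.is_initialized || (buffer == NULL))
    {
        return false;
    }
    return g_symbol_model.read(buffer, number_of_bytes);
}
```

Because the flag starts as false, a read before construction fails cleanly instead of calling through a NULL function pointer.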


Even though they finally compile the code, modern compilers are always complaining about strange constructs. Do not ignore these complaints. More often than not, they are right. Also do not ignore return values since they indicate the first time something has gone wrong. If you ignore such warnings, you'll have a ticking time bomb in your system that will explode at a later point.

If you follow these procedures, you'll have more time left at the end of the project. After all, the end of the project is the point at which money and time are running out and people are overstressed, especially if the product isn't working and is shipping late. Now you'll have time to help them set things right.

Tip #2—Use enums as error types

Every module should have a specific error return type that explains what the problem is in detail, at the time it occurs. Often you receive error codes like “-1” or “an error occurred.” If there is a run-time error detected and you know exactly what it is, document this for your later reference and for those who maintain the software.

For example, consider the code in Listing 3.


Such an error type already has been decoded and the cause determined. The last entry, called _LAST_ERROR, makes it possible to iterate over the content of the enum. That means you only have to know what the first element in the chain is. No matter how many more errors you add between the first and the last, all you have to do is check the range or iterate. Also, this last enum value gives you the total number of entries. More on this later.
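A minimal sketch of such an error type could look like this. The flash driver module and all of its names are hypothetical. Only the pattern matters: a known first entry and a sentinel as the last entry.

```c
/* Hypothetical error type for a flash driver module. */
typedef enum
{
    FLASH_SUCCESS = 0,          /* first element of the chain */
    FLASH_ADDRESS_OUT_OF_RANGE,
    FLASH_ERASE_TIMEOUT,
    FLASH_WRITE_VERIFY_FAILED,
    FLASH_UNKNOWN_ERROR,
    FLASH_LAST_ERROR            /* sentinel: equals the number of entries */
} flash_error_t;

/* Range check that stays correct when entries are added
 * between the first element and the sentinel. */
int flash_error_is_valid(int code)
{
    return (code >= FLASH_SUCCESS) && (code < FLASH_LAST_ERROR);
}

/* Counts the entries by iterating from the first element
 * up to the sentinel. */
int flash_error_count(void)
{
    int count = 0;
    for (int code = FLASH_SUCCESS; code < FLASH_LAST_ERROR; code++)
    {
        count++;
    }
    return count;
}
```

Adding a new error between FLASH_SUCCESS and FLASH_LAST_ERROR changes neither function.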

Tip #3—Expect to fail
Failures happen. Often. So plan for them and use them to your advantage. It's good practice to set the default return value of an operation to something like UNKNOWN_ERROR. Only in the case of a good result should you set it to SUCCESS, for instance, as in Listing 4.


This pessimistic approach is safer than expecting all things to go well and setting the default to _SUCCESS. In programming, it's safer to assume that the failure is not the exception in a string of successes but exactly the opposite: The good case is the only exception in a string of more commonly occurring error cases.
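The pessimistic default can be sketched as follows. The checksum function and its error type are invented for this example. The result variable starts as an error and is set to success in exactly one place.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error type for this example. */
typedef enum
{
    CHECKSUM_SUCCESS = 0,
    CHECKSUM_NULL_POINTER,
    CHECKSUM_EMPTY_INPUT,
    CHECKSUM_UNKNOWN_ERROR,
    CHECKSUM_LAST_ERROR
} checksum_error_t;

/* Computes a simple 8-bit additive checksum over the input data. */
checksum_error_t compute_checksum(const uint8_t *data,
                                  size_t number_of_bytes,
                                  uint8_t *checksum)
{
    checksum_error_t result = CHECKSUM_UNKNOWN_ERROR; /* expect to fail */

    if ((data == NULL) || (checksum == NULL))
    {
        result = CHECKSUM_NULL_POINTER;
    }
    else if (number_of_bytes == 0u)
    {
        result = CHECKSUM_EMPTY_INPUT;
    }
    else
    {
        uint8_t sum = 0u;
        for (size_t index = 0u; index < number_of_bytes; index++)
        {
            sum = (uint8_t)(sum + data[index]);
        }
        *checksum = sum;
        result = CHECKSUM_SUCCESS; /* the only path that reports success */
    }
    return result;
}
```

If a later change adds a new branch and forgets to set the result, the caller sees CHECKSUM_UNKNOWN_ERROR rather than a false success.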

Tip #4—Check input values: never trust a stranger

If your modules expect input data from other modules, you should never trust a stranger. That is, at the outermost layer of your software architecture, check all input values for consistency. The check has to be at the outermost layer since invalid input must be detected as soon as possible. Otherwise you could, for instance, dereference an invalid pointer given to you at one of your lower layers. The result: The crash dump reports that it was your software's problem, but later, after many hours of debugging, you find out that someone has given you invalid input.

Here is an example using an enum error type mapped to a string representation for trace output to a console window, shown in Listing 5.


The lookup table representing the strings is defined as shown in Listing 6. Safe access to the string map that does not allow any out-of-bounds access is shown in Listing 7.


Unfortunately, enums in C are integers. That means a caller can hand any integer value to the interface that accesses the array, an error that a range check prevents.

By the way, if you define the lookup table with the _LAST enum as a size parameter, it will have the right size and keep you from indexing out of bounds. Also getting the string out of the string array is a very simple offset addressing operation, which is really fast in C.

So, that was range checking. You should also check for NULL pointers if someone gives you an address value. You cannot check pointers for anything other than the NULL value, but this is better than nothing.
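The combination of an enum, a lookup table sized by the last enum value, and a range-checked accessor could be sketched like this. The motor module and its error names are hypothetical. The accessor takes an int on purpose, since that is what a caller can actually pass.

```c
/* Hypothetical error type for a motor control module. */
typedef enum
{
    MOTOR_SUCCESS = 0,
    MOTOR_OVERCURRENT,
    MOTOR_STALLED,
    MOTOR_UNKNOWN_ERROR,
    MOTOR_LAST_ERROR
} motor_error_t;

/* The sentinel sizes the table, so it has one slot per error. */
static const char *const g_motor_error_strings[MOTOR_LAST_ERROR] =
{
    "MOTOR_SUCCESS",
    "MOTOR_OVERCURRENT",
    "MOTOR_STALLED",
    "MOTOR_UNKNOWN_ERROR"
};

/* Returns the string for an error code. Any value outside the
 * enum range maps to a fixed string instead of indexing the table. */
const char *motor_error_to_string(int error)
{
    if ((error < MOTOR_SUCCESS) || (error >= MOTOR_LAST_ERROR))
    {
        return "INVALID_ERROR_CODE";
    }
    return g_motor_error_strings[error];
}
```

A call such as motor_error_to_string(99) returns "INVALID_ERROR_CODE" rather than reading past the end of the table.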

Tip #5—Write once, read many times
When we read other people's code, we're thankful for any good line of comment or more readable code; most of the time, however, the original coder hasn't been so kind to us. If the variables are called i, j, and k, you'll soon have a mental breakdown. Often the longest variable name is pbuf. What can happen when code is difficult to decipher is that even though the next programmer only needs to change the software slightly, he or she says, “I can't understand this hacker's code. It will be faster to rewrite it.” The rewrite results in extra work and possibly new bugs.

So what can you do? First of all, if you write code, write it to be as readable as a newspaper. Well-written code requires only a few lines of comments. Also consider that although the compiler does not care how code looks, it needs to be readable by human beings.

Don't be lazy about typing descriptive variable names and, if required, add the unit to the name. For example, do not call parameters Size, Length, Temperature, or Angle. Instead, since all those parameters have a unit, call them:

  • number_of_bytes
  • length_in_meters
  • temperature_in_celsius
  • angle_in_radians

There are famous examples of errors coming from wrong unit conversions, such as the loss of the Mars Climate Orbiter (see http://mars.jpl.nasa.gov/msp98/news/mco990930.html). Not using units in your application programming interface's definitions can also cause major design failures. See How To Design A Good API and Why it Matters (Joshua Bloch's Google TechTalks video from 2007) at www.youtube.com/watch?v=aAb7hSCtvGw.

If you've written code that requires some renaming, I recommend the use of the open-source Eclipse Development Environment (www.eclipse.org). It has a great feature called Refactoring that renames any kind of object everywhere in the code. For instance, if you want to change a function parameter's name from number_of_bytes to number_of_floats, just mark it, press ALT-SHIFT-R, and change the name.

Documenting the source code is helpful not only for your future reference but for those who come after you. For instance, if you're working on an embedded system, you need to have a memory map indicating where all the memory-mapped devices can be found. Listing 8 shows an example of a memory map.
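A memory map documented directly in a header might look like the sketch below. Every address, size, and device in it is invented for a hypothetical board. The helper address_is_in_sram only shows how the constants can be used.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Memory map of a hypothetical board (all addresses are invented):
 *
 *   0x00000000 - 0x0007FFFF  Flash, 512 KiB   boot loader and application
 *   0x20000000 - 0x2000FFFF  SRAM, 64 KiB     stack and data
 *   0x40000000 - 0x400000FF  UART0 registers  console output
 *   0x40001000 - 0x400010FF  FPGA registers   version and control
 */
#define FLASH_BASE_ADDRESS   UINT32_C(0x00000000)
#define FLASH_SIZE_IN_BYTES  UINT32_C(0x00080000)
#define SRAM_BASE_ADDRESS    UINT32_C(0x20000000)
#define SRAM_SIZE_IN_BYTES   UINT32_C(0x00010000)
#define UART0_BASE_ADDRESS   UINT32_C(0x40000000)
#define FPGA_BASE_ADDRESS    UINT32_C(0x40001000)

/* Returns true if the address lies inside the SRAM window. */
bool address_is_in_sram(uint32_t address)
{
    return (address >= SRAM_BASE_ADDRESS) &&
           (address < (SRAM_BASE_ADDRESS + SRAM_SIZE_IN_BYTES));
}
```

Keeping the table in the comment and the constants in the same file makes it harder for the two to drift apart.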


It's useful to have diagrams of all the software layers in your application as well as diagrams of the overall software architecture, preparing them in a format that allows you to simply cut and paste them to a word processing program. Remember that if you write it down, you don't have to keep it in mind.

Tip #6—When in doubt, leave it out

I've already mentioned the API design talk by Joshua Bloch, a guru in the Java community, on YouTube (URL mentioned earlier). He brings up a good point.

The point is this: if you design an API, which is nothing other than the external interface of your modules, consider whether each operation is really needed. If you are not sure anyone will ever need an operation, leave it out. If someone does use your API and you later remove an operation, you'll break that person's code. As Josh says, “When in doubt, leave it out. You can always add, but you can never remove.”

Tip #7—Use the right tools
Everyone has a favorite editor, debugger, and compiler. But sometimes it's worth looking for something new since “the better is the enemy of the good.” Here is what I use (many of which you may already use):

  • Eclipse: Has a good editor, is good at refactoring code, also good for prototyping architectures on the PC (for example, with Cygwin on Windows). www.eclipse.org.
  • Astyle: Artistic Style 2.01 is a great code formatter that can be configured in many ways to beautify the code. http://astyle.sourceforge.net/.
  • Cygwin: For PC-based prototypes and for architectural studies, you can use the GNU tool chain of Cygwin. Make sure you install make, binutils, and gcc from the development package. www.cygwin.com/.
  • GNU tool suite: Many embedded systems tool chains use this set of tools. Even if you don't have hardware at the beginning of the project (your hardware developers may not have finished their work), you can start writing prototypes for your architecture. Eclipse together with Cygwin using the GNU tools is worth trying. www.gnu.org.
  • Tortoise SVN: This is a nice add-on for Windows Explorer to access the Subversion versioning system. http://tortoisesvn.tigris.org/.

All these software packages are available for free.

Tip #8—Define the software requirements first
Defining the requirements for the software you write is the first step for a successful product. I mean the software requirements for the final product, not those for the quick hacked throwaway prototype you're working on as a first step to the final product. And this, of course, requires defining the goal to be reached.

If you don't define the requirements, you can't test your final software properly—you'll have nothing to use to define a useful test case. In other words, how can you determine if you've finished the development? Here's a helpful multiple-choice software quiz along these lines:

How can you determine if you have finished the development?

  1. There is no more money left;
  2. There is no more time left; or
  3. All the requirements are implemented and tested successfully.

To properly finish development (you should all know which answer is the correct one above), I define the following on every project:

  1. Requirements for the OK case—that is, what is required to fulfill the main functionality.
  2. Requirements for the ERROR case, important for the safety-critical design since you also need to define what has to be done if things go wrong. Remember that the good case is often the exception in a string of more commonly occurring error cases.
  3. Tests that check if the above defined requirements are implemented correctly.

If you do testing on the code base directly (white-box testing), you may tend to test what the code does and not what it's supposed to do. Here's where requirement-based testing can improve your end product: You're forced to do black-box testing.

Tip #9—During boot phase, dump all available versions
If you're the one who implements the boot loader on new hardware, you would normally do the following:

  • Initialize the hardware according to the required memory map.
  • Execute a hardware self test.
  • Start booting the application.

Nothing new here: That is what your PC typically does every time you boot up.

But in embedded systems development in the era of FPGAs and CPLDs, the hardware is as modifiable and subject to change as the software. Dump the version registers of all programmable logic devices to a console window or file before starting the application. This step is important since hardware developers nowadays use programmable devices to quickly change the behavior of their hardware, with the result that VHDL code can be changed as fast as software code can be changed.

To circumvent such messages as “this error is only on your system” or “we cannot reproduce the problem,” you should dump at least all the version registers of the hardware devices. Also your software should say what version it is. For a real product, there must be a matrix telling you which hardware works with what software version.

If you do this, you'll find out that often people are operating illegal combinations that can cause some super-strange effects. Such versioning information is very helpful for your product hotline or test staff, as well as to production people who can use it to check if what they've produced is the right configuration.
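The version dump and the compatibility matrix could be sketched as follows. The version encoding (major in the high byte, minor in the low byte), the version numbers, and the matrix contents are all assumptions for this example. On real hardware, the FPGA version would be read from a device register defined by the board.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SOFTWARE_VERSION UINT16_C(0x0203) /* hypothetical: version 2.3 */

/* One row of the matrix: a software version and the range of FPGA
 * versions it has been released for. */
typedef struct
{
    uint16_t software_version;
    uint16_t fpga_version_min;
    uint16_t fpga_version_max;
} compatibility_entry_t;

static const compatibility_entry_t g_compatibility_matrix[] =
{
    { UINT16_C(0x0202), UINT16_C(0x0100), UINT16_C(0x0104) },
    { UINT16_C(0x0203), UINT16_C(0x0103), UINT16_C(0x0107) }
};

/* Looks up whether a software/FPGA combination is in the matrix. */
bool combination_is_released(uint16_t software_version, uint16_t fpga_version)
{
    const size_t number_of_entries =
        sizeof(g_compatibility_matrix) / sizeof(g_compatibility_matrix[0]);

    for (size_t entry = 0u; entry < number_of_entries; entry++)
    {
        const compatibility_entry_t *row = &g_compatibility_matrix[entry];
        if ((row->software_version == software_version) &&
            (fpga_version >= row->fpga_version_min) &&
            (fpga_version <= row->fpga_version_max))
        {
            return true;
        }
    }
    return false;
}

/* Prints all versions and the verdict before the application starts. */
bool dump_versions(uint16_t fpga_version)
{
    const bool is_released =
        combination_is_released(SOFTWARE_VERSION, fpga_version);

    printf("software version: %u.%u\n",
           (unsigned)(SOFTWARE_VERSION >> 8),
           (unsigned)(SOFTWARE_VERSION & 0xFFu));
    printf("FPGA version:     %u.%u\n",
           (unsigned)(fpga_version >> 8),
           (unsigned)(fpga_version & 0xFFu));
    printf("combination:      %s\n",
           is_released ? "released" : "NOT RELEASED");
    return is_released;
}
```

What the boot loader does with an unreleased combination (refuse to start, or start with a warning) is a project decision that this sketch leaves open.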

Tip #10—Use a software version string for every release
If you've finished development of a particular stage in a project in order to do tests on it or to release a software version, be sure you take the following steps in exactly the order written:

  1. Update the version string and date.
  2. Check the software version in to your versioning system.
  3. Update the version string right after check-in for the next version.
  4. Test the software.
  5. Fix the bugs.
  6. Continue developing the next version.

The most important step is 3.

I know of several instances where developers have given their software to testers or customers without incrementing the software version string. The result: Several software versions were out there with the same version string. It can take days before you realize this inconsistency and resolve it. (By the way, in the era of programmable devices, this also applies to the hardware engineers. So, if you meet some of them in the breakroom, remind them.)
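One way to make the version string hard to forget is to keep it in a single place in the code and derive every printed version from it. A minimal sketch, in which the macro names, the version numbers, and the date are placeholders:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Single place to update before check-in (step 1) and again
 * right after check-in (step 3). All values are placeholders. */
#define SOFTWARE_VERSION_MAJOR 1
#define SOFTWARE_VERSION_MINOR 4
#define SOFTWARE_VERSION_PATCH 0
#define SOFTWARE_VERSION_DATE  "2011-04-20"

/* Writes "major.minor.patch (date)" into the buffer.
 * Returns false if the buffer is missing or too small. */
bool format_version_string(char *buffer, size_t buffer_size)
{
    int written;

    if ((buffer == NULL) || (buffer_size == 0u))
    {
        return false;
    }
    written = snprintf(buffer, buffer_size, "%d.%d.%d (%s)",
                       SOFTWARE_VERSION_MAJOR,
                       SOFTWARE_VERSION_MINOR,
                       SOFTWARE_VERSION_PATCH,
                       SOFTWARE_VERSION_DATE);
    return (written >= 0) && ((size_t)written < buffer_size);
}
```

The boot-time version dump from Tip #9 can then print this string, so that testers and customers always see the version of the build they actually run.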

Tip #11—Design for reuse: use standards

Don't try to reinvent the wheel, believing your wheel will be better than all the millions that have already been invented. I've so often seen code like that in Listing 9.


Since the C99 language standard defined the stdbool.h and stdint.h headers, these types have been portable, and there is no need to define your own integer or Boolean types.
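As a small illustration, the sketch below uses the standard types directly instead of home-grown typedefs. The function counter_would_overflow is made up for this example, and the compile-time checks use _Static_assert, which requires C11 rather than C99.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Compile-time checks (C11): the build fails if the platform
 * does not provide the expected widths. */
_Static_assert(sizeof(uint32_t) * CHAR_BIT == 32, "uint32_t must be 32 bits");
_Static_assert(sizeof(int16_t) * CHAR_BIT == 16, "int16_t must be 16 bits");

/* Returns true if adding the increment would wrap the counter. */
bool counter_would_overflow(uint32_t counter, uint32_t increment)
{
    return increment > (UINT32_MAX - counter);
}
```

The limits such as UINT32_MAX also come from stdint.h, so no project-specific constants are needed either.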

Tip #12—Expose only what is needed
When I read other programmers' code, I wonder if they've ever heard about “information hiding.” I find many externally declared variables that can be accessed from several modules. The practice is both pointless and sometimes dangerous.

Module-internal operations and variables are often not declared static, which makes them accessible from other modules. The result is a design that is not modular: once other modules start using these internals, the modules become interdependent, and one cannot be changed or reused without the other.

Also, C doesn't have any syntax for anything like the namespaces common in languages such as C++. Or to be more precise, all functions and variables with external linkage share one global namespace. This means that all nonstatic operations and file-scope variables are visible globally unless you hide them. This global visibility can result, and often does, in a name clash detected when linking the software. As long as you have all the source code for the project, you can easily resolve this issue. If you have only a library in binary format and some function headers, the situation is more complicated.
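A minimal sketch of a module that exposes only what is needed is shown below. The temperature filter and all of its names are hypothetical. Everything marked static is invisible to the linker, and the public functions carry a module prefix to reduce the risk of name clashes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Private state: visible only inside this translation unit. */
static uint32_t g_number_of_samples = 0u;
static int64_t  g_sum_in_millicelsius = 0;

/* Private helper: cannot be called from other modules. */
static bool sample_is_plausible(int32_t temperature_in_millicelsius)
{
    return (temperature_in_millicelsius >= -40000) &&
           (temperature_in_millicelsius <= 125000);
}

/* Public interface: the only two symbols the linker can see. */
bool temperature_filter_add_sample(int32_t temperature_in_millicelsius)
{
    if (!sample_is_plausible(temperature_in_millicelsius))
    {
        return false; /* reject implausible input */
    }
    g_sum_in_millicelsius += temperature_in_millicelsius;
    g_number_of_samples++;
    return true;
}

int32_t temperature_filter_average_in_millicelsius(void)
{
    if (g_number_of_samples == 0u)
    {
        return 0; /* no samples yet */
    }
    return (int32_t)(g_sum_in_millicelsius / (int64_t)g_number_of_samples);
}
```

Other modules can only go through the two public functions, so the internal representation can change without breaking them.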

Another related topic: Parameters should be declared const if the implementer of the interface doesn't want the object to be changed. The difference between C and C++ is that in C++ const means constant, whereas in C const means read-only. To illustrate this, Listing 10 shows a declaration of a read operation reading data into a specified buffer called display_data, which is at a constant address.

A write operation whose data is constant and located at a constant buffer address requires const twice, as shown in Listing 11.


If you later try to modify constant objects, your compiler will report an error.

You should let your compiler help you develop your software in a safe way. What you need to do is to provide the compiler the information on how the objects are to be treated.
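The two const placements can be sketched as follows. The function names, the return values, and the small stand-in display memory are assumptions for this example. Only the parameter name display_data comes from the text above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DISPLAY_MEMORY_SIZE_IN_BYTES 8u

/* Stand-in for the real display memory. */
static uint8_t g_display_memory[DISPLAY_MEMORY_SIZE_IN_BYTES];

/* Read: the pointer itself is const, the data it points to is not,
 * because the function fills the caller's buffer. */
size_t display_read(uint8_t * const display_data, size_t number_of_bytes)
{
    if (display_data == NULL)
    {
        return 0u;
    }
    if (number_of_bytes > DISPLAY_MEMORY_SIZE_IN_BYTES)
    {
        number_of_bytes = DISPLAY_MEMORY_SIZE_IN_BYTES;
    }
    memcpy(display_data, g_display_memory, number_of_bytes);
    /* display_data = NULL;   would not compile: the pointer is const */
    return number_of_bytes;
}

/* Write: const twice, so neither the pointer nor the data
 * can be modified inside the function. */
size_t display_write(const uint8_t * const display_data, size_t number_of_bytes)
{
    if (display_data == NULL)
    {
        return 0u;
    }
    if (number_of_bytes > DISPLAY_MEMORY_SIZE_IN_BYTES)
    {
        number_of_bytes = DISPLAY_MEMORY_SIZE_IN_BYTES;
    }
    memcpy(g_display_memory, display_data, number_of_bytes);
    /* display_data[0] = 0u;  would not compile: the data is const */
    return number_of_bytes;
}
```

Both functions return the number of bytes actually copied, so a caller can detect a clipped transfer.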

Tip #13—Make sure you've used “volatile” correctly
In embedded software development you sometimes have to do things that your host-based colleagues are often not concerned about. One of those things is declaring variables to be volatile, which keeps the compiler from optimizing away read or write operations on this variable. How to do this is well described by Michael Barr's blog posting “Firmware-Specific Bug #3: Missing Volatile Keyword” found at http://embeddedgurus.com/barr-code/2010/02/firmware-specific-bug-3-missing-volatile-keyword/
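A typical case is polling a hardware status register. The sketch below is not taken from the blog posting. The register layout, the ready bit, and the function name are assumptions. In real firmware, the pointer would be set to the device address from the memory map.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STATUS_READY_MASK UINT32_C(0x00000001) /* hypothetical ready bit */

/* Polls a status register until the ready bit is set or the
 * poll budget is used up. */
bool wait_until_ready(const volatile uint32_t *status_register,
                      uint32_t max_number_of_polls)
{
    if (status_register == NULL)
    {
        return false;
    }
    for (uint32_t poll = 0u; poll < max_number_of_polls; poll++)
    {
        /* Because of volatile, the register is read again on every
         * pass instead of being cached from the first read. */
        if ((*status_register & STATUS_READY_MASK) != 0u)
        {
            return true;
        }
    }
    return false; /* timed out */
}
```

Without volatile, an optimizing compiler is allowed to read the register once and reuse that value for every pass of the loop.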

Tip #14—Don't start with optimization as the goal
Some developers are intent on writing “fast code,” even though they cannot define what “fast” means in the context of their application. It sounds good as an objective, but what I've seen is that often under the cover of writing fast code, they want to move beyond the existing system definition and move to a nonexisting architecture.

To account for this and other system-redefining goals, you should think seriously about developing a flexible architecture capable of adapting to various exigencies. First, this means developing a set of software requirements for the product being planned. Then you should assume that once developed, your software architecture will no doubt be extended, so consider how you can design it to be flexible enough to incorporate new features into the existing platform without scrapping the code base you've already developed.

If you're concerned about performance, wait until your project's first integration phase, at which point you can measure how fast the system is. This doesn't have to be the last milestone in the project: you can plan several interim proof-of-concept implementations and measure the performance of each. Working from those measured values, consider what you need to do to achieve the necessary performance goals.

If you then find that your code is not fast enough, find out which parts consume most of the time: profile the software and determine which loops and routines are responsible.

The rule with optimization is that you first have to know where you are before considering what you need to optimize in order to get where you want to be. And don't forget that the software must still be maintainable.

Tip #15—Don't write complex code
Complex code is error-prone code. I think most of you already know this. But the question is, what does complex really mean in the context of your particular design?

A good metric for this is McCabe's cyclomatic complexity. See also this well-written article by Jack Ganssle (“Taming software complexity,” Embedded.com, 2008) at www.eetimes.com/4007519.

My opinion is that if you write safety-critical code, the best rules are:

  • Maximum cyclomatic complexity per function: 10
  • In a few exceptions, such as use of switch-case constructs: 15

Many good tools exist for measuring the complexity of your code. “Understand for C,” at www.scitools.com is a good tool, but many others are available.

You shouldn't just think about writing code during the initial software development phase. You need to think about all the stages of the code's life. The code of a product will be changed and extended many times during the product's life cycle. And the people who have to do this are pretty often not the ones who have written the code initially. In other words, do not just think about your own needs: many others are coming after you.

Tip #16—Use a static code checker
If you write safety-critical code, you surely have a coding guideline. Even if your guideline contains only 10 rules, you must have a tool to help you check those rules. If your team doesn't have a tool for checking, you can be sure that things won't be checked.

Many tools, such as PC-Lint, are available to accomplish this task. You should check your code at every significant milestone in your project to be sure that the code quality is good.

Remember, software testing is a multistage process, of which static code checking is one part. The other stages include:

  • Functional tests.
  • Requirements-based tests.
  • Coverage tests (such as MC/DC).

All these tests have one general purpose: to reduce the number of bugs in your code.

No software code base exists on this planet that can be considered to be error-free. But there are many good software code-base products that do what they're supposed to do. Unfortunately, there are also many that do not.

Tip #17—Myths and sagas

Many myths and sagas persist in the world of safety-critical embedded systems. One of the most common is that dynamic memory allocation is forbidden. This myth, however, is only half of the truth. Every application has an initialization phase, which is followed by the operational phase. Dynamic memory allocation during the initialization phase carries little risk, provided that a failed allocation is detected and handled. But to avoid memory fragmentation, you must not change those allocations during the operational phase.
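One way to enforce this split is a thin wrapper around the allocator that refuses to work once initialization has finished. The sketch below shows one possible design, not a prescribed one. The function names are invented, and no function for freeing memory is offered, on purpose.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Set once at the end of the initialization phase. */
static bool g_initialization_finished = false;

/* Allocates zeroed memory during the initialization phase only.
 * Returns NULL after the phase has ended or if allocation fails. */
void *init_phase_allocate(size_t number_of_bytes)
{
    if (g_initialization_finished || (number_of_bytes == 0u))
    {
        return NULL;
    }
    return calloc(1u, number_of_bytes);
}

/* Marks the transition to the operational phase. From here on,
 * the set of allocations stays fixed. */
void finish_initialization(void)
{
    g_initialization_finished = true;
}
```

The caller still has to check every returned pointer for NULL, in line with Tip #1.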

Avoid the potholes
With this article, I've identified some potholes on the road to safety-critical software development and how you can avoid them. When you come to the part of your job where you tell a computer what you want it to do, I hope these tips will be helpful. At that time, remember this one sentence summary I came across once about our common job domain: “Computers have the strange habit of doing what you say, not what you mean.” 

Thomas Honold is a software architecture designer, specializing in safety-critical DO-178B software development in the defense/aerospace industry. He has a master's degree in electronic engineering and has worked for 15 years on software architectures and design for banking software, Internet banking, chip-card readers, avionics, and bootloader driver software.
