Using coding standards to improve software quality and security

Editor’s Note: In an excerpt from their book Embedded Systems Security, the authors assess the role of C and C++ coding standards and how compliance leads to more secure code.

Most safety and quality certification standards and guidance rules espouse the use of a coding standard that governs how developers write code. Some of them recommend or require that specific rules be included in the coding standard. The goal of the coding standard is to increase reliability by promulgating intelligent coding practices.

For example, a coding standard may contain rules that help developers avoid dangerous language constructs, limit complexity of functions, and use a consistent syntactical and commenting style. These rules can drastically reduce the occurrence of flaws, make software easier to test, and improve long term maintainability.

It is common for a coding standard to evolve and improve over time. For example, the development team may discover a new tool that can improve code reliability and recommend that management add a requirement that this tool be used during the development process.

It is also common to see a coding standard consisting of guidance rules whose enforcement is accomplished primarily with human code reviews. Developing a new coding standard with dozens of rules that must be verified manually is a sure way to reduce developer efficiency, even if it increases the reliability of the code.

Numerous static code analyzers, and some compilers, can automate enforcement of large portions of a typical secure coding standard. Furthermore, although some coding standard rules are necessarily language-specific, there are some universally or almost universally applicable rules that should be part of a high-quality coding standard. Assuming they improve software quality, the best coding standard rules are those whose enforcement can be automated and that apply to any software project.

Compilers and other tool chain components (e.g., the linker/loader) often emit warnings rather than halt a build with a fatal error. A warning is an indicator to the developer that a construct may be technically legal but questionable, such as exercising a corner of the language that is not well defined. Such constructs are frequently the cause of subtle bugs. To ensure that developers do not intentionally or accidentally ignore warnings, tell the compiler to treat all warnings as errors. Many compilers have such an option.
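Many compilers make this a single switch; with GCC or Clang, for example, building with -Wall -Wextra -Werror promotes every diagnostic to a fatal error. The fragment below is a hedged sketch (the function name and its semantics are invented for illustration) of the kind of legal-but-questionable construct this policy catches:

```c
/* Built with, say:  gcc -Wall -Wextra -Werror example.c
 * the assignment inside the condition below draws a warning under -Wall,
 * and -Werror promotes that warning to a fatal build error, so the
 * likely "==" typo cannot slip silently into the product. */
static int is_ready(int status)
{
    if (status = 1) {   /* warning: assignment used as a truth value */
        return 1;       /* always taken -- the bug -Werror would catch */
    }
    return 0;
}
```

Without -Werror this compiles and quietly reports every status as ready; with it, the build stops until a developer writes `==` (or makes the assignment explicit).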

Compilers also tend to provide a variety of strictness levels in terms of language standard interpretation. Some compilers are capable of warning the developer about constructs that are technically legal but dangerous.

For example, the Motor Industry Software Reliability Association (MISRA) has published guidelines for the use of the C language in critical systems, and some compilers can optionally enforce some or all of these guidelines, which essentially subset the language by excluding constructs believed to lead to unreliable software.

Some MISRA guidelines are advisory and may yield warnings instead of errors; once again, if the MISRA rule is enabled, the compiler should be forced to generate a fatal build error on any non-compliant construct.

The authors are not recommending that all development organizations adopt full MISRA compliance as part of their coding standards. On the contrary, there are good reasons for not adopting the entire standard. What we do recommend is that once management decides to enable a MISRA rule checker that will force product builds to fail on non-conformant source code constructs, the developers should immediately edit the code to fix the discovered issues.

This editing phase brings cost: time spent to change the code, retesting overhead, and risk of adding new flaws during the editing process. Therefore, management must be careful when adopting new coding rules. The following case study demonstrates this need.

Case Study: MISRA C:2004 and MISRA C++:2008
Like any language-related standard, MISRA has many good rules along with a few rules that are either questionable or simply inappropriate for some classes of users and applications.

MISRA C:2004, with 141 rules, fixed a few questionable guidelines in the original MISRA C:1998 standard. If MISRA is used as part of a coding standard, it may be acceptable to enforce only a subset; however, that subset must be carefully considered and approved by management.

It is also important that the MISRA checker (often built directly into the compiler) be able to selectively enable and disable specific rules within individual code modules and functions.

The following sampling of MISRA rules demonstrates some of the pitfalls of the C programming language and how selective use of MISRA helps avoid them:

1 Rule 7.1: Octal constants (other than zero) and octal escape sequences shall not be used. The following example demonstrates the utility of this rule:

   a |= 256;
   b |= 128;
   c |= 064;

The first statement sets the eighth bit of the variable a. The second statement sets the seventh bit of variable b. However, the third statement does not set the sixth bit of variable c. Because the constant 064 begins with a 0, the C standard dictates that it be interpreted as an octal value. Octal 64 is equal to 0x34 in hexadecimal; the statement thus sets the second, fourth, and fifth bits of variable c.

Because octal digits range from zero to seven, developers easily misinterpret octal constants as decimal numbers. MISRA avoids this problem by requiring all constants to be specified as decimal or hexadecimal numbers.

2 Rule 8.1: Functions shall have prototype declarations and the prototype shall be visible at both the function definition and call. The MISRA informative discussion for this rule includes the sound recommendation that function prototypes for external functions be declared in a header file and then included by all source files that contain either the function definition or one of its references.

It should be noted that a MISRA checker might only validate that some prototype declaration exists for calls to a function. The checker may be unable to validate that all references to a particular function are preceded by the same prototype. Mismatched prototypes can cause insidious bugs, which is worse than having no prototype at all. For example, let’s consider the following C function definition and code reference, each located in a separate source file:

File1:
void read_temp_sensor(float *ret)
  {
    *ret = *(float *)0xfeff0;
  }

File2:
float poll_temperature(void)
  {
    extern float read_temp_sensor(void);
    return read_temp_sensor();
  }

The preceding code fragments are perfectly legal ANSI/ISO C. However, this software will fail, since the reference and definition of read_temp_sensor are incompatible (the former is written to retrieve the return value of the function, and the latter is written to return the value via a reference parameter).

One obviously poor coding practice illuminated in the preceding example is the use of an extern function declaration near the code containing the reference. Although strict ANSI C requires a prototype declaration, the scope of this declaration is not covered by the specification. MISRA rule 8.6, “functions shall be declared at file scope,” attempts to prevent this coding pitfall by not allowing function declarations at function code level. However, the following code fragment would pass this MISRA test yet fail in the same manner as the preceding example:

extern float read_temp_sensor(void);
float poll_temperature(void)
{
  return read_temp_sensor();
}

While MISRA does not explicitly disallow function declarations outside header files, this restriction is an advisable coding standard addition. Declaring all functions in header files certainly makes this error less likely yet still falls short: the header file containing the declaration may not be used in the source file containing the incompatible definition.
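To make the recommended convention concrete, the sketch below reworks the read_temp_sensor example so that one shared prototype governs both the definition and the call; in a real project that prototype would live in a header (say, temp_sensor.h) included by both files, and the memory-mapped register is replaced here by an ordinary variable so the sketch stays self-contained:

```c
/* Sketch: in a real project the prototype below lives in temp_sensor.h
 * and is included by BOTH the defining file and every caller, so any
 * signature mismatch becomes a compile-time error rather than silent
 * data corruption. */
void read_temp_sensor(float *ret);    /* the single, shared prototype */

static float fake_sensor = 21.5f;     /* stand-in for *(float *)0xfeff0 */

/* File1 equivalent: the definition, compiled against the prototype. */
void read_temp_sensor(float *ret)
{
    *ret = fake_sensor;
}

/* File2 equivalent: the caller, compiled against the same prototype. */
float poll_temperature(void)
{
    float t;
    read_temp_sensor(&t);
    return t;
}
```

Had File2 retained its incompatible `float read_temp_sensor(void)` declaration, the compiler would now reject it as conflicting with the shared prototype.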

There is really only one way to guarantee that the declaration and definition prototypes match: detect incompatibilities using a program-wide analysis. This analysis could be performed by a static code analyzer or by the full program linker/loader. We describe the linker approach here for illustration of how a high-quality tool chain can be critical to enforcing coding standards.

When compiling the aforementioned code fragment, the compiler can insert into its output object file some marker, such as a special symbol in the symbol table or a special relocation entry, that describes the signature of the return type and parameter types used in a function call. When the function definition is compiled, the compiler also outputs the signature for the definition. At link time, when the final executable image is being generated, the linker/loader compares the signature for same-named functions and generates an error if any incompatible signature is detected.

This additional checking should add negligible overhead to the build time (the linker already must examine the references of functions to perform relocation) yet guarantees function parameter and return type compatibility and therefore improves reliability and quality of the resulting software.

One major advantage of the link-time checking approach is the ability to encompass libraries (assuming they were compiled with this feature) whose source code may not be available for static analysis.

3 Rule 8.9: An identifier with external linkage shall have exactly one external definition. This rule is analogous to the preceding rule. Mismatched variable definitions can cause vulnerabilities that will not be caught by a standard language compiler. Let’s consider the following example in which the variable temperature should take on only values between 0 and 255:

File1:
#include <stdio.h>

void set_temp(void);

unsigned int temperature;
int main(void)
{
  set_temp();
  printf("temperature = %u\n", temperature);
  return 0;
}

File2:
unsigned char temperature;
void set_temp(void)
{
  temperature = 10;
}

Without additional error checking beyond the C standard, this program will build without error despite the mismatched definitions of temperature. On a big-endian machine with a 32-bit int type and an 8-bit char type, the program will print the following (the 8-bit store lands in the most significant byte of the 32-bit object that printf reads, and 10 × 2^24 = 167772160):

temperature = 167772160

As with the preceding example with function prototypes, an inter-module analysis is required to detect this mismatch. And once again, the linker/loader is a sensible tool to provide this checking.

4 Rule 8.11: The static storage class specifier shall be used in definitions and declarations of objects and functions that have internal linkage. Two programmers may use variables of the same name for independent purposes in independent modules within the same program.

One module’s modification of the variable will corrupt the other module’s instance and vice versa. Furthermore, global variables may be more visible to attackers (if, for example, the global symbol table for the program is available), opening up opportunities to alter important data with malware. MISRA rule 8.11 is designed to prevent this by enforcing the generally good policy of limiting the scope of declarations to the minimum required.

While MISRA rules 8.9 and 8.11 will prevent many forms of incompatible definition and use errors, they will not prevent all such occurrences. Another example of improper symbolic resolution relates to the unintended use of exported library definitions. Libraries are often used to collect code modules that provide a related set of functions.

In fact, the use of libraries to collect reusable software across projects is worthy of mention in a coding standard. For example, most operating systems come with a C library, such as libc.so, that provides support for the C runtime, including string manipulation, memory management, and console input/output functions.

A complex software project is likely to include a variety of project-specific libraries. These libraries export functions that can be called by application code. A reliability problem arises due to the fact that library developers and application developers may not accurately predict or define a priori the library’s exported interfaces.

The library may define globally visible functions intended for use only by other modules within the library. Yet once these functions are added to the global namespace at link time, the linker may resolve references made by applications that were not intended to match the definitions in the library.

For example, let’s consider an application that makes use of a print function. The application developer envisions the use of a printing library provided by the printer management team. However, the font management team created a library, also used by the application developer, that provides a set of font manipulation functions. The font management team defines a print function intended for use by other modules within the font management library.

However, if there does not exist a facility for limiting the namespace of libraries (the use of such a facility, if available, should be covered by the coding standard), the font library’s print function may be inadvertently used by the linker to resolve print references made by the application developer, causing the system to fail.

Therefore, this problem may need to be solved by something other than the compiler’s front end. One method is to use a tool chain utility program that hides library definitions so that they are used by the linker when resolving intra-library references but ignored when resolving extra-library references.

The Windows platform employs user-defined library export files to accomplish this separation. When creating Windows DLLs, developers specify which functions are exported. Functions not included in the export file will not be used to resolve application references. Some high-level languages, such as C++ and Ada, do a better job of automatically enforcing type consistency and namespacing than other languages such as C. Language choice may well make certain coding standard rules trivial to enforce.
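On ELF-based tool chains the simplest such facility is internal linkage: marking a library-internal helper static keeps it out of the global namespace entirely, so the linker can never use it to resolve an application’s own print references (GCC and Clang also offer __attribute__((visibility("hidden"))) for helpers that must cross files within one library). The sketch below uses invented names standing in for the hypothetical font library from the text:

```c
#include <string.h>

/* font_lib.c (hypothetical): "print" is the font team's internal
 * helper.  With internal linkage it is invisible outside this file,
 * so it cannot capture an application's calls to its own print(). */
static size_t print(const char *glyph, char *out)
{
    strcpy(out, glyph);          /* stand-in for real glyph rendering */
    return strlen(glyph);
}

/* The library's intended public entry point. */
size_t font_render(const char *glyph, char *out)
{
    return print(glyph, out);
}
```

Only font_render appears in the global symbol table; an application that links this library and defines its own print gets its own definition, not the font team’s.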

5 Rule 16.2: Functions shall not call themselves, either directly or indirectly. While directly recursive functions are easy to detect, and almost always a bad idea in resource-constrained or safety-critical embedded systems due to the risk of stack overflow, indirect recursion can be far more difficult to detect.

Sophisticated applications with complex call graphs and calls through function pointers may contain unnoticed indirect recursion. This is yet another case in which an inter-module analyzer, such as the linker/loader, is required to detect cycles in a program’s call graph. Handling all cases of indirect function calls, such as dynamically assigned function pointers, tables of function pointers, and C++ virtual functions, can be extremely difficult for an automated tool due to the ambiguity of potential functions that may be referenced by these pointers.

A developer should try out simple test cases with a MISRA checker to see what kinds of limitations it has. If a tool vendor is unable to improve or customize the tool to meet specific needs, the developer should consider other tool choices or adopt stricter coding standard rules for limiting the use of problematic forms of indirect function calls.
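A minimal case of the indirect recursion described above, using invented parser-style names, is sketched below; neither function calls itself directly, so a per-function review sees nothing amiss:

```c
/* parse_expr and parse_term form a cycle in the call graph:
 * parse_expr -> parse_term -> parse_expr.  Only a whole-program view
 * (e.g., a linker or analyzer that builds the call graph) flags it. */
static int parse_expr(int depth);

static int parse_term(int depth)
{
    if (depth == 0)
        return 0;                       /* recursion bottoms out here */
    return 1 + parse_expr(depth - 1);   /* calls back into parse_expr */
}

static int parse_expr(int depth)
{
    return parse_term(depth);
}
```

Here the depth parameter bounds the stack usage; real indirect recursion is dangerous precisely because no such bound is usually evident.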

MISRA for C++ was released in 2008 and, as one would expect, includes significant overlap with MISRA C. However, the MISRA C++ standard includes 228 rules, approximately 50% more than the MISRA C standard. The additional ground covers rules related to virtual functions, exception handling, namespaces, reference parameters, access to encapsulated class data, and other facets specific to the C++ language.

6 Rule 9-3-2: Member functions shall not return non-const handles to class data. A simple example of a non-compliant class is as follows:

#include <cstdint>

class temperature
{
  public:
    int32_t &gettemp(void) { return the_temp; }
  private:
    int32_t the_temp;
};
int main(void)
{
  temperature t;
  int32_t &temp_ref = t.gettemp();
  temp_ref = 10;
  return 0;
}

One of the major design goals of the C++ language is to promote clean and maintainable interfaces by encouraging the use of information hiding and data encapsulation. A C++ class is usually formed with a combination of internal (private) data and class member functions. The functions provide a documented interface for class clients, enabling class implementers to modify internal data structures and member implementations without affecting client portability.

The preceding class member function gettemp returns the address of an internal data structure. The direct access of this internal data by the client violates object-oriented principles of C++. An obvious improvement (and MISRA-compliant) implementation of the preceding sample class would be as follows:

#include <cstdint>

class temperature
{
  public:
    int32_t gettemp(void) { return the_temp; }
    void settemp(int32_t t) { the_temp = t; }
  private:
    int32_t the_temp;
};
int main(void)
{
  temperature t;
  t.settemp(10);
  return 0;
}

If the temperature class owner decides that only eight bits of data are required to store valid temperatures, then she can modify the internal class without affecting the class clients:

#include <cstdint>

class temperature
{
  public:
    int32_t gettemp(void) { return the_temp; }
    void settemp(int32_t t) { the_temp = t; }
  private:
    int8_t the_temp;
};

The non-compliant implementation requires modification to the client-side code due to the size change.

Embedded C++ and secure code
A number of advanced features in C++, such as multiple inheritance, can result in programming that is error prone, difficult to understand and maintain, and unpredictable or inefficient in footprint and execution speed. Because of these drawbacks, a consortium of semiconductor and development tools vendors created a C++ subset specification called Embedded C++ that has been in widespread use for more than a decade.

The goal of Embedded C++ is to provide embedded systems developers who come from a C language background with a programming language upgrade that brings the major object-oriented benefits of C++ without some of its risky baggage. To that end, Embedded C++ removes the following features of C++:

  • Multiple inheritance
  • Virtual base classes
  • New-style casts
  • Mutable specifiers
  • Namespaces
  • Runtime type identification (RTTI)
  • Exceptions
  • Templates

One example of the rationale for Embedded C++ is the difficulty in determining the execution time and footprint of C++ exception handling. When an exception occurs, the compiler-generated exception-handling code invokes a destructor on all automatic objects constructed since the applicable try block was entered.

The number and execution time of this destructor chain may be extremely difficult to estimate in sophisticated applications. Furthermore, the compiler generates exception-handling code to unwind the call stack linking the handler to its original try block. The additional footprint may be significant and difficult to predict. Because the standard C++ runtime is compiled to support exception handling, this feature adds code bloat to programs that do not even make use of the try and catch exception-handling mechanisms.

For this reason, purpose-built runtime libraries supporting the reduced language subset typically accompany an Embedded C++ tool chain. Footprint concerns also led C++ templates to be left out of the Embedded C++ standard; in some cases, the compiler may instantiate a large number of functions from a template, leading to unexpected code bloat.

Of course, some of these removed features can be extremely useful. Careful use of templates can avoid unnecessary code bloat while providing simpler, more maintainable source code interfaces. For this reason, many compilers provide variants of Embedded C++ that enable a development organization to add back features that may be acceptable for security-critical development, especially if those features are used sensibly (such as enforcing some or all of the rules of MISRA C++).

For example, Green Hills Software’s C++ compiler provides options for allowing the use of templates, exceptions, and other individual features with the Embedded C++ dialect (along with enabling MISRA checking).

Conclusion: Dealing with code complexity
Much has been published regarding the benefits of reducing complexity at the function level. Breaking up a software module into smaller functions makes each function easier to understand, maintain, and test.

One can think of this as meta-partitioning: applying the software componentization paradigm at a lower, programmatic, level. A complexity-limitation coding rule is easily enforced at compile time by calculating a complexity metric and generating a compile-time error when the complexity metric is exceeded.
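To make the partitioning idea concrete, the sketch below (all names invented for illustration) builds a validation routine from single-purpose helpers; each helper contains at most one or two decision points, so its McCabe complexity stays far below any reasonable compile-time limit, and each piece can be tested in isolation:

```c
/* Each helper holds at most one decision point, keeping per-function
 * McCabe complexity at 1-2 -- trivially under a limit of, say, 10. */
static int in_range(int v, int lo, int hi) { return v >= lo && v <= hi; }

static int valid_hour(int h)   { return in_range(h, 0, 23); }
static int valid_minute(int m) { return in_range(m, 0, 59); }

/* The composed check remains simple even as more fields are added;
 * a monolithic version would accumulate branches in one function. */
int valid_time(int h, int m)
{
    return valid_hour(h) && valid_minute(m);
}
```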

Once again, since the compiler is already traversing the code tree, it does not require significant additional build time to apply a simple complexity computation, such as the popular McCabe metric (http://en.wikipedia.org/wiki/McCabe_Metric). Because the compiler generates an actual error pointing out the offending function, the developer is unable to accidentally create code that violates the rule.

Adopting a coding standard rule that allows a McCabe complexity value of 200 is useless; most legacy code bases will be compliant despite containing spaghetti-like code that is hard to understand, test, and maintain.

The selection of a specific maximum complexity value is open to debate. If an existing code base is well modularized, a value may be selected that allows most of the properly partitioned code to compile; future code will be held to the same stringent standard.

When the complexity metric is applied to a large code base that has previously not been subjected to such an analysis, it is likely that a small number of large functions will fail the complexity test. Management then needs to weigh the risk of changing the code at all.

Modifying a piece of code that, while complex, is well exercised (proven in use) and serves a critical function may reduce reliability by increasing the probability of introducing a flaw. The complexity enforcement tool should provide a capability to allow exceptions to the complexity enforcement rule for specific functions that meet this profile.

Exceptions, of course, should always be approved by management and documented as such. The coding standard should not allow exceptions for code that is developed subsequent to the adoption of the coding rule. These types of coding standard policies conform to their spirit while maximizing efficiency, enabling them to be employed effectively in legacy projects.

David Kleidermacher, Chief Technology Officer of Green Hills Software, joined the company in 1991 and is responsible for technology strategy, platform planning, and solutions design. He is an authority in systems software and security, including secure operating systems, virtualization technology, and the application of high-robustness security engineering principles to solve computing infrastructure problems. Mr. Kleidermacher earned his bachelor of science in computer science from Cornell University.

This article is excerpted from Embedded Systems Security by David and Mike Kleidermacher, used with permission from Newnes, a division of Elsevier. Copyright 2012. All rights reserved.
