Validating your GNU platform toolchain: tips and techniques
Open-source tools for your open-source Android/Linux platform.
In the past few years we've seen the embedded design community move away from proprietary software development tools and move decisively closer to open-source software (OSS) development.
As one would expect, accompanying this shift is an increased demand for proven, top-quality OSS tools. For embedded systems developers designing for embedded Linux, the GNU toolchain is the most popular choice due to its standing as the natural toolchain of the Linux kernel community.
So how do you obtain a GNU toolchain? You can purchase a commercial toolchain from an established vendor, or you can build the toolchain yourself. Successfully building a GNU toolchain, while a significant achievement, is only half the work. Meaningfully testing and validating the toolchain to ensure that it is production-worthy is the critical second half.
With a codebase of 10 million lines or more, the GNU toolchain is a mammoth thing to test adequately, as illustrated by Table 1. It's vital to create a methodical validation strategy that tests the various components as thoroughly as possible.
This article presents techniques for setting up a validation process that effectively tests your GNU toolchain.
Validation begins with the compiler
Most testing of the GNU toolchain is focused on the compiler, and for good reason: the compiler is at the very heart of any toolchain environment. So before we discuss test and validation techniques for the entire GNU toolchain, it's prudent to put your compiler through a few key exercises. These include:
- Compiling programs and running those programs in the target environment. This technique verifies that the generated code behaves as intended. You can also use this approach to test the performance of the code generated by the toolchain.
- Compiling programs and inspecting the generated object file or executable image, without running the generated code. This checks that the generated code contains the expected machine instructions, carries correct debugging information, and conforms to the specified Application Binary Interface (ABI).
- Compiling fragments of a program with multiple compilers and linking the fragments together. This checks that the compiler you wish to validate interoperates with a known-good compiler for the target platform.
- Compiling invalid program fragments and checking for appropriate error messages. Sometimes called "negative testing," this procedure checks to see if the compiler is correctly enforcing constraints.
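As an illustration of the negative-testing exercise above, here is a minimal Python sketch. It assumes the compiler under test accepts GCC's -fsyntax-only option and writes its diagnostics to standard error; the function name and the file name negative.c are made up for this example.

```python
import os
import subprocess
import tempfile


def rejects_invalid_source(compiler_cmd, source_text, expected_fragment):
    """Return True if the compiler fails and its diagnostics mention expected_fragment."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "negative.c")  # hypothetical file name
        with open(src, "w") as handle:
            handle.write(source_text)
        # -fsyntax-only asks GCC to check the source without producing output files.
        result = subprocess.run(
            list(compiler_cmd) + ["-fsyntax-only", src],
            capture_output=True,
            text=True,
        )
    # A negative test passes only if the compiler both fails and explains why.
    return result.returncode != 0 and expected_fragment in result.stderr
```

A call might look like rejects_invalid_source(["arm-none-linux-gnueabi-gcc"], "int main(void) { return 0 }\n", "expected"), where the compiler name is a placeholder for your own cross compiler. Matching on a short fragment rather than the full message keeps the test from breaking when the wording of a diagnostic changes between releases.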
But a toolchain is more than just a compiler. The GNU toolchain includes: assembler, linker, runtime libraries, debugger, debug stub(s), and an integrated development environment (IDE). Different testing metrics are required for testing each of these components. It is also important to test many of these components interactively. Key interactive tests include:
- Interactive testing of the debugger. You should compile and debug programs using the debugger with your target system, ensuring that all debugger functionality works as expected.
- Interactive testing of the IDE. Like debuggers, IDEs are interactive. You should check that all desired IDE functionality (including integration with GDB, the GDB stub, and the target system) works as intended.
Once you've tested the individual components and checked to make sure they successfully work interactively, it's time to introduce testsuites into your GNU toolchain environment. A testsuite is composed of a number of tests that target performance, functionality, or a specific set of behaviors. The GNU regression, conformance, and performance testsuites are the most useful for validating your GNU toolchain (Figure 1).
The GNU regression testsuite
This testsuite answers fundamental questions related to correctness requirements, including GNU source extensions.
Many of the GNU toolchain components (GNU Compiler Collection [GCC], GNU Binary Utilities, and GNU Debugger [GDB]) include testsuites based on the DejaGNU framework. Every GNU toolchain build should be validated using these testsuites because only the DejaGNU testsuites provide test coverage for GNU extensions to the C and C++ programming languages and the wide variety of features available in other tools. The DejaGNU GDB testsuite performs live tests of the debugger on a running target system to ensure appropriate interactive behavior. Taken together, the DejaGNU testsuites contain tens of thousands of tests and are usually expanded to include new tests whenever a defect is corrected or a new feature is added.
To use each of these testsuites, you must first develop a DejaGNU board configuration file for your target system. The board configuration file will contain code to perform a variety of basic operations, including, most critically:
- Running programs on your target system. This code must upload the program to the target system (or make it available via a network file system), execute the program, and report the results.
- Rebooting your target system. Many embedded systems lack memory protection. Therefore, when a test fails, the target system is likely corrupted, and the system must be rebooted. You may require specialized hardware to manage automatic rebooting, such as managed power strips.
The board configuration is written in the Expect programming language, which is an extension to the Tcl programming language. Unfortunately, documentation for DejaGNU is very sparse, so you will likely need to make use of existing board configurations as a starting point.
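A real board configuration must be written in Tcl/Expect, but the upload, execute, and report sequence it has to implement can be sketched in Python for clarity. The sketch below assumes the target is reachable with the standard scp and ssh commands, which may not be true of your board; the function name, the default remote directory, and the status strings are illustrative choices, not DejaGNU's API.

```python
import os
import posixpath
import subprocess


def run_on_target(program, host, remote_dir="/tmp", run=subprocess.run):
    """Upload a test program to the target, execute it there, and report the result."""
    remote_path = posixpath.join(remote_dir, os.path.basename(program))

    # Step 1: upload the program to the target system.
    upload = run(["scp", program, f"{host}:{remote_path}"],
                 capture_output=True, text=True)
    if upload.returncode != 0:
        # The test never ran, so it neither passed nor failed.
        return {"status": "UNRESOLVED", "detail": upload.stderr}

    # Step 2: execute the program; ssh returns the remote exit status.
    execution = run(["ssh", host, remote_path], capture_output=True, text=True)

    # Step 3: report the result.
    status = "PASS" if execution.returncode == 0 else "FAIL"
    return {"status": status, "detail": execution.stdout}
```

The run parameter exists only so that the sequence can be exercised without hardware attached. A target without a network connection would need a different transport, such as a serial line or a debug stub, and a production harness would also need the reboot step described above.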
Testing installed toolchains
The DejaGNU testsuites have customarily been run from the build directories in which the toolchain was built. However, testing in this manner invokes the tools in a way that differs substantially from how you actually plan to use them. More accurate results can be obtained by installing the toolchain first and then running the tests. Testing installed components ensures that the tools tested are the same binaries, invoked in the same way, as the ones you plan to use.
However, testing installed toolchains is more complex than testing from the build directory. In particular, you must create a DejaGNU "site file" to describe your installation. You may also find that some of the DejaGNU testsuites require modification to support testing installed toolchains. If you wish to test installed toolchains, you may find it helpful to automate both the installation process and the generation of appropriate DejaGNU site files so that you can easily reproduce your testing.
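Generating the site file from a script is one way to make this reproducible. The Python sketch below writes a minimal site.exp using the common DejaGNU variables tool, srcdir, host_triplet, build_triplet, and target_triplet. Which additional variables a particular testsuite needs for installed testing is not covered here, so the extra parameter is left for you to fill in; the function name is made up for this example.

```python
def make_site_exp(tool, srcdir, host_triplet, target_triplet, extra=None):
    """Return the text of a minimal DejaGNU site.exp file."""
    settings = {
        "tool": tool,
        "srcdir": srcdir,
        "host_triplet": host_triplet,
        # Assumption: the tests are driven from the same machine that hosts the tools.
        "build_triplet": host_triplet,
        "target_triplet": target_triplet,
    }
    settings.update(extra or {})
    # Tcl braces keep each value literal; values containing braces are not handled.
    lines = [f"set {name} {{{value}}}" for name, value in settings.items()]
    return "\n".join(lines) + "\n"
```

For example, make_site_exp("gcc", "/src/gcc/testsuite", "x86_64-pc-linux-gnu", "arm-none-linux-gnueabi") returns text that can be written to site.exp in the directory from which the tests are launched. The path and triplets shown are placeholders.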
Testing the GNU library
The GLIBC testsuite doesn't yet make use of the DejaGNU framework. However, it does contain a set of tests that you should run. These tests cover much of the functionality provided in GLIBC, including, in particular, tests for the Native POSIX Threads Library (NPTL), which provides high-performance threading support on GNU/Linux systems. These tests provide a powerful mechanism for testing the compiler, C library, and kernel, all of which must cooperate to provide support for threads.
Because the GNU C Library testsuite does not support cross-testing (that is, compiling the tests on one system and running them on another), you'll have to modify the testsuite to support testing on embedded systems.
Conformance testsuites are used to validate the behavior of your toolchain relative to published specifications. You may wish to seek out additional conformance testsuites to validate functionality specific to your intended use of the compiler.
For example, the Plum Hall C and C++ validation suites are comprehensive tests for conformance to the C and C++ programming language specifications. The Plum Hall testsuites contain tests for nearly every sentence of the published specifications, including both the programming languages proper and the associated runtime libraries. In addition, the Plum Hall testsuite can automatically generate a number of "expression tests" that contain complex arithmetic expressions. These expression tests have proven useful in identifying instances of incorrect code generation.
The Open POSIX testsuite checks conformance of a compiler, C library, and operating system to the POSIX specification. This testsuite runs C programs that make heavy use of the C library and checks that the results returned by the library routines are correct.
Performance testsuites are designed to provide data about the speed at which the code generated by the toolchain executes. These testsuites can help to identify misconfigurations of the toolchain, such as situations in which the generated code is using software floating-point on a system that supports hardware floating point.
They are also useful in comparing newer versions of the toolchain with older versions. For example, before deploying an upgraded version of the toolchain for production use, you might wish to ensure that the code generated is in fact better (or, at least, no worse) than that generated by the current toolchain.
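A version-to-version comparison of this kind can be sketched in a few lines of Python. The sketch assumes you have already collected one run time per benchmark for each toolchain; the 5 percent tolerance is an arbitrary illustrative threshold, not a recommendation, and the function name is made up.

```python
def find_regressions(baseline, candidate, tolerance=0.05):
    """Return benchmarks whose run time grew by more than `tolerance` (a fraction).

    baseline and candidate map benchmark names to run times in seconds,
    measured with the current and the upgraded toolchain respectively.
    """
    regressions = {}
    for name, old_time in baseline.items():
        new_time = candidate.get(name)
        if new_time is None:
            continue  # benchmark was not run with the candidate toolchain
        change = (new_time - old_time) / old_time
        if change > tolerance:
            regressions[name] = change
    return regressions
```

Timing noise on real hardware can easily exceed a few percent, so in practice you would repeat each measurement several times and compare a robust statistic such as the median before trusting the result.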
The Embedded Microprocessor Benchmark Consortium (EEMBC) benchmarks are widely used as measurements of embedded system performance. These benchmarks are divided by application area; for example, there are EEMBC benchmarks for networking, automotive, and office-automation applications.
Testing multiple options
Once you have validated the compiler using a single set of options, you should expand your "validation matrix." There are three important dimensions to the validation matrix:
- Target system.
- Level of optimization.
- Host system.
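The matrix is simply the cross product of these three dimensions, which a short Python sketch makes concrete. The function name is made up for this example, and the target, flag, and host values in the usage note are placeholders for whatever combinations you intend to support.

```python
from itertools import product


def validation_matrix(targets, optimization_flags, hosts):
    """Return one configuration per combination of target, optimization level, and host."""
    return [
        {"target": target, "flags": flags, "host": host}
        for target, flags, host in product(targets, optimization_flags, hosts)
    ]
```

Calling validation_matrix(["arm-le", "arm-be"], ["-g", "-O2", "-Os"], ["linux", "windows"]) yields 12 configurations, each of which needs its own complete test run. Because the size of the matrix is the product of the sizes of its dimensions, adding a single target or host can noticeably increase total testing time.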
Target-system options include the choice of target CPU and operating system, whether to compile big-endian or little-endian code, and other related options that specify the system on which the code generated by the toolchain will execute. It's not at all uncommon for the GNU toolchain to behave correctly for one target system but not for a seemingly related system. For example, even though little-endian code works, big-endian code may not. These problems are especially likely if you have incorporated patches from hardware manufacturers designed to support a particular CPU or CPU family, as these vendors may well not have tested other configurations. If you intend to use your version of GCC with multiple targets, you should validate each target independently.
Optimization options include whether to generate code best suited to debugging, code designed to run quickly, or code optimized for size. Different optimization options exercise different code paths in the compiler and can therefore have a substantial impact on test results. The three compilation modes used most often in development are debug (-g); optimized for time (-O2); and optimized for space (-Os). You should ensure that your testing exercises all of these operating modes.
Finally, you should validate the toolchain on all host systems (such as Microsoft Windows or GNU/Linux) on which you will be using the toolchain. In addition to checking that the generated code is correct, you should verify that the code generated is in fact identical on all host systems. Because most GNU toolchain developers use IA32 GNU/Linux systems for their own development, host support for Microsoft Windows has tended to be particularly problematic. For example, some versions of GCC have made incorrect assumptions about the behavior of the Windows C library that resulted in the generation of incorrect assembly code. Similarly, reliance on pointer values as hash-table keys has resulted in different code generated on different hosts.
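One way to check that the generated code is identical across hosts is to compile the same source on each host, copy the outputs to one machine, and compare checksums. The Python sketch below assumes the outputs have already been collected into local files; comparing assembly output rather than object files is a choice made for this example, since object files may embed host-specific details. The function names are made up.

```python
import hashlib


def file_digest(path):
    """Return the SHA-256 digest of a file's contents as a hex string."""
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            sha.update(chunk)
    return sha.hexdigest()


def mismatched_hosts(outputs_by_host):
    """Group hosts by the digest of their output; return {} if all hosts agree.

    outputs_by_host maps a host name to the path of the file generated on that host.
    """
    groups = {}
    for host, path in outputs_by_host.items():
        groups.setdefault(file_digest(path), []).append(host)
    if len(groups) <= 1:
        return {}
    return {digest: sorted(hosts) for digest, hosts in groups.items()}
```

An empty result means every host produced byte-identical output for that source file. Text files may differ in line endings between Windows and GNU/Linux hosts, so you may need to normalize them before comparing.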
Analyzing and fixing failures
Having gathered data about which tests pass and fail, you must now evaluate the results. The GNU toolchain (like all toolchains) has defects. Therefore, in evaluating the toolchain you've built, you should attempt to determine whether or not the failures you observe are unique to your toolchain. If the failures you encounter are not present in other builds of similar toolchains, you may have built the toolchain incorrectly.
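For the DejaGNU testsuites, this comparison can be automated by reading the summary (.sum) files, in which each result appears on a line such as "FAIL: test-name". The Python sketch below assumes you have a summary file from your own build and another from a reference build of a similar toolchain; the function names are made up for this example.

```python
def failing_tests(sum_text):
    """Return the set of test names reported as FAIL in DejaGNU summary text."""
    prefix = "FAIL: "
    # Lines beginning with XFAIL: (expected failures) do not match this prefix.
    return {
        line[len(prefix):].strip()
        for line in sum_text.splitlines()
        if line.startswith(prefix)
    }


def unique_failures(my_sum_text, reference_sum_text):
    """Return the failures present in your results but absent from the reference."""
    return sorted(failing_tests(my_sum_text) - failing_tests(reference_sum_text))
```

Any test returned by unique_failures deserves attention first, since it fails in your build but not in the reference build. The comparison is only meaningful if both builds ran the same version of the testsuite with the same options.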
However, even if the failures are not specific to your toolchain, you must evaluate whether or not they are sufficiently severe as to impede use of your toolchain. Failing tests can be divided into the following categories:
- Defects in the tests themselves. This category includes tests that make incorrect assumptions about the target environment, such as tests that assume the target is a little-endian machine, has 32-bit pointers, or treats plain char as a signed data type. These tests should be corrected. If the tests cannot be readily corrected, these failures should be ignored.
- Defects in the hardware platform. In some cases, testing the toolchain reveals microprocessor defects (such as incorrect handling of instructions in delay slots). In other cases, the target board may contain faulty parts or may use faulty software. It may be necessary to implement toolchain work-arounds for some of these problems, such as the insertion of NOP instructions to avoid CPU defects. In other cases, it may be sufficient to ignore the failures.
- Resource limitations. For example, some tests may require more memory than is available on the target system, or may require that operations complete faster than can reasonably be achieved on the target system. These failures should be ignored.
- Defects in the toolchain. These defects must be further analyzed to determine the significance of the failure (its likely impact on software developers using the toolchain) and the difficulty of fixing the defect.
It's recommended that you categorize each of the failures you observe so that you can fully evaluate the quality of your toolchain.
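A simple record of this categorization might look like the following Python sketch. The four categories come from the list above; the type and function names are made up, and the triage itself (deciding which category a failure belongs to) remains a manual step.

```python
from collections import Counter
from enum import Enum


class Cause(Enum):
    TEST_DEFECT = "defect in the test"
    HARDWARE_DEFECT = "defect in the hardware platform"
    RESOURCE_LIMIT = "resource limitation"
    TOOLCHAIN_DEFECT = "defect in the toolchain"


def summarize(triage):
    """Summarize a mapping of failing test name -> Cause.

    Returns the number of failures per cause and the sorted list of tests
    that point at toolchain defects and therefore need further analysis.
    """
    counts = Counter(triage.values())
    needs_analysis = sorted(
        name for name, cause in triage.items() if cause is Cause.TOOLCHAIN_DEFECT
    )
    return counts, needs_analysis
```

Keeping the categorization in a file under version control lets you rerun the summary after every toolchain build and notice immediately when a failure appears that nobody has triaged yet.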
If you've found problems through the validation process, you'll need to fix them before deploying your toolchain. Even if all of your validation has been successful, you may encounter problems in the course of using the toolchain. In either event, you'll need to determine the cause of the problem and develop a solution.
Since the toolchain contains several million lines of code, the first step is to identify the component that is causing the problem. Then, you will have to debug that component. You may have to spend some time becoming familiar with the source code for the toolchain before you can correct the defect. You may wish to post a description of your problem to the public mailing list for the affected component asking for help.
Some components also have publicly accessible defect-tracking systems that can be used to report problems. When you've fixed the problem, you should consider contributing the change that you've made to the public source repository for the affected component so that others can benefit from your improvement, just as you have benefited from theirs. (This is, in fact, the whole idea behind the open-source community.)
While a GNU toolchain can be built from scratch, the validation process is not so straightforward and can be quite overwhelming. An alternative way to obtain a high-quality, fully validated GNU toolchain is to purchase a commercially enhanced version of the GNU toolchain from an embedded tool vendor (such as Sourcery CodeBench from Mentor Graphics). Commercial products will help your team focus its energy and expertise on the actual end application, and not the toolchain.
Mark Mitchell is director of embedded tools for the Mentor Graphics Embedded Software Division and is responsible for the Mentor Embedded Sourcery CodeBench downloadable GNU toolchain development environments. Before joining Mentor Graphics, Mark was the founder and chief sourcerer of CodeSourcery, Inc. Mark has worked on C/C++ software development tools since 1994, and has been involved in the Free Software Foundation's (FSF) GNU Compiler Collection (GCC). Since 2001, he has been an active member of the GCC Steering Committee. He holds degrees in computer science from Harvard and Stanford.
Anil Khanna has over 15 years of technical and product marketing experience with a background in both electronic design automation (EDA) tools and programmable logic hardware design. Anil is currently senior product marketing manager of Mentor Embedded Sourcery tools. Anil holds a master's in electrical and computer engineering from Portland State University in Portland, Oregon.
This article provided courtesy of Embedded.com and Embedded Systems Design magazine.
This material was first printed in Embedded Systems Design magazine.
Copyright © 2011 UBM. All rights reserved.