Building Bare-Metal ARM Systems with GNU: Part 1 - Getting Started

Miro Samek, Quantum Leaps

June 26, 2007

The ubiquitous ARM processor family is very well supported by the GNU C/C++ toolchain. While many online and printed resources [1, 2] focus on building and installing the GNU toolchain, it is quite hard to find a comprehensive example of using the GNU C/C++ toolchain for a bare-metal ARM system that would have all the essential features needed in a real-life project.

And even if you do find such an example, you most likely won't know WHY things are done the particular way.

In this multi-part series of articles I will provide and explain all the elements you'll need to build and fine-tune a bare-metal ARM-based project with the GNU toolchain.

I start by enumerating the features needed in real-life ARM projects. I then describe generic startup code, the matching linker script, low-level initialization, the compiler options, and a basic board support package (BSP). I subsequently show how to initialize the system for C++ and how to reduce the overhead of C++ so that it's usable for low-end ARM-based MCUs.

Next, I will cover interrupt handling for ARM projects in the simple foreground/background software architecture. I will describe the interrupt locking policy, interrupt handling in the presence of a prioritized interrupt controller, IRQ and FIQ assembly "wrapper" functions, as well as other ARM exception handlers. I will conclude with a description of the testing strategy for various interrupt preemption scenarios.

To focus the discussion, this article is based on the latest CodeSourcery G++ GNU toolchain for ARM [3] and the Atmel AT91SAM7S-EK evaluation board with the AT91SAM7S64 microcontroller (64KB of on-chip flash ROM and 16KB of static RAM).

The discussion should be generally applicable to other GNU-toolchain distributions [4, 5] for ARM and other ARM7- or ARM9- based microcontrollers. I present separate projects in C and C++ to illuminate the C++-specific issues.

What's Needed in a Real-Life Bare-Metal ARM Project
The tremendously popular ARM7/ARM9 core is quite a complicated processor in that it supports two operating states: ARM state, which executes 32-bit, word-aligned ARM instructions, and Thumb state, which operates with 16-bit, halfword-aligned Thumb instructions.

Additionally, the CPU has several operating modes, such as USER, SYSTEM, SUPERVISOR, ABORT, UNDEFINED, IRQ, and FIQ. Each of these operating modes differs in visibility of registers (register banking) and sometimes privileges to execute instructions.
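
As a small illustration of the operating modes listed above, the following sketch decodes the mode field of a saved program status register value. The mode encodings and bit positions are the architectural ones for ARM7/ARM9 cores; the names arm_mode_name() and arm_mode_is_privileged() are hypothetical helpers invented for this example and are not part of any toolchain or of the code accompanying this series.

```c
#include <stdint.h>

/* Mode field encodings, CPSR bits [4:0], on ARM7/ARM9 cores. */
enum arm_mode {
    ARM_MODE_USR = 0x10,   /* USER       */
    ARM_MODE_FIQ = 0x11,   /* FIQ        */
    ARM_MODE_IRQ = 0x12,   /* IRQ        */
    ARM_MODE_SVC = 0x13,   /* SUPERVISOR */
    ARM_MODE_ABT = 0x17,   /* ABORT      */
    ARM_MODE_UND = 0x1B,   /* UNDEFINED  */
    ARM_MODE_SYS = 0x1F    /* SYSTEM     */
};

#define CPSR_MODE_MASK 0x1Fu
#define CPSR_T_BIT     (1u << 5)  /* set: Thumb state, clear: ARM state */
#define CPSR_F_BIT     (1u << 6)  /* set: FIQ disabled */
#define CPSR_I_BIT     (1u << 7)  /* set: IRQ disabled */

/* Hypothetical helper: name of the mode encoded in a CPSR/SPSR value. */
const char *arm_mode_name(uint32_t cpsr)
{
    switch (cpsr & CPSR_MODE_MASK) {
        case ARM_MODE_USR: return "USER";
        case ARM_MODE_FIQ: return "FIQ";
        case ARM_MODE_IRQ: return "IRQ";
        case ARM_MODE_SVC: return "SUPERVISOR";
        case ARM_MODE_ABT: return "ABORT";
        case ARM_MODE_UND: return "UNDEFINED";
        case ARM_MODE_SYS: return "SYSTEM";
        default:           return "INVALID";
    }
}

/* Hypothetical helper: every valid mode except USER is privileged. */
int arm_mode_is_privileged(uint32_t cpsr)
{
    return (cpsr & CPSR_MODE_MASK) != ARM_MODE_USR;
}
```

For example, the value 0xD3 (IRQ and FIQ disabled, ARM state, mode bits 0x13) decodes as SUPERVISOR, which is the state the core enters at reset.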

On top of this, virtually every ARM-based MCU provides ARM vector remapping and a vendor-specific interrupt controller that allows nesting of the IRQ interrupts.

Unfortunately, a real-life ARM-based project needs to use many of the features of the ARM core and the critical peripherals. The following subsections describe what's typically required in a bare-metal ARM-based project.

Support for ARM Vectors Remapping
The first 32 bytes of memory at address 0x0 contain the ARM processor exception vectors, in particular, the Reset Vector at address 0x0. At boot time, the Reset Vector must be mapped to ROM. However, most ARM microcontrollers provide an option to remap the memories to put RAM at the ARM vector addresses, so that the vectors can be dynamically changed under software control.

The memory remapping option is implemented differently in various ARM microcontrollers and it is typically a source of endless confusion during flash-loading and debugging the application. Nonetheless, a real-life project typically needs to use the ARM vector remapping. This series of articles addresses the issue and presents a fairly general solution.
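
To make the idea of software-changeable vectors more concrete, here is a minimal sketch of one common way to lay out a vector table intended for RAM: eight copies of the instruction LDR pc,[pc,#0x18], followed by a secondary table of eight handler addresses. It is not necessarily the exact scheme used later in this series. The function name build_ram_vectors() and the enum names are hypothetical, and the vendor-specific remap command itself is deliberately not shown.

```c
#include <stdint.h>

#define ARM_NUM_VECTORS     8u
/* Encoding of "LDR pc, [pc, #0x18]". Because the PC reads as the
 * instruction address + 8, the instruction at offset N loads the
 * word stored at offset N + 0x20. */
#define ARM_LDR_PC_PC_0x18  0xE59FF018u

/* Index of each exception in the table (byte offset = index * 4). */
enum arm_vector {
    VEC_RESET = 0,   /* 0x00 */
    VEC_UNDEF,       /* 0x04 Undefined Instruction */
    VEC_SWI,         /* 0x08 Software Interrupt    */
    VEC_PABT,        /* 0x0C Prefetch Abort        */
    VEC_DABT,        /* 0x10 Data Abort            */
    VEC_RESERVED,    /* 0x14 */
    VEC_IRQ,         /* 0x18 */
    VEC_FIQ          /* 0x1C */
};

/* Hypothetical helper: fill a 16-word image consisting of the eight
 * primary vectors followed by the secondary address table. On a real
 * target this image would live in the RAM that gets remapped to
 * address 0x0; here it is just a buffer supplied by the caller. */
void build_ram_vectors(uint32_t table[2u * ARM_NUM_VECTORS],
                       const uint32_t handlers[ARM_NUM_VECTORS])
{
    uint32_t i;
    for (i = 0u; i < ARM_NUM_VECTORS; ++i) {
        table[i] = ARM_LDR_PC_PC_0x18;             /* primary vector   */
        table[i + ARM_NUM_VECTORS] = handlers[i];  /* handler address  */
    }
}
```

With this layout, changing a handler at run time amounts to overwriting one word in the secondary table, for example table[ARM_NUM_VECTORS + VEC_IRQ].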

Low-level Initialization in C/C++
The ARM vector remapping is just one action that must be performed early in the boot sequence. The other actions might include CPU clock initialization (to speed up the rest of the boot process), external bus interface configuration, critical hardware initialization, and so on.

Most of these actions don't require assembly programming and are in fact much easier to accomplish from C/C++, yet they need to happen before main() is called. The startup sequence discussed in this series of articles allows performing the low-level initialization either from C/C++ or from assembly.

Executing Code from RAM
The majority of low-end ARM-based microcontrollers are designed to run the code directly from ROM (typically NOR flash). However, the ROM often requires more wait-states than the RAM and for some ARM devices the ROM is accessible only through the narrow 16-bit wide bus interface. Also, executing code from flash requires more power than executing the same code from SRAM.

For better performance and lower power dissipation it is often advantageous to execute the hot-spot portions of the code from RAM. This series of articles provides support for executing code from RAM, which includes copying the RAM-based code from ROM to RAM at boot time, long jumps between ROM- and RAM-based code, as well as the linker script that allows very fine-granularity control over the functions placed in RAM.
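
The boot-time copy mentioned above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the function name copy_section() is hypothetical, and the linker-provided symbols named in the comment (__fastcode_load, __fastcode_start, __fastcode_end) are placeholders for whatever the linker script actually defines.

```c
#include <stdint.h>

/* Hypothetical helper: copy a section word by word from its load
 * address in ROM to its run address in RAM.
 *
 * On a real target the three arguments would typically be addresses
 * of symbols defined by the linker script, for instance (assumed
 * names, not taken from the article):
 *
 *     extern uint32_t __fastcode_load;   // load address in ROM
 *     extern uint32_t __fastcode_start;  // run address in RAM
 *     extern uint32_t __fastcode_end;    // end of the section in RAM
 *
 *     copy_section(&__fastcode_load, &__fastcode_start, &__fastcode_end);
 */
void copy_section(const uint32_t *src, uint32_t *dst,
                  const uint32_t *dst_end)
{
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}
```

The same loop is the usual way to copy initialized data from ROM to RAM, which is why startup code often treats RAM-resident code as just another section with separate load and run addresses.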

Mixing ARM and THUMB Instruction Sets
In most low-end ARM microcontrollers the 16-bit THUMB instruction set offers both better code density and actually better performance when executed from ROM, even though the 16-bit THUMB instruction set is less powerful than the 32-bit ARM instruction set. This article shows how to use any combination of ARM and THUMB instruction sets for optimal performance.

Separate Stack Section
Most standard GNU linker scripts simply supply a symbol at the top of RAM to initialize the stack pointer. The stack typically grows towards the heap and it's hard to determine when a stack overflow occurs.

This series of articles uses a dedicated stack section, which is pre-filled at boot time with a specified bit pattern to allow better monitoring of the stack usage.

The benefit of this approach is the ability to detect when you run out of RAM for the stack at link time, rather than crash-and-burn at runtime. Moreover, the separate stack section allows you to easily locate the stack in the fastest RAM available.
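
The pre-fill technique can be sketched as follows, assuming a full-descending stack (the usual convention on ARM) and an arbitrary fill pattern. The names STACK_FILL, stack_fill(), and stack_unused_words() are hypothetical; on a real target the start and end pointers would come from symbols in the linker script rather than from an ordinary array.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed fill pattern; any value unlikely to be stacked will do. */
#define STACK_FILL 0xAAAAAAAAu

/* Pre-fill the whole stack section [start, end) with the pattern.
 * This must run before the region is in use as a stack. */
void stack_fill(uint32_t *start, uint32_t *end)
{
    uint32_t *p;
    for (p = start; p < end; ++p) {
        *p = STACK_FILL;
    }
}

/* Count the words that still hold the pattern. The stack grows
 * downward from 'end' toward 'start', so untouched words sit at the
 * low end. The result is an estimate: a stacked value that happens
 * to equal the pattern is indistinguishable from an unused word. */
size_t stack_unused_words(const uint32_t *start, const uint32_t *end)
{
    const uint32_t *p = start;
    while (p < end && *p == STACK_FILL) {
        ++p;
    }
    return (size_t)(p - start);
}
```

Calling stack_unused_words() periodically, or from a debugger, gives a high-water mark of stack usage; a result of zero suggests the stack has reached the bottom of its section.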

Debug and Release Configurations
The Makefile described in this series of articles supports building separate debug and release configurations, each with different compiler and linker options.

Support for C++
C++ requires an extra initialization step to invoke the static constructors. GNU C++ generates some extra sections for placing the tables of static constructors and destructors.

The linker script needs to locate the extra sections, and the startup code must arrange for calling the static constructors. This series of articles will provide a universal startup code and linker script that works for C++ as well as C applications.
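
Conceptually, calling the static constructors means walking a table of function pointers that the linker has collected. The sketch below shows that walk in plain C so it can be tried on a host. The names call_static_ctors(), demo_ctor_a(), and demo_ctor_b() are hypothetical, the table is an ordinary array standing in for the linker-generated section, and the front-to-back order is an assumption: the actual section names and traversal order depend on the toolchain configuration.

```c
/* Each table entry is a pointer to a function taking no arguments. */
typedef void (*ctor_fn)(void);

/* Hypothetical helper: call every entry in [begin, end) in order.
 * On a real target 'begin' and 'end' would be symbols that the
 * linker script places around the constructor table. */
void call_static_ctors(const ctor_fn *begin, const ctor_fn *end)
{
    const ctor_fn *p;
    for (p = begin; p < end; ++p) {
        (*p)();
    }
}

/* Host-side stand-ins for compiler-generated constructors; they only
 * record the order in which they were called. */
int ctor_log[4];
int ctor_count = 0;

void demo_ctor_a(void) { ctor_log[ctor_count++] = 1; }
void demo_ctor_b(void) { ctor_log[ctor_count++] = 2; }
```

Given a table such as { demo_ctor_a, demo_ctor_b }, a call to call_static_ctors(table, table + 2) runs both functions once, in table order. In a C-only project the table is simply empty and the loop body never executes, which is what lets one startup file serve both languages.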

Minimizing the Impact of C++
If you are not careful and use the standard GNU g++ settings, the code size overhead of C++ can easily take up 50KB of code or more, which renders C++ unusable for most low-end ARM MCUs.

However, by restricting C++ to the Embedded C++ subset [4,5], the impact of C++ can be negligible. This article shows how to reduce the C++ overhead with the GNU toolchain to less than 300 bytes of additional code compared to a pure C implementation.

ARM Exceptions and Interrupt Handling
The ARM core supports several exceptions (Undefined Instruction, Prefetch Abort, Data Abort, Software Interrupt) as well as two types of interrupts: Interrupt Request (IRQ) and Fast Interrupt Request (FIQ).

Upon encountering an interrupt or an exception, the ARM core does not automatically push any registers to the stack. If the application wants to nest interrupts (to take advantage of the prioritized interrupt controller available in most ARM-based MCUs), the responsibility is entirely with the application programmer to save and restore the ARM registers.

The GNU compiler's __attribute__ ((interrupt ("IRQ"))) cannot handle nested interrupts, so assembly programming is required. All this makes the handling of interrupts and exceptions quite complicated.

This series of articles also covers robust handling of nested interrupts in the presence of a prioritized interrupt controller. The approach that will be described paves the way to much better code compatibility between the traditional ARMv4T and the new ARMv7-M (Cortex) devices than the conventional ARM interrupt handling.

Coming Up Next

To read Part 2, go to Startup code and low level initialization.
To read Part 3, go to The Linker Script.
To read Part 4, go to Compiler options for C and C++.
To read Part 5, go to Fine-tuning the application.
To read Part 6, go to General Description of Interrupt Handling.
To read Part 7, go to Interrupt locking and unlocking policy.
To read Part 8, go to Low level interrupt wrapper functions.
To read Part 9, go to C-level ISRs and other ARM exceptions.
To read Part 10, go to Test Strategies.

To download the C and C++ source code associated with this article series, go to the Downloadable Code page, or go to Blinky for C and Blinky for C++ to download the Zip files.

Miro Samek, Ph.D., is president of Quantum Leaps, LLC. He can be contacted at

[1] Lewin A.R.W. Edwards, "Embedded System Design on a Shoestring", Elsevier 2003.
[2] ARM Projects
[3] GNU Toolchain for ARM, CodeSourcery
[4] GNU ARM toolchain.
[5] GNU X-Tools, Microcross
[6] Sloss, Andrew, Dominic Symes, and Chris Wright, "ARM System Developer's Guide: Designing and Optimizing System Software", Morgan Kaufmann, 2004.
