Building Bare-Metal ARM Systems with GNU: Part 1 – Getting Started

The ubiquitous ARM processor family is very well supported by the GNU C/C++ toolchain. While many online and printed resources [1, 2] focus on building and installing the GNU toolchain, it is quite hard to find a comprehensive example of using the GNU C/C++ toolchain for a bare-metal ARM system that would have all the essential features needed in a real-life project.

And even if you do find such an example, you most likely won't know WHY things are done the particular way.

In this multi-part series of articles I will provide and explain all the elements you'll need to build and fine-tune a bare-metal ARM-based project with the GNU toolchain.

I start by enumerating the features needed in real-life ARM projects. I then describe a generic startup code, the matching linker script, low-level initialization, the compiler options and a basic board support package (BSP). I subsequently show how to initialize the system for C++ and how to reduce the overhead of C++ so that it's usable for low-end ARM-based MCUs.

Next, I will cover interrupt handling for ARM projects in the simple foreground/background software architecture. I will describe the interrupt locking policy, interrupt handling in the presence of a prioritized interrupt controller, IRQ and FIQ assembly "wrapper" functions, as well as other ARM exception handlers. I will conclude with a description of the testing strategy for various interrupt preemption scenarios.

To focus the discussion, this article is based on the latest CodeSourcery G++ GNU toolchain for ARM [3] and the Atmel AT91SAM7S-EK evaluation board with the AT91SAM7S64 microcontroller (64KB of on-chip flash ROM and 16KB of static RAM).

The discussion should be generally applicable to other GNU-toolchain distributions [4, 5] for ARM and other ARM7- or ARM9-based microcontrollers. I present separate projects in C and C++ to illuminate the C++-specific issues.

What's Needed in a Real-Life Bare-Metal ARM Project
The tremendously popular ARM7/ARM9 core is quite a complicated processor in that it supports two operating states: ARM state, which executes 32-bit, word-aligned ARM instructions, and Thumb state, which operates with 16-bit, halfword-aligned Thumb instructions.

Additionally, the CPU has several operating modes, such as USER, SYSTEM, SUPERVISOR, ABORT, UNDEFINED, IRQ, and FIQ. Each of these operating modes differs in visibility of registers (register banking) and sometimes privileges to execute instructions.
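
To make the states and modes concrete, here is a minimal sketch of how they are encoded in the Current Program Status Register (CPSR) of an ARM7/ARM9 core. The mode numbers and bit positions come from the ARM architecture; the identifier names (arm_mode_of and so on) are my own, chosen only for illustration, and the functions operate on a plain integer so the sketch can be tried on any host.

```c
#include <stdint.h>

/* CPSR mode field values (bits [4:0]) for the seven classic ARM modes */
enum arm_mode {
    ARM_MODE_USR = 0x10,  /* USER */
    ARM_MODE_FIQ = 0x11,  /* FIQ */
    ARM_MODE_IRQ = 0x12,  /* IRQ */
    ARM_MODE_SVC = 0x13,  /* SUPERVISOR */
    ARM_MODE_ABT = 0x17,  /* ABORT */
    ARM_MODE_UND = 0x1B,  /* UNDEFINED */
    ARM_MODE_SYS = 0x1F   /* SYSTEM */
};

#define CPSR_MODE_MASK 0x1Fu  /* mode field, bits [4:0] */
#define CPSR_T_BIT     0x20u  /* set: Thumb state, clear: ARM state */
#define CPSR_F_BIT     0x40u  /* set: FIQ disabled */
#define CPSR_I_BIT     0x80u  /* set: IRQ disabled */

/* Extract the operating mode from a CPSR value. */
enum arm_mode arm_mode_of(uint32_t cpsr)
{
    return (enum arm_mode)(cpsr & CPSR_MODE_MASK);
}

/* Nonzero when the CPSR value describes Thumb state. */
int arm_in_thumb_state(uint32_t cpsr)
{
    return (cpsr & CPSR_T_BIT) != 0u;
}

/* Every mode except USER is privileged. */
int arm_is_privileged(uint32_t cpsr)
{
    return arm_mode_of(cpsr) != ARM_MODE_USR;
}
```

For example, the value 0xD3 (I and F bits set, SUPERVISOR mode, ARM state) is what the core holds right after reset, so arm_mode_of(0xD3) yields ARM_MODE_SVC.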

On top of this, virtually every ARM-based MCU provides ARM vector remapping and a vendor-specific interrupt controller that allows nesting of the IRQ interrupts.

Unfortunately, a real-life ARM-based project needs to use many of the features of the ARM core and the critical peripherals. The following subsections describe what's typically required in a bare-metal ARM-based project.

Support for ARM Vectors Remapping
The first 32 bytes of memory at address 0x0 contain the ARM processor exception vectors, in particular, the Reset Vector at address 0x0. At boot time, the Reset Vector must be mapped to ROM. However, most ARM microcontrollers provide an option to remap the memories to put RAM at the ARM vector addresses, so that the vectors can be dynamically changed under software control.

The memory remapping option is implemented differently in various ARM microcontrollers and it is typically a source of endless confusion when flash-loading and debugging the application. Nonetheless, a real-life project typically needs to use the ARM vector remapping. This series of articles addresses the issue and presents a fairly general solution.
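
As a rough illustration of what ends up at the vector addresses, the sketch below lays out the eight 4-byte vectors (8 x 4 = 32 bytes) followed by a secondary table of handler addresses. It assumes one common technique, in which every vector holds the instruction LDR pc,[pc,#0x18] (encoded as 0xE59FF018) so that each handler address can be changed simply by rewriting a word in RAM. The function and constant names are hypothetical, the table is an ordinary array rather than real memory at 0x0, and the vendor-specific remap operation itself is deliberately left out.

```c
#include <stdint.h>

enum {
    ARM_NUM_VECTORS = 8,   /* 8 vectors x 4 bytes = 32 bytes */
    ARM_TABLE_WORDS = 16   /* primary vectors + secondary jump table */
};

/* Byte offsets of the ARM exception vectors from address 0x0 */
enum arm_vector_offset {
    VEC_RESET    = 0x00,
    VEC_UNDEF    = 0x04,  /* Undefined Instruction */
    VEC_SWI      = 0x08,  /* Software Interrupt */
    VEC_PABT     = 0x0C,  /* Prefetch Abort */
    VEC_DABT     = 0x10,  /* Data Abort */
    VEC_RESERVED = 0x14,
    VEC_IRQ      = 0x18,
    VEC_FIQ      = 0x1C
};

/* Encoding of the ARM instruction "LDR pc,[pc,#0x18]" */
#define ARM_LDR_PC_PC_0x18 0xE59FF018u

/* Address of the word loaded by the LDR placed at a given vector.
 * The PC reads as the instruction address + 8 in ARM state. */
uint32_t vector_literal_address(uint32_t vector_offset)
{
    return vector_offset + 8u + 0x18u;
}

/* Fill a 16-word image: words 0..7 are the primary vectors,
 * words 8..15 hold the handler addresses they jump through. */
void build_vector_table(uint32_t table[ARM_TABLE_WORDS],
                        const uint32_t handlers[ARM_NUM_VECTORS])
{
    int i;
    for (i = 0; i < ARM_NUM_VECTORS; ++i) {
        table[i] = ARM_LDR_PC_PC_0x18;
        table[ARM_NUM_VECTORS + i] = handlers[i];
    }
}
```

With this layout, the IRQ vector at 0x18 loads its target from 0x18 + 8 + 0x18 = 0x38, which is word 14 of the image. On a real device, an image like this would be copied to the beginning of RAM before the remap is performed.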

Low-level Initialization in C/C++
The ARM vector remapping is just one action that must be performed early in the boot sequence. The other actions might include CPU clock initialization (to speed up the rest of the boot process), external bus interface configuration, critical hardware initialization, and so on.

Most of these actions don't require assembly programming and are in fact much easier to accomplish from C/C++, yet they need to happen before main() is called. The startup sequence discussed in this series of articles allows performing the low-level initialization either from C/C++ or from assembly.

Executing Code from RAM
The majority of low-end ARM-based microcontrollers are designed to run the code directly from ROM (typically NOR flash). However, the ROM often requires more wait-states than the RAM, and for some ARM devices the ROM is accessible only through a narrow 16-bit bus interface. Also, executing code from flash requires more power than executing the same code from SRAM.

For better performance and lower power dissipation it is often advantageous to execute the hot-spot portions of the code from RAM. This series of articles provides support for executing code from RAM, which includes copying the RAM-based code from ROM to RAM at boot time, long jumps between ROM- and RAM-based code, as well as the linker script that allows very fine-granularity control over the functions placed in RAM.
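
The boot-time copy mentioned above boils down to a simple word-by-word loop. The following is only a sketch under stated assumptions: the function name copy_section is hypothetical, and on a real target the three pointers would come from symbols defined in the linker script (the section's load address in ROM and its start and end run addresses in RAM) rather than from ordinary arrays.

```c
#include <stdint.h>

/* Copy a section from its load address (in ROM) to its run address
 * (in RAM). The section occupies the words [run_start, run_end). */
void copy_section(const uint32_t *load_start,
                  uint32_t *run_start,
                  const uint32_t *run_end)
{
    while (run_start < run_end) {
        *run_start++ = *load_start++;
    }
}
```

Copying whole 32-bit words keeps the loop short and fast, which assumes that the linker script aligns both the load and the run addresses of the section to a word boundary.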

Mixing ARM and THUMB Instruction Sets
In most low-end ARM microcontrollers the 16-bit THUMB instruction set offers both better code density and actually better performance when executed from ROM, even though the 16-bit THUMB instruction set is less powerful than the 32-bit ARM instruction set. This article shows how to use any combination of ARM and THUMB instruction sets for optimal performance.

Separate Stack Section
Most standard GNU linker scripts simply supply a symbol at the top of RAM to initialize the stack pointer. The stack typically grows towards the heap and it's hard to determine when a stack overflow occurs.

This series of articles uses a dedicated stack section, which is pre-filled at boot time with a specified bit pattern to allow better monitoring of the stack usage.

The benefit of this approach is the ability to detect when you run out of RAM for the stack at link time, rather than crash-and-burn at runtime. Moreover, the separate stack section allows you to easily locate the stack in the fastest RAM available.
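
The pre-fill idea can be sketched in a few lines of C. The fill value 0xAAAAAAAA and the function names are assumptions made for this illustration only, and the stack is modeled as a plain array of words. The sketch relies on the usual ARM convention of a stack that grows downward, so the untouched fill words remain at the lowest addresses.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xAAAAAAAAu  /* hypothetical fill pattern */

/* Pre-fill the stack region [bottom, top) with the known pattern. */
void stack_fill(uint32_t *bottom, const uint32_t *top)
{
    while (bottom < top) {
        *bottom++ = STACK_FILL;
    }
}

/* Count the words at the low end of the stack that still hold the
 * pattern, that is, the words the stack has never reached. */
size_t stack_unused_words(const uint32_t *bottom, const uint32_t *top)
{
    size_t unused = 0;
    while (bottom < top && *bottom == STACK_FILL) {
        ++bottom;
        ++unused;
    }
    return unused;
}
```

Calling stack_unused_words() periodically, or inspecting the stack section in a debugger, gives the high-water mark of stack usage. A result of zero means the stack has been used up to its limit and has most likely overflowed.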

Debug and Release Configurations
The Makefile described in this series of articles supports building separate debug and release configurations, each with different compiler and linker options.

Support for C++
C++ requires an extra initialization step to invoke the static constructors. GNU C++ generates some extra sections for placing the tables of static constructors and destructors.

The linker script needs to locate the extra sections, and the startup code must arrange for calling the static constructors. This series of articles will provide a universal startup code and linker script that works for C++ as well as C applications.
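
Conceptually, calling the static constructors means walking a table of function pointers before main() runs. The sketch below shows only that idea: the table here is an ordinary array with two stand-in constructors, whereas in a real project the table lives in the extra sections emitted by GNU C++ and its bounds are provided by the linker script. All names in the sketch are hypothetical.

```c
#include <stddef.h>

typedef void (*ctor_fn)(void);

/* Call every constructor in the table [begin, end), in table order. */
void run_static_ctors(const ctor_fn *begin, const ctor_fn *end)
{
    for (; begin < end; ++begin) {
        if (*begin != NULL) {
            (*begin)();
        }
    }
}

/* Stand-ins for compiler-generated constructors (illustration only);
 * they record the order in which they were called. */
char demo_order[2];
int demo_count;

void demo_ctor_a(void) { demo_order[demo_count++] = 'a'; }
void demo_ctor_b(void) { demo_order[demo_count++] = 'b'; }

/* Stand-in for the table that the linker would normally assemble */
const ctor_fn demo_ctors[2] = { demo_ctor_a, demo_ctor_b };
```

The startup code would make a call equivalent to run_static_ctors(demo_ctors, demo_ctors + 2) after the data sections are initialized and before main() is entered. For a C application the table is simply empty, which is one way a single startup sequence can serve both languages.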

Minimizing the Impact of C++
If you are not careful and use the standard GNU g++ settings, the code size overhead of C++ can easily take up 50KB of code or more, which renders C++ unusable for most low-end ARM MCUs.

However, by restricting C++ to the Embedded C++ subset [4, 5], the impact of C++ can be negligible. This article shows how to reduce the C++ overhead with the GNU toolchain below 300 bytes of additional code compared to a pure C implementation.

ARM Exceptions and Interrupt Handling
The ARM core supports several exceptions (Undefined Instruction, Prefetch Abort, Data Abort, Software Interrupt) as well as two types of interrupts: Interrupt Request (IRQ) and Fast Interrupt Request (FIQ).

Upon encountering an interrupt or an exception the ARM core does not automatically push any registers to the stack. If the application wants to nest interrupts (to take advantage of the prioritized interrupt controller available in most ARM-based MCUs), the responsibility is entirely with the application programmer to save and restore the ARM registers.

The GNU compiler's __attribute__((interrupt("IRQ"))) cannot handle nested interrupts, so assembly programming is required. All this makes the handling of interrupts and exceptions quite complicated.

This series of articles also covers robust handling of nested interrupts in the presence of a prioritized interrupt controller. The approach that will be described paves the way to much better code compatibility between the traditional ARMv4T and the new ARMv7-M (Cortex) devices than the conventional ARM interrupt handling.

Coming Up Next

To read Part 2, go to Startup code and low level initialization.
To read Part 3, go to The Linker Script.
To read Part 4, go to Compiler options for C and C++.
To read Part 5, go to Fine-tuning the application.
To read Part 6, go to General Description of Interrupt Handling.
To read Part 7, go to Interrupt locking and unlocking policy.
To read Part 8, go to Low level interrupt wrapper functions.
To read Part 9, go to C-level ISRs and other ARM exceptions.
To read Part 10, go to Test Strategies.

To download the C and C++ source code associated with this article series, go to the Downloadable Code page, or go to Blinky for C and Blinky for C++ to download the Zip files.

Miro Samek, Ph.D., is president of Quantum Leaps, LLC. He can be contacted at

[1] Lewin A.R.W. Edwards, "Embedded System Design on a Shoestring", Elsevier, 2003.
[2] ARM Projects
[3] GNU Toolchain for ARM, CodeSourcery
[4] GNU ARM toolchain
[5] GNU X-Tools, Microcross
[6] Sloss, Andrew, Dominic Symes, and Chris Wright, "ARM System Developer's Guide: Designing and Optimizing System Software", Morgan Kaufmann, 2004.
