Building Bare-Metal ARM Systems with GNU: Part 2

Miro Samek, Quantum Leaps

July 03, 2007

Miro Samek, Quantum LeapsJuly 03, 2007

In this part I start digging into the code discussed earlier in Part 1 and which is available online at the Download Code page. The code contains C and C++ versions of the example application called "Blinky", because it blinks the 4 user LEDs of the Atmel AT91SAM7S-EK evaluation board.

The C version is located in the subdirectory c_blinky, and the equivalent C++ version is located in the subdirectory cpp_blinky. The Blinky application is primitive, but is carefully designed to use all the features covered in this multi-part article. The projects are based on the latest CodeSourcery G++ GNU toolchain for ARM [1].

In this part, I describe the generic startup code for the GNU toolchain as well as the low-level initialization for a bare-metal ARM system. The recommended reading for this part includes the "IAR Compiler Reference Guide" [2], specifically sections "System startup and termination" as well as "Customizing system initialization".

The Startup Code
The startup sequence for a bare-metal ARM system is implemented in the assembly file startup.s, which is identical for C and C++ projects. This file is designed to be generic, and should work for any ARM-based MCU without modifications.

All CPU- and board-specific low-level initialization that needs to occur before entering the main() function should be handled in the routine low_level_init(), which typically can be written in C/C++, but can also be coded in assembly, if necessary.

Listing 1 Startup code in GNU assembly (startup.s)

Listing 1 above shows the complete startup code in assembly. The highlights of the startup sequence are as follows:

(1) The .text directive tells GNU assembler (as) to assemble the following statements onto the end of the text subsection.

(2) The .code 32 directive selects the 32-bit ARM instruction set (the value 16 selects THUMB). The ARM core starts execution in the ARM state.

(3) The .global directive makes the symbol _start visible to the GNU linker (ld).

(4) The .func directive emits debugging information for the function _start. (The function definition must end with the directive .endfunc).

(5) Upon reset, the ARM core fetches the instruction at address 0x0, which at boot time must be mapped to a non-volatile memory (ROM). However, later the ROM might be remapped to a different address range by means of a memory remap operation. Therefore the code in ROM is typically linked to the final ROM location and not to the ROM location at boot time.

This dynamic changing of the memory map has at least two consequences. First, the few initial instructions must be position-independent meaning that only PC-relative addressing can be used. Second, the initial vector table is used only very briefly and is replaced with a different vector table established in RAM.

(6) The initial vector table contains just endless loops (relative branches to self). This vector table is used only very briefly until it is replaced by the vector table in RAM. Should an exception occur during this transient, the board is most likely damaged and the CPU cannot recover by itself. A safety-critical device should have a secondary circuit (such as an external watchdog timer driven by a separate clock source) that would announce the condition to the user.

(7) It is always a good idea to embed a prominent copyright message close to the beginning of the ROM image. You should customize this message for your company.

(8) Alignment to the word boundary is necessary after a string embedded directly in the code.

(9) The reset vector branches to this label.

(10) The r0 and r1 registers are used as the arguments of the upcoming call to the low_level_init() function. The register r0 is loaded with the linked address of the reset handler, which might be useful to set up the RAM-based vector table inside the low_level_init() function.

(11) The r1 register is loaded with the linked address of the C-initialization code, which also is the return address from the low_level_init() function. Some MCUs (such as AT91x40 with the EBI) might need this address to perform a direct jump after the memory remap operation.

(12) The link register is loaded with the return address. Please note that the return address is the _cstartup label at its final linked location, and not the subsequent PC value (so loading the return address with LDR lr,pc would be incorrect.)

(13) The temporary stack pointer is initialized to the end of the stack section. The GNU toolset uses the full descending stack meaning that the stack grows towards the lower memory addresses.

Note: The stack pointer initialized in this step might be not valid in case the RAM is not available at the linked address before the remap operation. It is not an issue in the AT91SAM7S family, because the RAM is always available at the linked address (0x00200000). However, in other devices (such as AT91x40) the RAM is not available at its final location before the EBI remap. In this latter case you might need to writhe the low_level_init() function in assembly to make sure that the stack pointer is not used until the memory remap.

(14) The function low_level_init() is invoked with a relative branch instruction. Please note that the branch-with-link (BL) instruction is specifically NOT used because the function might be called not from its linked address. Instead the return address has been loaded explicitly in the previous instruction.

Note: The function low_level_init() can be coded in C/C++ with the following restrictions. The function must execute in the ARM state and it must not rely on the initialization of .data section or clearing of the .bss section. Also, if the memory remapping is performed at all, it must occur inside the low_level_init() function because the code is no longer position-independent after this function returns.

(15) The _cstartup label marks the beginning of C-initialization.

(16) The section .fastcode is used for the code executed from RAM. Here this section is copied from ROM to its linked address in RAM (see also the linker script).

(17) The section .data is used for initialized variables. Here this section is copied from its load address in ROM to its linked address in RAM (see also the linker script).

(18) The section .bss is used for uninitialized variables, which the C standard requires to be set to zero. Here this section is cleared in RAM (see also the linker script).

(19) The section .stack is used for the stacks. Here this section is filled with the given pattern, which can help to determine the stack usage in the debugger.

(20) All banked stack pointers are initialized.

(21) The User/System stack pointer is initialized last. All subsequent code executes in the System mode.

(22) The library function __libc_init_array invokes all C++ static constructors (see also the linker script). This function is invoked with the BX instruction, which allows state change to THUMB. This function is harmless in C.

(23) The main() function is invoked with the BX instruction, which allows state change to THUMB.

(24) The main() function should never return in a bare-metal application because there is no operating system to return to. In case main() ever returns, the Software Interrupt exception is entered, in which the user can customize how to handle this problem.

Low-Level Initialization
The function low_level_init() performs the low-level initialization, which always strongly depends on the specific ARM MCU and the particular memory remap operation. As described in the previous section, the function low_level_init() can be coded in C or C++, but must be compiled to ARM and cannot rely on the initialization of the .data section, clearing of the .bss section, or on C++ static constructors being called.

Listing 2 Low-level initialization for AT91SAM7S microcontroller.

Listing 2 above shows the low-level initialization of the AT91SAM7S microcontroller in C. It is important to point out here that the initialization for a different microcontroller, such as AT91x40 series with the EBI, could be different mostly due to different memory remap operation. The highlights of the low-level initialization are as follows:

(1) The GNU gcc is a standard-compliant compiler that supports the C-99 standard exact-width integer types. The use of these types is recommended.

(2) The arguments of low_level_init() are as follows: reset_addr is the linked address of the reset handler and return_addr is the linked return address from the low_level_init() function.

Note: In the C++ environment, the function low_level_init() must be defined with the extern "C" linkage specification because it is called from assembly.

(3) The symbol __ram_start denotes the linked address of RAM. In AT91SAM7S the RAM is always available at this address, so the symbol __ram_start denotes also the RAM location before the remap operation (see the linker script).

(4) The constant LDR_PC_PC contains the opcode of the ARM instruction LDR pc,[pc,...], which is used to populate the RAM vector table.

(5) This constant MAGIC is used to test if the remap operation has been performed already.

(6) The number of flash wait states is reduced from the default value set at reset to speed up the boot process.

(7) The AT91 watchdog timer is disabled so that it does not expire during the boot process. The application can choose to enable the watchdog after the main() function is called.

(8) The CPU and peripheral clocks are configured. This speeds up the rest of the boot process.

(9) The ARM vector table is established in RAM before the memory remap operation, so that the ARM core is provided with valid vectors at all times. The vector table has the following structure:

All entries in the RAM vector table load the PC with the address located in the secondary jump table that immediately follows the primary vector table in memory. For example, the Reset exception at address 0x00 loads the PC with the word located at the effective address: 0x00 (+8 for pipeline) +0x18 = 0x20, which is the address immediately following the ARM vector table.

Note: Some ARM MCUs, such as the NXP LPC family, remap only a small portion of RAM down to address zero. However, the amount of RAM remapped is always at least 0x40 bytes (exactly 0x40 bytes in case of LPC), which is big enough to hold both the primary vector table and the secondary jump table.

(10) The jump table entry for the unused exception is initialized with the MAGIC number. Please note that this number is written to RAM at its location before the memory remap operation.

(11) The secondary jump table in RAM is initialized to contain jump to reset_addr at 0x20 and endless loops for the remaining exceptions. For example, the Prefetch Abort exception at address 0x0C will cause loading the PC again with 0x0C, so the CPU will be tied up in a loop.

This is just the temporary setting until the application initializes the secondary jump table with the addresses of the application-specific exception handlers. Until this happens, the application is not ready to handle the interrupts or exceptions, anyway.

Note: Using the secondary jump table has many benefits. First, the application can very easily change the exception handler by simply writing the handler's address in the secondary table, rather than synthesize a relative branch instruction at the primary vector table. Second, the load to PC instruction allows utilizing the full 32-bit address space for placement of the exception handlers, whereas the relative branch instruction is limited to +/- 25 bits relative to the current PC.

(12) The word at the absolute address 0x14 is loaded and compared to the MAGIC number. The location 0x14 is in ROM before the remap operation, and is in RAM after the remap operation. Before the remap operation the location 0x14 contains the B . instruction, which is different from the MAGIC value.

(13) If the location 0x14 does not contain the MAGIC value, this indicates that the write to RAM did not change the value at address 0x14. This, in turn, means that RAM has not been remapped to address 0x00 yet (i.e., ROM is still mapped to the address 0x00). In this case the remap operation must be performed.

Note: The AT91SAM7 Memory Controller remap operation is a toggle and it is impossible to detect whether the remap has been performed by examining any of the Memory Controller registers. The technique of writing to the low RAM address can be used to reliably detect whether the remap operation has been performed to avoid undoing it. This safeguard is very useful when the reset is performed during debugging. The soft-reset performed by a debugger typically does not undo the memory remap operation, so the remap should not be performed in this case.

(14) The low_level_init() function returns to the address set by the startup code in the lr register. Please note that at this point the code starts executing at its linked address.

Next, in Part 3: The linker script for the GNU toolchain.
To read Part 1, go to "What's needed to get started."

To download the C and C++ source code associated with this article series, go to's Downloadable Code page, or go to Blinky for C and Blinky for C++ to download the Zip files.

Miro Samek, Ph.D., is president of Quantum Leaps, LLC. He can be contacted at

[1] GNU Assembler (as) HTML documentation included in the CodeSourcery Toolchain for ARM.
[2] IAR Systems, "ARM IAR C/C++ Compiler Reference Guide for Advanced RISC Machines Ltd's ARM Cores", Part number: CARM-13, Thirteenth edition: June 2006. Included in the free EWARM KickStart edition
[3] Lewin A.R.W. Edwards, "Embedded System Design on a Shoestring", Elsevier 2003.
[4] ARM Projects.

Loading comments...