Building Bare-Metal ARM Systems with GNU: Part 2

In this part I start digging into the code discussed in Part 1, which is available online at the Download Code page. The code contains C and C++ versions of the example application called “Blinky”, because it blinks the four user LEDs of the Atmel AT91SAM7S-EK evaluation board.

The C version is located in the subdirectory c_blinky, and the equivalent C++ version is located in the subdirectory cpp_blinky. The Blinky application is primitive, but is carefully designed to use all the features covered in this multi-part article. The projects are based on the latest CodeSourcery G++ GNU toolchain for ARM [1].

In this part, I describe the generic startup code for the GNU toolchain as well as the low-level initialization for a bare-metal ARM system. The recommended reading for this part includes the “IAR Compiler Reference Guide” [2], specifically the sections “System startup and termination” and “Customizing system initialization”.

The Startup Code
The startup sequence for a bare-metal ARM system is implemented in the assembly file startup.s, which is identical for C and C++ projects. This file is designed to be generic, and should work for any ARM-based MCU without modifications.

All CPU- and board-specific low-level initialization that needs to occur before entering the main() function should be handled in the routine low_level_init(), which typically can be written in C/C++, but can also be coded in assembly, if necessary.

Listing 1 Startup code in GNU assembly (startup.s)

Listing 1 above shows the complete startup code in assembly. The highlights of the startup sequence are as follows:

(1) The .text directive tells GNU assembler (as) to assemble the following statements onto the end of the text subsection.

(2) The .code 32 directive selects the 32-bit ARM instruction set (the value 16 selects THUMB). The ARM core starts execution in the ARM state.

(3) The .global directive makes the symbol _start visible to the GNU linker (ld).

(4) The .func directive emits debugging information for the function _start. (The function definition must end with the directive .endfunc).

(5) Upon reset, the ARM core fetches the instruction at address 0x0, which at boot time must be mapped to a non-volatile memory (ROM). However, later the ROM might be remapped to a different address range by means of a memory remap operation. Therefore the code in ROM is typically linked to the final ROM location and not to the ROM location at boot time.

This dynamic changing of the memory map has at least two consequences. First, the few initial instructions must be position-independent, meaning that only PC-relative addressing can be used. Second, the initial vector table is used only very briefly and is replaced with a different vector table established in RAM.

(6) The initial vector table contains just endless loops (relative branches to self). This vector table is used only very briefly until it is replaced by the vector table in RAM. Should an exception occur during this transient, the board is most likely damaged and the CPU cannot recover by itself. A safety-critical device should have a secondary circuit (such as an external watchdog timer driven by a separate clock source) that would announce the condition to the user.

(7) It is always a good idea to embed a prominent copyright message close to the beginning of the ROM image. You should customize this message for your company.

(8) Alignment to the word boundary is necessary after a string embedded directly in the code.

(9) The reset vector branches to this label.

(10) The r0 and r1 registers are used as the arguments of the upcoming call to the low_level_init() function. The register r0 is loaded with the linked address of the reset handler, which might be useful to set up the RAM-based vector table inside the low_level_init() function.

(11) The r1 register is loaded with the linked address of the C-initialization code, which is also the return address from the low_level_init() function. Some MCUs (such as AT91x40 with the EBI) might need this address to perform a direct jump after the memory remap operation.

(12) The link register is loaded with the return address. Please note that the return address is the _cstartup label at its final linked location, and not the subsequent PC value (so loading the return address with LDR lr,pc would be incorrect).

(13) The temporary stack pointer is initialized to the end of the stack section. The GNU toolset uses the full descending stack, meaning that the stack grows towards the lower memory addresses.

Note: The stack pointer initialized in this step might not be valid if the RAM is not available at the linked address before the remap operation. This is not an issue in the AT91SAM7S family, because the RAM is always available at the linked address (0x00200000). However, in other devices (such as AT91x40) the RAM is not available at its final location before the EBI remap. In this latter case you might need to write the low_level_init() function in assembly to make sure that the stack pointer is not used until the memory remap.

(14) The function low_level_init() is invoked with a relative branch instruction. Please note that the branch-with-link (BL) instruction is specifically NOT used, because the function might not be called from its linked address. Instead, the return address has been loaded explicitly in the previous instruction.

Note: The function low_level_init() can be coded in C/C++ with the following restrictions. The function must execute in the ARM state and it must not rely on the initialization of the .data section or the clearing of the .bss section. Also, if the memory remapping is performed at all, it must occur inside the low_level_init() function, because the code is no longer position-independent after this function returns.

(15) The _cstartup label marks the beginning of C-initialization.

(16) The section .fastcode is used for the code executed from RAM. Here this section is copied from ROM to its linked address in RAM (see also the linker script).

(17) The section .data is used for initialized variables. Here this section is copied from its load address in ROM to its linked address in RAM (see also the linker script).

(18) The section .bss is used for uninitialized variables, which the C standard requires to be set to zero. Here this section is cleared in RAM (see also the linker script).

(19) The section .stack is used for the stacks. Here this section is filled with the given pattern, which can help to determine the stack usage in the debugger.

(20) All banked stack pointers are initialized.

(21) The User/System stack pointer is initialized last. All subsequent code executes in the System mode.

(22) The library function __libc_init_array invokes all C++ static constructors (see also the linker script). This function is invoked with the BX instruction, which allows state change to THUMB. This function is harmless in C.

(23) The main() function is invoked with the BX instruction, which allows state change to THUMB.

(24) The main() function should never return in a bare-metal application because there is no operating system to return to. In case main() ever returns, the Software Interrupt exception is entered, in which the user can customize how to handle this problem.
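
Steps (16) through (19) can be sketched in C as follows. This is only a host-side illustration and not the assembly from startup.s: the function names copy_section(), fill_section(), and stack_words_used(), as well as the fill value STACK_FILL, are assumptions made for this sketch. In the real startup code the section boundaries come from symbols defined in the linker script.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xAAAAAAAAu  /* hypothetical fill pattern, not from startup.s */

/* Copy an initialized section (such as .data or .fastcode) word by word
 * from its load image to its linked (run-time) location. */
void copy_section(uint32_t *dst, const uint32_t *src, size_t words) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < words; ++i) {
        dst[i] = src[i];
    }
}

/* Fill a section with one value: 0 for .bss, STACK_FILL for .stack. */
void fill_section(uint32_t *dst, size_t words, uint32_t value) {
    assert(dst != NULL);
    for (size_t i = 0; i < words; ++i) {
        dst[i] = value;
    }
}

/* Count the stack words that no longer hold the fill pattern.
 * A full descending stack grows from the highest address downwards,
 * so the scan starts at the lowest address (index 0) and stops at the
 * first word that has been overwritten. */
size_t stack_words_used(const uint32_t *stack, size_t words) {
    size_t untouched = 0;
    while (untouched < words && stack[untouched] == STACK_FILL) {
        ++untouched;
    }
    return words - untouched;
}
```

A scan like stack_words_used() is roughly what you do by eye in the debugger's memory view when you look for the point where the fill pattern ends.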

Low-Level Initialization
The function low_level_init() performs the low-level initialization, which always strongly depends on the specific ARM MCU and the particular memory remap operation. As described in the previous section, the function low_level_init() can be coded in C or C++, but must be compiled to ARM and cannot rely on the initialization of the .data section, clearing of the .bss section, or on C++ static constructors being called.

Listing 2 Low-level initialization for AT91SAM7S microcontroller.

Listing 2 above shows the low-level initialization of the AT91SAM7S microcontroller in C. It is important to point out here that the initialization for a different microcontroller, such as the AT91x40 series with the EBI, could be different, mostly due to a different memory remap operation. The highlights of the low-level initialization are as follows:

(1) GNU gcc is a standard-compliant compiler that supports the C99 standard exact-width integer types. The use of these types is recommended.

(2) The arguments of low_level_init() are as follows: reset_addr is the linked address of the reset handler and return_addr is the linked return address from the low_level_init() function.

Note: In the C++ environment, the function low_level_init() must be defined with the extern “C” linkage specification because it is called from assembly.

(3) The symbol __ram_start denotes the linked address of RAM. In AT91SAM7S the RAM is always available at this address, so the symbol __ram_start also denotes the RAM location before the remap operation (see the linker script).

(4) The constant LDR_PC_PC contains the opcode of the ARM instruction LDR pc,[pc,…], which is used to populate the RAM vector table.

(5) The constant MAGIC is used to test if the remap operation has been performed already.

(6) The number of flash wait states is reduced from the default value set at reset to speed up the boot process.

(7) The AT91 watchdog timer is disabled so that it does not expire during the boot process. The application can choose to enable the watchdog after the main() function is called.

(8) The CPU and peripheral clocks are configured. This speeds up the rest of the boot process.

(9) The ARM vector table is established in RAM before the memory remap operation, so that the ARM core is provided with valid vectors at all times. The vector table has the following structure:

All entries in the RAM vector table load the PC with the address located in the secondary jump table that immediately follows the primary vector table in memory. For example, the Reset exception at address 0x00 loads the PC with the word located at the effective address: 0x00 (+8 for pipeline) + 0x18 = 0x20, which is the address immediately following the ARM vector table.
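
The address arithmetic above can be checked with a few lines of C. This is a sketch under stated assumptions: the base value 0xE59FF000 is the standard ARM encoding of LDR pc,[pc,#imm12] with a positive immediate offset, but it has not been checked against the value used in Listing 2, and the helper names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed base opcode of LDR pc,[pc,#offset]; the low 12 bits hold the
 * (positive) byte offset. */
#define LDR_PC_PC_BASE 0xE59FF000u

/* Build the opcode for LDR pc,[pc,#offset]. */
uint32_t make_ldr_pc_pc(uint32_t offset) {
    assert(offset <= 0xFFFu);          /* the immediate field is 12 bits wide */
    return LDR_PC_PC_BASE | offset;
}

/* Address of the word that an LDR pc,[pc,#offset] placed at vector_addr
 * loads into the PC. In ARM state the PC reads as the address of the
 * current instruction plus 8 because of the pipeline. */
uint32_t ldr_pc_pc_source(uint32_t vector_addr, uint32_t opcode) {
    return vector_addr + 8u + (opcode & 0xFFFu);
}
```

With the offset 0x18 every primary vector at address A reads its handler address from A + 0x20, so the eight vectors at 0x00..0x1C map onto the eight jump-table words at 0x20..0x3C.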

Note: Some ARM MCUs, such as the NXP LPC family, remap only a small portion of RAM down to address zero. However, the amount of RAM remapped is always at least 0x40 bytes (exactly 0x40 bytes in case of LPC), which is big enough to hold both the primary vector table and the secondary jump table.

(10) The jump table entry for the unused exception is initialized with the MAGIC number. Please note that this number is written to RAM at its location before the memory remap operation.

(11) The secondary jump table in RAM is initialized to contain a jump to reset_addr at 0x20 and endless loops for the remaining exceptions. For example, the Prefetch Abort exception at address 0x0C will cause loading the PC again with 0x0C, so the CPU will be tied up in a loop.

This is just the temporary setting until the application initializes the secondary jump table with the addresses of the application-specific exception handlers. Until this happens, the application is not ready to handle the interrupts or exceptions, anyway.
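
The temporary table contents described in steps (9) through (11) might look like the following host-side model, in which a 16-word array stands in for the first 0x40 bytes of RAM. The function name init_vector_tables(), the MAGIC_SKETCH value, and the choice to place the magic word in the slot at 0x14 (the location that step (12) tests) are assumptions of this sketch, not a copy of Listing 2.

```c
#include <assert.h>
#include <stdint.h>

#define LDR_PC_PC_18 0xE59FF018u  /* LDR pc,[pc,#0x18] (standard ARM encoding) */
#define MAGIC_SKETCH 0xDEADBEEFu  /* hypothetical marker value */
#define NUM_VECTORS  8u

/* ram[] models RAM words 0x00..0x3C: words 0..7 are the primary vector
 * table, words 8..15 are the secondary jump table. */
void init_vector_tables(uint32_t ram[16], uint32_t reset_addr) {
    assert(ram != 0);
    for (uint32_t i = 0; i < NUM_VECTORS; ++i) {
        ram[i] = LDR_PC_PC_18;          /* vector: load PC from jump table    */
        ram[NUM_VECTORS + i] = 4u * i;  /* jump back to the same vector: loop */
    }
    ram[0x14 / 4] = MAGIC_SKETCH;       /* unused exception slot holds marker */
    ram[0x20 / 4] = reset_addr;         /* Reset goes to the reset handler    */
}
```

The application would later overwrite the jump-table words with the addresses of its real exception handlers; nothing in the primary table needs to change for that.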

Note: Using the secondary jump table has many benefits. First, the application can very easily change the exception handler by simply writing the handler's address in the secondary table, rather than synthesizing a relative branch instruction at the primary vector table. Second, the load-to-PC instruction allows utilizing the full 32-bit address space for placement of the exception handlers, whereas the relative branch instruction is limited to a range of about ±32 MB (a signed 24-bit word offset, that is, ±2^25 bytes) relative to the current PC.
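
To see what "synthesizing a relative branch" involves and where the range limit comes from, here is a small sketch that encodes the ARM B instruction. The function name encode_branch() is made up for this example, and the sketch treats addresses linearly, ignoring the wrap-around at the ends of the 32-bit address space.

```c
#include <assert.h>
#include <stdint.h>

/* Try to encode "B to" for an instruction located at address "from".
 * Returns 1 and stores the opcode on success, or 0 if the target is
 * misaligned or outside the reach of the signed 24-bit word offset. */
int encode_branch(uint32_t from, uint32_t to, uint32_t *opcode) {
    assert(opcode != 0);
    /* The offset is relative to the PC, which reads as from + 8. */
    int64_t offset = (int64_t)to - ((int64_t)from + 8);
    if ((offset % 4) != 0) {
        return 0;                       /* ARM instructions are word-aligned */
    }
    if (offset < -(1LL << 25) || offset > (1LL << 25) - 4) {
        return 0;                       /* beyond roughly +/-32 MB */
    }
    /* 0xEA = condition "always" + branch; low 24 bits = offset in words. */
    *opcode = 0xEA000000u | ((uint32_t)(offset / 4) & 0x00FFFFFFu);
    return 1;
}
```

For instance, a branch to self at any address encodes as 0xEAFFFFFE, while a handler at 0x10000000 cannot be reached from the vector at 0x18 with a single B. Storing the handler's address in the jump table has no such restriction.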

(12) The word at the absolute address 0x14 is loaded and compared to the MAGIC number. The location 0x14 is in ROM before the remap operation, and is in RAM after the remap operation. Before the remap operation the location 0x14 contains the B . instruction, which is different from the MAGIC value.

(13) If the location 0x14 does not contain the MAGIC value, this indicates that the write to RAM did not change the value at address 0x14. This, in turn, means that RAM has not been remapped to address 0x00 yet (i.e., ROM is still mapped to the address 0x00). In this case the remap operation must be performed.

Note: The AT91SAM7 Memory Controller remap operation is a toggle and it is impossible to detect whether the remap has been performed by examining any of the Memory Controller registers. The technique of writing to the low RAM address can be used to reliably detect whether the remap operation has been performed to avoid undoing it. This safeguard is very useful when the reset is performed during debugging. The soft-reset performed by a debugger typically does not undo the memory remap operation, so the remap should not be performed in this case.
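
The detection logic of steps (12) and (13) can be exercised on a host with a toy memory model. Everything here is hypothetical scaffolding for illustration (the MemoryModel type, the helper names, and the MAGIC_SKETCH value); on the real chip the toggle is a write to a Memory Controller register and the reads and writes go to physical addresses.

```c
#include <assert.h>
#include <stdint.h>

#define MAGIC_SKETCH 0xDEADBEEFu  /* hypothetical marker value */
#define B_SELF       0xEAFFFFFEu  /* ARM encoding of "B ." */

typedef struct {
    uint32_t rom[8];      /* first 0x20 bytes of ROM (initial vector table) */
    uint32_t ram[8];      /* first 0x20 bytes of RAM at its linked address  */
    int      ram_at_zero; /* 1 when RAM is currently mapped at address 0x00 */
} MemoryModel;

void model_reset(MemoryModel *m, int ram_at_zero) {
    for (int i = 0; i < 8; ++i) {
        m->rom[i] = B_SELF;   /* initial vectors are endless loops */
        m->ram[i] = 0u;
    }
    m->ram_at_zero = ram_at_zero;
}

/* Read a word through the low addresses 0x00..0x1C. */
uint32_t read_low(const MemoryModel *m, uint32_t addr) {
    assert(addr < 0x20u && (addr % 4u) == 0u);
    return m->ram_at_zero ? m->ram[addr / 4u] : m->rom[addr / 4u];
}

/* The remap command is a toggle: issuing it twice undoes it. */
void toggle_remap(MemoryModel *m) {
    m->ram_at_zero = !m->ram_at_zero;
}

/* Write the marker to RAM at its linked address, then toggle the remap
 * only if the marker is not visible at the low address 0x14. */
void remap_once(MemoryModel *m) {
    m->ram[0x14 / 4] = MAGIC_SKETCH;
    if (read_low(m, 0x14u) != MAGIC_SKETCH) {
        toggle_remap(m);
    }
}
```

Calling remap_once() a second time, as would happen after a debugger soft reset, leaves the mapping unchanged because the marker is already visible at 0x14.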

(14) The low_level_init() function returns to the address set by the startup code in the lr register. Please note that at this point the code starts executing at its linked address.

Next, in Part 3: The linker script for the GNU toolchain.
To read Part 1, go to “What's needed to get started.”

To download the C and C++ source code associated with this article series, go to the Downloadable Code page, or go to Blinky for C and Blinky for C++ to download the Zip files.

Miro Samek, Ph.D., is president of Quantum Leaps, LLC. He can be contacted at

[1] GNU Assembler (as) HTML documentation included in the CodeSourcery Toolchain for ARM.
[2] IAR Systems, “ARM IAR C/C++ Compiler Reference Guide for Advanced RISC Machines Ltd's ARM Cores”, Part number: CARM-13, Thirteenth edition: June 2006. Included in the free EWARM KickStart edition.
[3] Lewin A.R.W. Edwards, “Embedded System Design on a Shoestring”, Elsevier 2003.
[4] ARMProjects.
