Fundamentals of Booting for Embedded Processors


One of the essential operations of any embedded processor is the booting process. Despite its apparent simplicity, however, the options associated with booting a processor can be quite complex, because they are designed to give an application the greatest possible flexibility.

There are multiple ways to boot, multiple purposes behind booting, and multiple factors that affect the process. This article will explore these issues in an attempt to deliver a clearer overall picture of what is entailed by the embedded booting process.

First, the option not to boot
Processors that do not have a specific boot ROM usually jump to a memory location in an external memory device and start executing instructions. This external memory location is generally fixed, and the execution begins when the processor transitions out of the reset sequence.

In these processors, code and data are already programmed into an external device, such as a parallel NOR flash. The only timing constraint concerns the power-up sequence: the flash must be ready to be accessed by the time the processor is ready to make its first access.

Of course, execution out of this external memory is slower than running code from faster internal memory because the flash memory runs at a clock speed that is typically much lower than the speed at which the processor's core runs.

If the code is simply executed in place (“XIP”) from flash, enabling the instruction cache can significantly increase the speed of execution. This is especially true when burst flash is used, because the synchronous access patterns of these devices are well matched to the typical cache-line fill sizes of embedded processors.

So why boot?
While this method of starting a processor's execution is common, it constrains the code storage options of a system. For example, a NOR flash costs more than an SPI-based serial device of comparable capacity, although the NOR flash provides faster access.

In addition, there may already be a storage device such as a NAND flash in the design that could be used to store the application code as well as data. Processors do not typically run code from these other types of devices, but they can often access data from them. From a bill-of-materials (BOM) standpoint, it is cheaper to have a single storage device that serves multiple purposes.

Because of this, the first code that is executed is often a small code segment that is used to set up the transfers needed to bring the remaining code into internal memory space, where it can then be executed at the core processor frequency.

When the transfers are complete, the processor then jumps to the start of the internal memory space where it executes the application code that was just transferred.
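The copy step described above can be sketched in C. This is only a host-side illustration under stated assumptions: the function name `load_to_internal` is hypothetical, and both memories are modeled as ordinary byte arrays rather than real flash and on-chip SRAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical first-stage copy routine.
 * Copies `image_size` bytes of application code from a slow external device
 * into fast internal memory and returns the address at which execution
 * should continue, or NULL if the image does not fit. */
uint8_t *load_to_internal(uint8_t *internal, size_t internal_size,
                          const volatile uint8_t *external, size_t image_size)
{
    assert(internal != NULL && external != NULL);

    if (image_size > internal_size) {
        return NULL; /* image too large for internal memory */
    }

    /* `volatile` forces a real read of the external device for every byte. */
    for (size_t i = 0; i < image_size; i++) {
        internal[i] = external[i];
    }

    /* On a real target, the caller would now jump to this address. */
    return internal;
}
```

On actual hardware the returned address would typically be cast to a function pointer and called, and the copy would often be performed by DMA rather than by a core loop.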

Boot ROM
To provide more flexibility in booting, many processors include a multi-Kbyte “Boot ROM” on chip that includes code that the processor vendor develops and burns into the ROM. As we'll see, the ROM code can perform many different functions.

One of the first tasks the ROM performs is to establish which boot mode has been selected. This is usually determined by reading the state of pins that have been tied high or low. These may be dedicated “Boot Mode Pins” or multipurpose I/O, depending on the processor.

The ROM code reads the pin state and determines which peripheral will be used to bring in the code and data. The ROM code then sets up the peripheral interface, including the programming of all required registers, to make the transfer happen.
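As a rough illustration of the pin decoding step, the sketch below maps two boot mode pins to a boot source. The enum values, the mask, and both function names are assumptions made for this example; every processor defines its own pin encoding, so the hardware reference for the part in question is the authority.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding of two boot mode pins. */
typedef enum {
    BOOT_EXECUTE_FROM_FLASH  = 0, /* skip the loader, run in place */
    BOOT_FROM_PARALLEL_FLASH = 1, /* load from parallel flash */
    BOOT_FROM_SPI_MEMORY     = 2, /* load from a serial SPI memory */
    BOOT_FROM_SPI_HOST       = 3  /* wait for an external SPI host */
} boot_mode_t;

#define BOOT_MODE_MASK 0x3u /* assume the pins appear in bits [1:0] */

/* Extract the boot mode from a sampled pin-state register value. */
boot_mode_t decode_boot_mode(uint32_t pin_state)
{
    return (boot_mode_t)(pin_state & BOOT_MODE_MASK);
}

/* Returns 1 if the ROM must set up a peripheral and transfer code,
 * 0 if execution simply starts from external flash. */
int boot_mode_needs_transfer(boot_mode_t mode)
{
    switch (mode) {
    case BOOT_EXECUTE_FROM_FLASH:
        return 0;
    case BOOT_FROM_PARALLEL_FLASH:
    case BOOT_FROM_SPI_MEMORY:
    case BOOT_FROM_SPI_HOST:
        return 1;
    }
    assert(0 && "mode is masked to two bits, so this is unreachable");
    return 0;
}
```

In a real boot ROM, each case would go on to program the registers of the selected peripheral before starting the transfer.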

Depending on the interface, core accesses or a DMA transfer will be used to read code or data from the external memory device and place this data in a specified location in internal or external memory.

The ROM can also be responsible for setting default values of some important system parameters pertaining to memory initialization, interrupt handling and reset behavior. Because the ROM must be programmed to operate within a wide variety of system situations, it often uses only the “safest” values for key configurations like system and peripheral clock settings.

A series of headers usually “frame” the data on the memory device. The ROM first reads the header and then decodes it to decide how to proceed. These headers usually include parameters such as the number of bytes to be moved and the destination address of the transfer.

Other configuration information that can be contained in the header includes bits to indicate the processor should perform tasks such as memory fills or system initialization.

For memory fills, the processor writes a given value to sections of memory, which provides a useful way to clear memory at startup. System initialization tasks range from initializing external memory to establishing communication with an external device.
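The difference between a normal block and a memory-fill block can be sketched as follows. The structure layout, the `BLOCK_FLAG_FILL` bit, and the function name `apply_block` are hypothetical, and the destination is treated as an offset into a simulated memory array so that the example can run on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bit; real boot ROMs define their own encodings. */
#define BLOCK_FLAG_FILL 0x0001u /* no payload: write fill_value instead */

typedef struct {
    uint32_t dest_offset; /* destination, as an offset into the memory array */
    uint32_t byte_count;  /* number of bytes to write */
    uint16_t flags;       /* control bits such as BLOCK_FLAG_FILL */
    uint8_t  fill_value;  /* used only when BLOCK_FLAG_FILL is set */
} block_header_t;

/* Apply one block to the simulated memory `mem` of `mem_size` bytes.
 * Returns 0 on success, -1 if the block is out of range or lacks a payload. */
int apply_block(uint8_t *mem, size_t mem_size,
                const block_header_t *hdr, const uint8_t *payload)
{
    assert(mem != NULL && hdr != NULL);

    if (hdr->dest_offset > mem_size ||
        hdr->byte_count > mem_size - hdr->dest_offset) {
        return -1; /* block would run past the end of memory */
    }

    if (hdr->flags & BLOCK_FLAG_FILL) {
        /* Fill block: nothing is read from the boot device. */
        memset(mem + hdr->dest_offset, hdr->fill_value, hdr->byte_count);
        return 0;
    }

    if (payload == NULL) {
        return -1; /* a copy block needs data to copy */
    }
    memcpy(mem + hdr->dest_offset, payload, hdr->byte_count);
    return 0;
}
```

A fill block is attractive because it clears a large region of memory while occupying only a header's worth of space on the boot device.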

One of the other useful header features is the ability to load a specific image based on certain board-level hardware. For example, a single product may have multiple configurations, from low-end to high-end.

As such, the flash may include multiple images to allow identical hardware to behave in different ways. The booting header can be used to select the desired executable out of this code store.
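One way to picture image selection is a small table that maps a board identifier to the location of an image in flash. This is a sketch under assumptions: the `image_entry_t` layout, the idea of a one-byte board ID, and the function `select_image` are all invented for illustration, and a real boot ROM may encode the selection quite differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical image directory entry. */
typedef struct {
    uint8_t  board_id;     /* hardware variant this image targets */
    uint32_t flash_offset; /* where the image starts in flash */
} image_entry_t;

/* Return the entry whose board_id matches, or NULL if there is none. */
const image_entry_t *select_image(const image_entry_t *table, size_t count,
                                  uint8_t board_id)
{
    assert(table != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (table[i].board_id == board_id) {
            return &table[i];
        }
    }
    return NULL; /* no image for this hardware configuration */
}
```

The board ID might come from strapping resistors or a configuration register; the loader would then start reading headers at the selected offset.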

As with RAM, the boot ROM can be mapped at any memory level that the processor supports. Typically, these ROMs are located either in L1 memory — where instruction execution occurs in a single core clock cycle — or in L3 memory, where execution occurs in the slower system clock domain. If a larger ROM is required, it is most often at the L3 level. If speed of execution is important, an L1 ROM is used.

Second-stage boot loader
In some cases, the flexibility can be extended by the use of a second-stage loader. The second-stage loader is simply code that is booted in by the boot ROM.

This code is then used to set up the system and bring in the remaining code. It may also perform system initialization, or in many cases it may continue the boot process via a peripheral that is not natively supported by the boot ROM.

For example, a peripheral that requires a protocol may be difficult to support in a boot ROM, but a second-stage loader booted from an inexpensive serial EEPROM can configure it and continue the boot over that peripheral.

The simplest type of boot ROM may just look for a fixed-size code block in external memory. This fixed-size block almost always serves as a second-stage loader.

One excellent example of a second-stage loader is Das U-Boot, an open-source universal boot loader. It is a small piece of software that is brought in from external memory and executes soon after the processor powers up.

Embedded systems do not have a BIOS to perform the initial system configuration, and the low-level initialization of microprocessors, memory controllers, and other board-specific hardware varies from board to board and from CPU to CPU. These initializations must be performed before a Linux kernel image can execute.

Specifically, the boot loader initializes the hardware, including the memory controller. It also provides the boot parameters for the Linux kernel, and then it starts the Linux kernel.

Some additional U-Boot features include reading and writing arbitrary memory locations, uploading new binary images to the board's RAM via a serial line or Ethernet, and copying binary images from RAM to flash memory.

Booting Options (Master vs Slave)
Today's embedded processors provide many boot options, and there are usually multiple choices that suit a developer in a given application. Sometimes a processor needs only the code inside the Boot ROM in order to complete the bootstrapping process.

In other cases, the processor has on-chip non-volatile memory (i.e., ROM or Flash) as part of its memory map. In these situations, the Boot ROM can directly vector to this on-chip memory and begin execution of application code immediately.

However, in a large number of cases, a processor must rely on external devices from which to boot. By “external,” we mean that a processor must use a pin interface (such as a parallel or serial port) in order to access the external device.

These situations distill into two main categories: those in which the processor is a master, and those where it is a slave. When the processor is a master, it controls access (via clock and synchronization signals) to an external memory or other device. The most common master examples include interfaces to external parallel or serial (SPI or I2C) memories.

On the other hand, when the processor is a slave, it is booted from an external master, usually through a parallel host port or a ubiquitous serial connection (e.g., UART, SPI, USB).

Figure 1: Steps in creation of bootable image

Configuring a System for Boot
There are many steps involved in converting user source code into bootable form, such that it can be booted in by the Boot ROM and executed on the target processor.

Figure 1 above shows an example of such a sequence, in which the Boot ROM is configured to read the boot image from external memory. First, the source files for the image are assembled, compiled and linked. After this, they are passed through a Loader utility that parses the linked input file(s) and creates a loader file that interleaves small information headers with the code/data blocks they describe. It is this loader file that is programmed into external memory.
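To make the interleaving concrete, here is a minimal host-side sketch of what a Loader utility does when it emits one header followed by the block it describes. The 10-byte little-endian header layout (4-byte destination address, 4-byte byte count, 2-byte flags) and the function name `append_block` are assumptions chosen for this example, not the format of any particular vendor tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header: 4-byte address + 4-byte count + 2-byte flags. */
#define HEADER_SIZE 10u

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)((v >> 8) & 0xFFu);
    p[2] = (uint8_t)((v >> 16) & 0xFFu);
    p[3] = (uint8_t)((v >> 24) & 0xFFu);
}

static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)((v >> 8) & 0xFFu);
}

/* Append one header and its payload to the loader file buffer `out`,
 * starting at byte `pos`. Returns the new write position, or 0 if the
 * block does not fit (a successful call never returns 0). */
size_t append_block(uint8_t *out, size_t out_size, size_t pos,
                    uint32_t dest_addr, uint16_t flags,
                    const uint8_t *payload, uint32_t byte_count)
{
    assert(out != NULL && payload != NULL);

    if (pos > out_size || out_size - pos < HEADER_SIZE ||
        out_size - pos - HEADER_SIZE < byte_count) {
        return 0;
    }

    put_le32(out + pos, dest_addr);
    put_le32(out + pos + 4, byte_count);
    put_le16(out + pos + 8, flags);
    memcpy(out + pos + HEADER_SIZE, payload, byte_count);

    return pos + HEADER_SIZE + byte_count;
}
```

Calling `append_block` once per linked section, each time passing the position returned by the previous call, yields a buffer that could then be programmed into external memory.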

Figure 2: Parsing of the boot header

When the processor's Boot ROM, configured for “Boot from external memory,” accesses this memory, it reads each header before reading the block that the header describes.

As shown in Figure 2 above, each header contains, for example, the destination address for storing the code or data block, a count of the number of bytes to transfer for this block, and control commands that provide the Boot ROM any additional information about how to treat the block.
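The header-then-block walk can be sketched as a loop that decodes each header and copies the block that follows it. As before, this is an illustration under assumptions: the 10-byte little-endian header, the `FLAG_FINAL` bit that marks the last block, and the function `load_stream` are hypothetical, and destination addresses are treated as offsets into a simulated memory array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEADER_SIZE 10u     /* 4-byte address + 4-byte count + 2-byte flags */
#define FLAG_FINAL  0x8000u /* hypothetical: this is the last block */

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)((uint16_t)p[0] | ((uint16_t)p[1] << 8));
}

/* Walk a boot stream, copying each block to its destination in `mem`.
 * Returns the number of blocks loaded, or -1 if the stream is malformed. */
int load_stream(const uint8_t *stream, size_t stream_size,
                uint8_t *mem, size_t mem_size)
{
    assert(stream != NULL && mem != NULL);

    size_t pos = 0;
    int blocks = 0;

    for (;;) {
        if (stream_size - pos < HEADER_SIZE) {
            return -1; /* ran out of data before a final block */
        }
        uint32_t dest  = get_le32(stream + pos);
        uint32_t count = get_le32(stream + pos + 4);
        uint16_t flags = get_le16(stream + pos + 8);
        pos += HEADER_SIZE;

        if (count > stream_size - pos) {
            return -1; /* payload is truncated */
        }
        if (dest > mem_size || count > mem_size - dest) {
            return -1; /* destination is out of range */
        }

        memcpy(mem + dest, stream + pos, count);
        pos += count;
        blocks++;

        if (flags & FLAG_FINAL) {
            return blocks; /* boot stream complete */
        }
    }
}
```

The bounds checks matter in practice: a corrupted count or address field would otherwise let the loader read or write well outside the intended regions.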

David Katz is Blackfin Applications Manager for New Product Development at Analog Devices, Inc. He is co-author of Embedded Media Processing (Newnes 2005). Previously, he worked at Motorola, Inc., as a senior design engineer in cable modem and factory automation groups. David holds both a B.S. and an M. Eng. in Electrical Engineering from Cornell University.

Rick Gentile joined ADI in 2000 as a Senior DSP Applications Engineer, and he currently leads the Blackfin DSP Applications Group. Prior to joining ADI, Rick was a Member of the Technical Staff at MIT Lincoln Laboratory, where he designed several signal processors used in a wide range of radar sensors. He received a B.S. in 1987 from the University of Massachusetts at Amherst and an M.S. in 1994 from Northeastern University, both in Electrical and Computer Engineering.
