The ARM Cortex-M3 core includes architectural enhancements that increase code execution speed, lower power consumption, and ease software development (Table 1). The result is a true real-time core that overcomes the real-time processing limitations of the ARM7TDMI core. Over time, most ARM7-based designs will be migrated to the Cortex-M3.
Although ARM has done a lot to make it easy to port legacy code from the ARM7 to the Cortex-M3 core, more remains to be done. The purpose of this two-part article is to take you step by step through the porting process, so you will have no excuses when your boss asks you to port some legacy code and have it ready by last Wednesday.
One of the most helpful things ARM has done is to make sure that Cortex-M3 support has been added to every ARM tool chain, which makes code compilation a straightforward process that can be done in just a few days in most situations. In fact, the most important consideration when migrating a legacy ARM7 design to the Cortex-M3 is selecting a device with peripheral hardware that is identical to that on the ARM7 in the current design. If the programming interface is different, new peripheral drivers will be required. This effort could add days or weeks to the schedule (never a good thing).
Using a Cortex-M3 device with identical peripheral hardware enables the software engineer to reuse most (if not all) of his C driver code, saving the days or even weeks usually spent learning the nuances of new peripherals and developing a robust driver from scratch. Vendor-supplied header files handle any relocation of peripheral register addresses: the developer simply includes the file for the Cortex-M3 device and recompiles the code.
There are, however, several differences between the Cortex-M3 and the ARM7TDMI that engineers must address in their designs (shown in Table 2). This article explores the issues that arise in dealing with exception vector table formatting, startup code and stack configuration, RAM remapping and RAM functions, and hardware interrupt configuration. Part 2, available online at Embedded.com, will address software interrupts, fault handling, the SWP (swap) instruction, instruction timing, assembly language, and optimizations.
The Exception Vector Table
The exception vector table is where the application code tells the processor core the location of software routines to handle various asynchronous events. For ARM cores, these events include Reset (triggered by a power-up or hard reset), faults and aborts due to bus errors or undefined instructions, and interrupts triggered by either software requests or external sources such as on-chip peripherals.
For an ARM7TDMI, the exception vector table typically consists of at least six¹ branch² instructions in hand-coded assembly:
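        b   Reset_Handler           ; 0x00: Reset
        b   UndefInstr_Handler      ; 0x04: Undefined instruction
        b   SWI_Handler             ; 0x08: Software interrupt
        b   PrefetchAbort_Handler   ; 0x0C: Prefetch abort
        b   DataAbort_Handler       ; 0x10: Data abort
        b   .                       ; 0x14: Reserved vector
        b   IRQ_Handler             ; 0x18: IRQ
        b   FIQ_Handler             ; 0x1C: FIQ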
The exception vector table on the Cortex-M3 can be defined in C as an array of pointers (see Listing 1). The first entry is the initial value of the main stack pointer (the top of the stack), and the remaining entries are pointers to the exception handler functions.
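Listing 1 itself is not reproduced here, but a minimal sketch of such a table might look like the following. It assumes a hypothetical TOP_STACK address of 0x20004000 and a catch-all Default_Handler (both illustrative, not taken from the original listing), and it shows only the 16 system entries; the vendor's peripheral IRQ vectors would follow at index 16 and up.

```c
#include <stdint.h>

/* Every slot holds a handler pointer; slot 0 reuses the same type to hold
 * the initial main stack pointer value. */
typedef void (*vector_t)(void);

/* Hypothetical top of SRAM; real projects usually take this from the
 * linker script. */
#define TOP_STACK 0x20004000u

void System_Init(void);                              /* reset handler */
static void Default_Handler(void) { for (;;) { } }   /* trap unexpected exceptions */

/* The linker must place this array at the vector table address (0x0 at
 * reset); that needs a toolchain-specific section directive, omitted here. */
const vector_t vector_table[16] = {
    (vector_t)(uintptr_t)TOP_STACK,  /* 0: initial MSP value */
    System_Init,                     /* 1: Reset */
    Default_Handler,                 /* 2: NMI */
    Default_Handler,                 /* 3: Hard Fault */
    Default_Handler,                 /* 4: MemManage */
    Default_Handler,                 /* 5: Bus Fault */
    Default_Handler,                 /* 6: Usage Fault */
    0, 0, 0, 0,                      /* 7-10: reserved */
    Default_Handler,                 /* 11: SVCall */
    Default_Handler,                 /* 12: Debug Monitor */
    0,                               /* 13: reserved */
    Default_Handler,                 /* 14: PendSV */
    Default_Handler,                 /* 15: SysTick */
    /* a real table continues with the vendor's peripheral IRQ handlers */
};

void System_Init(void)
{
    /* initialize clocks and RAM here, then call main() */
}
```

Because every entry is a plain function pointer, each handler is just an ordinary C function; only the placement of the array is toolchain specific.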
The ARM7TDMI has seven processor modes with six banked stack pointers (User and System modes share one). One of the seven modes, User, operates at a lower privilege level than the others. The Cortex-M3, on the other hand, has only two modes: Thread and Handler. Thread mode can operate at either the privileged level or the unprivileged (user) level and can use either the main stack or the process stack. Handler mode always operates at the privileged level and uses the main stack.
Elevated privilege levels allow access to registers that control the processor state (the CPSR control bits and the SPSR on the ARM7TDMI; special registers such as PRIMASK, FAULTMASK, BASEPRI, and CONTROL on the Cortex-M3) and possibly to memory regions restricted by an optional memory protection unit (MPU). Table 3 shows equivalent levels between the ARM7TDMI and the Cortex-M3.
Configuring the processor mode stacks
All but the simplest ARM7TDMI systems use at least two processor modes:
• SVC (Supervisor) mode for initialization and possibly main loop code, and
• IRQ (Interrupt Request) mode for interrupts.
Each mode in use must have its stack pointer initialized at reset, which requires assembly code such as that shown in Listing 2.
At reset, the Cortex-M3 automatically loads its main stack pointer (MSP) from the first entry in the exception vector table (TOP_STACK in Listing 1) and then jumps to the routine pointed to by the second entry, System_Init. Because the same MSP used by the interrupt handlers can also serve the main code, many types of Cortex-M3 applications need no assembly code at all to initialize stack pointers.
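Because the stack pointer is valid before the first instruction executes, the reset handler itself can be written in C. A typical System_Init copies initialized data from flash to RAM and zeroes the uninitialized region before calling main(). The sketch below shows that step as a hypothetical init_ram() helper; its pointer parameters stand in for linker-script symbols whose names vary by toolchain.

```c
#include <stdint.h>

/* Copy initialized data from its load image (in flash) to RAM, then clear
 * the zero-initialized region. The bounds are parameters so the routine
 * can be tested on a host; on the target they come from linker symbols. */
void init_ram(const uint32_t *load, uint32_t *data, uint32_t *data_end,
              uint32_t *bss, uint32_t *bss_end)
{
    while (data < data_end)
        *data++ = *load++;   /* .data: copy initial values */
    while (bss < bss_end)
        *bss++ = 0;          /* .bss: zero fill */
}
```

On the target, System_Init might call init_ram() with symbols such as _sidata, _sdata, _edata, _sbss, and _ebss (GNU-style names used here only as an example) and then call main().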
The management of interrupts from on-chip and off-chip peripherals is an important feature of microcontrollers with the most important performance metrics being latency (the time between when the event occurs and when software handles it) and jitter (how much the latency varies from event to event). Several features of the Cortex-M3 improve both latency and jitter compared with the ARM7TDMI as well as greatly simplifying the software needed to handle the interrupts.
Since the ARM7TDMI has only two general-purpose exception inputs, the IRQ and the FIQ (Fast Interrupt Request), most system-on-chip (SoC) vendors include an interrupt controller to multiplex the multitude of interrupt sources down to a single IRQ or FIQ assertion. The IRQ/FIQ exception handler must determine which interrupt source to process and call the appropriate software routine. Another interrupt cannot be serviced until the routine completes and returns to the interrupted code, often making latency and jitter unacceptable for certain events. The obvious solution is to prioritize the events and allow those of higher priority to preempt those of lower priority. Implementing this on the ARM7TDMI involves an interrupt service routine (ISR) “wrapper” in assembly code (see Listing 3 ) that saves the processor status on the stack, changes the processor mode from IRQ back to SVC, re-enables the IRQ, and finally calls the event handler. When the handler returns, the saved mode is restored. Because the handler is called when the processor is in SVC mode, the IRQ can be asserted again by a higher priority interrupt event.
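For contrast, here is a minimal C sketch of the dispatch step an ARM7TDMI IRQ handler performs in software: pick the highest-priority pending source and call its routine. The handler table, priority array, and function names are hypothetical, and many vendor interrupt controllers perform the selection in hardware and return a vector address instead.

```c
#include <stdint.h>

#define NUM_SOURCES 32

typedef void (*isr_t)(void);

/* Hypothetical per-source handlers and priorities (0 = highest), standing
 * in for what a vendor interrupt controller would provide. */
static isr_t   handlers[NUM_SOURCES];
static uint8_t priorities[NUM_SOURCES];

/* Return the pending source with the highest priority (lowest number),
 * breaking ties in favor of the lower source number; -1 if none pending. */
int select_source(uint32_t pending)
{
    int best = -1;
    for (int n = 0; n < NUM_SOURCES; n++) {
        if (!(pending & (1u << n)))
            continue;
        if (best < 0 || priorities[n] < priorities[best])
            best = n;
    }
    return best;
}

/* Body of a C-level IRQ handler: run one source's routine. Other
 * interrupts wait until this returns unless an assembly wrapper such as
 * the one in Listing 3 re-enables IRQs first. */
void irq_dispatch(uint32_t pending)
{
    int n = select_source(pending);
    if (n >= 0 && handlers[n])
        handlers[n]();
}
```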
With the Cortex-M3, this wrapper code is no longer required because of the inclusion of the nested vectored interrupt controller (NVIC). The NVIC is similar to the interrupt controllers that SoC vendors typically include with an ARM7TDMI device; however, since it's integrated with the processor core, the NVIC can perform more sophisticated actions. The ARM7TDMI ISR wrapper code is now essentially done in hardware!
When an interrupt occurs that has a higher priority than the code currently executing, the Cortex-M3 automatically saves the registers that a caller must preserve around a call to a function compliant with the ARM Architecture Procedure Call Standard (AAPCS), namely R0-R3, R12, and LR, along with the PC and xPSR, and restores them when the handler returns. Chances are your C compiler follows AAPCS, which means the NVIC can invoke C functions directly. Therefore, the vector_table array in Listing 1 can contain pointers to ordinary C functions.
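For reference, the basic frame the Cortex-M3 pushes can be described with a C struct. The type and field names are hypothetical; the word order, from R0 at the lowest address to xPSR at the highest, is the one defined by the ARMv7-M architecture. A fault handler could, for example, treat the stacked SP as a pointer to this type to read the return address. While the handler runs, LR holds a special EXC_RETURN value rather than a normal return address, which is why an ordinary function return is enough to end the exception.

```c
#include <stddef.h>
#include <stdint.h>

/* Basic exception frame as pushed on entry, lowest address first. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;     /* LR of the interrupted code */
    uint32_t pc;     /* address to resume at */
    uint32_t xpsr;   /* program status at the time of the exception */
} exception_frame_t;

/* The layout must match what the hardware pushes: eight 32-bit words. */
_Static_assert(sizeof(exception_frame_t) == 32, "frame is eight words");
_Static_assert(offsetof(exception_frame_t, pc) == 24, "PC is the seventh word");
```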
The NVIC determines which exception to handle based on the source's priority designation. The Reset, NMI (non-maskable interrupt), and Hard Fault exceptions are fixed at the first, second, and third highest priorities. The remaining exception sources have user-configurable priority levels specified by their preempt priority and subpriority. If an exception occurs with a higher preempt priority than what is currently executing, the handler for the new exception will be called. Otherwise, the new exception will be pended until all higher priority exceptions have completed. If multiple exceptions are pended within a particular preempt-priority level, the NVIC will handle them in order of their subpriority. Sources with the same subpriority level will be handled in order of their NVIC source number, lowest number first.
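The ordering rules above can be summarized in a small C model. This is a simplified illustration of the arbitration logic, not a description of how the NVIC is built; the exc_t type and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Priority attributes of one configurable exception (smaller wins). */
typedef struct {
    uint8_t  preempt;   /* preempt priority */
    uint8_t  sub;       /* subpriority */
    uint16_t number;    /* NVIC source (exception) number */
} exc_t;

/* A new exception interrupts the running handler only if its preempt
 * priority is strictly higher (numerically lower). */
bool preempts(const exc_t *incoming, const exc_t *running)
{
    return incoming->preempt < running->preempt;
}

/* Among pended exceptions: compare preempt priority, then subpriority,
 * then source number, lowest first. */
bool runs_before(const exc_t *a, const exc_t *b)
{
    if (a->preempt != b->preempt)
        return a->preempt < b->preempt;
    if (a->sub != b->sub)
        return a->sub < b->sub;
    return a->number < b->number;
}
```

Note that a subpriority never causes preemption; it only decides which pended exception runs next.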
Note that with the preempt priority and subpriority fields, smaller numbers represent higher priority, with “0” being the highest.
Table 4 shows an example IRQ priority configuration and its effects.
Table 5 shows how the bits of the priority level are split between the preempt priority (“pre” column) and the subpriority (“sub” column) based on the particular SoC priority-level register size (columns ranging 3 to 8) and the priority group setting (rows ranging 0 to 7).
For example, if the priority-level register size is 4 bits, a priority group setting of 4 will cause bits [7:5] to be used as the preempt priority level and bit [4] to be used as the subpriority. The remaining bits [3:0] are unimplemented. In this case, an exception with priority 0x20 will preempt one with 0x40 (lower value is higher priority). If exceptions with priorities 0x40 and 0x50 occur while exception 0x20 is being serviced, they will be pended (as described earlier), and the 0x40 exception will run before the 0x50 exception because the former has a higher subpriority (bit 4 is 0 in 0x40 and 1 in 0x50; 0 is higher priority than 1).
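The field split can be checked with a short C helper. split_priority() is a hypothetical name; it follows the rule in Table 5, where priority group g puts the preempt priority in bits [7:g+1] and the subpriority in bits [g:0], with unimplemented low-order bits reading as zero.

```c
#include <stdint.h>

/* Split an 8-bit priority value into its preempt and subpriority fields.
 * prigroup:  priority group setting, 0-7.
 * impl_bits: priority bits the SoC implements (3-8); they are the most
 *            significant bits, so the rest are masked off. */
void split_priority(uint8_t prio, unsigned prigroup, unsigned impl_bits,
                    uint8_t *preempt, uint8_t *sub)
{
    uint8_t mask = (uint8_t)(0xFFu << (8 - impl_bits));  /* implemented bits */
    prio &= mask;
    *preempt = (uint8_t)(prio >> (prigroup + 1));
    *sub = (uint8_t)(prio & ((1u << (prigroup + 1)) - 1));
}
```

With a 4-bit register and group 4, 0x20 splits into preempt 1 and subpriority 0, 0x40 into preempt 2 and subpriority 0, and 0x50 into preempt 2 and subpriority 0x10, which is the ordering described in the example.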
If it isn't clear how the levels should be partitioned for a particular application, a reasonable starting point is priority group "0," which makes all of the levels preemptive. (On an SoC that implements all 8 priority bits, group 0 still leaves bit [0] as a subpriority, so there are only 128 preemptive levels.)
The following code sets the priority group in the NVIC:
#define NVIC_AIRCR (*((volatile unsigned int *)0xE000ED0C))

NVIC_AIRCR = (0x05FAu << 16) |   /* VECTKEY: writes are ignored without this key */
             (0u << 8);          /* PRIGROUP, bits [10:8]: priority group 0 */
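All of the system exceptions except the first three (Reset, NMI, and Hard Fault) have user-configurable priority levels. The following code sets the priority of system exception num (4 = MemManage through 15 = SysTick) through the byte-addressable system handler priority registers:

#define CM3_SHPR ((volatile unsigned char *)0xE000ED18)   /* priority byte of exception 4 */

CM3_SHPR[num - 4] = priority;   /* one byte per exception, starting at exception 4 */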
Peripheral interrupts are configured based on their IRQ number (IRQn), as shown in Listing 4.
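Listing 4 is not reproduced here; the following is one possible sketch. It writes the interrupt's byte in the NVIC Interrupt Priority registers (base 0xE000E400) and sets its bit in the Interrupt Set-Enable registers (base 0xE000E100). PRIO_BITS and irq_configure() are assumptions for illustration; the number of implemented priority bits comes from the device datasheet.

```c
#include <stdint.h>

/* Hypothetical: priority bits implemented by this SoC (see datasheet). */
#define PRIO_BITS 4

/* NVIC register blocks in the Cortex-M3 memory map. */
#define NVIC_ISER_ADDR 0xE000E100u   /* Interrupt Set-Enable registers */
#define NVIC_IPR_ADDR  0xE000E400u   /* Interrupt Priority registers (byte access) */

/* Give peripheral interrupt irqn a priority level and enable it. The
 * register bases are parameters so the arithmetic can be tested off-target. */
void irq_configure(volatile uint32_t *iser, volatile uint8_t *ipr,
                   unsigned irqn, uint8_t priority)
{
    /* The level occupies the implemented, most significant bits. */
    ipr[irqn] = (uint8_t)(priority << (8 - PRIO_BITS));
    /* Writing 1 enables the source and zero bits are ignored, so no
     * read-modify-write is needed. */
    iser[irqn / 32] = 1u << (irqn % 32);
}
```

On the target, a call such as irq_configure((volatile uint32_t *)NVIC_ISER_ADDR, (volatile uint8_t *)NVIC_IPR_ADDR, 5, 2) would give IRQ 5 priority level 2 and enable it.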
RAM remap function
ARM7TDMI SoC vendors commonly provide a mechanism in their memory controllers to select whether a nonvolatile memory or a volatile memory appears at the reset vector (typically address 0x0). This allows an application to start with a fixed set of vectors in nonvolatile memory and then switch to a different set of vectors later by setting up a new vector table in RAM and “remapping” the RAM to address 0x0.
With the Cortex-M3, this memory controller sleight of hand is no longer necessary, as the NVIC allows the exception vector table to be located practically anywhere in memory. Using a new vector table is as simple as writing its offset from the beginning of its address space (either "code" or "SRAM") into the vector table offset register (VTOR), as shown in Listing 5.
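In the same spirit as Listing 5 (not reproduced here), the sketch below copies an existing table into an aligned RAM array and writes the array's address to the vector table offset register at 0xE000ED08. VECTOR_COUNT, ram_vectors, and relocate_vectors() are hypothetical; the aligned attribute is GCC/Clang syntax, and the 256-byte value suits a table of up to 64 entries under the alignment rule in the next paragraph. Because SRAM begins at 0x20000000, writing the table's absolute address should also select the SRAM region.

```c
#include <stdint.h>
#include <string.h>

#define VECTOR_COUNT 48   /* hypothetical: 16 system + 32 peripheral entries */

typedef void (*vector_t)(void);

/* RAM copy of the vector table, aligned for up to 64 entries. */
static vector_t ram_vectors[VECTOR_COUNT] __attribute__((aligned(256)));

/* Copy the active table to RAM and point VTOR at the copy. vtor is a
 * parameter so this can run on a host; on the target pass
 * (volatile uint32_t *)0xE000ED08. Entries can be patched afterward. */
vector_t *relocate_vectors(const vector_t *flash_vectors, volatile uint32_t *vtor)
{
    memcpy(ram_vectors, flash_vectors, sizeof ram_vectors);
    *vtor = (uint32_t)(uintptr_t)ram_vectors;
    return ram_vectors;
}
```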
One requirement for the vector table is that it be aligned to its own size, rounded up to the next power of two. For example, if the SoC vendor used 30 IRQ sources, the total number of entries including the 16 system entries would be 46. The next power of two up from 46 is 64, and 64 four-byte words occupy 256 bytes, so the table must be aligned on a 256-byte boundary. The alignment of the vector table array can usually be specified with a toolchain-specific directive.
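The rounding can be expressed as a small C function. vector_table_alignment() is a hypothetical helper; its 32-entry floor reflects the 128-byte minimum alignment in the ARMv7-M architecture, which only matters for parts with 16 or fewer IRQ sources.

```c
#include <stdint.h>

/* Required alignment, in bytes, for a vector table holding the 16 system
 * entries plus num_irqs peripheral entries: the entry count rounded up to
 * a power of two (at least 32), times 4 bytes per entry. */
uint32_t vector_table_alignment(uint32_t num_irqs)
{
    uint32_t entries = 16 + num_irqs;
    uint32_t pow2 = 32;
    while (pow2 < entries)
        pow2 <<= 1;
    return pow2 * 4;
}
```

For the 30-IRQ example, vector_table_alignment(30) returns 256.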
The Cortex-M3 has no direct equivalent to the ARM7TDMI FIQ interrupt. While the Cortex-M3's non-maskable interrupt (NMI) may seem like a tempting substitute for the FIQ, the NMI is missing the key feature of the FIQ: the ability to preload data into shadowed registers. As the Cortex-M3 doesn't shadow general-purpose registers, the most suitable FIQ replacement is just a normal IRQ with a higher priority assignment, if needed.
Executing from RAM
A common method of increasing the execution speed of critical code on ARM7TDMI devices is to execute it from internal SRAM (code tagged with a toolchain-specific "ramfunc" directive is typically copied there automatically at startup), taking advantage of the SRAM's faster access time (often zero wait states vs. one or more wait states for flash memory). The Cortex-M3, however, was designed for the highest performance when executing from the "code" memory region (commonly internal flash). Execution from internal SRAM will be slower because a single bus (the system bus) is then used for both instructions and data. The highest performance is realized when instructions are in the "code" memory region, allowing the Cortex-M3 to perform simultaneous instruction and data accesses over separate buses.
Similarly, the exception vector table should also be located in the “code” memory area. This allows the registers to be stacked to RAM at the same time that the exception vector is read.
Occasionally, an application may need to temporarily disable all processor interrupts. On the ARM7TDMI, the vendor-supplied interrupt controller may provide a global disable register, or the application may set the "I" bit (and perhaps the "F" bit) in the current program status register (CPSR) with the assembly code in Listing 6.
On the Cortex-M3, the special PRIMASK register disables all interrupts except the NMI and Hard Fault exceptions (the CPSID i and CPSIE i instructions also set and clear PRIMASK in a single step):
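disable_irq:
        mov     r0, #1
        msr     PRIMASK, r0     ; PRIMASK = 1: mask configurable-priority exceptions
        bx      lr              ; return to caller
enable_irq:
        mov     r0, #0
        msr     PRIMASK, r0     ; PRIMASK = 0: unmask them again
        bx      lr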
But there is still more to do if you want to get this job done by last Wednesday. To read about porting issues related to software interrupts, fault handling, the SWP instruction, instruction timing, assembly language, and optimizations, go online and read Part 2: "An ARM programmer's work is not yet done."
1. The FIQ handler can simply begin at offset 0x1c instead of a branch.
2. PC-relative LDR instructions can be used instead of branches for long jumps.
Since receiving a BSEE from the University of Texas in 1992, Todd Hixon has spent most of his career developing hardware and software for various microcontroller-based products, eventually specializing in network device drivers for a major DSL modem manufacturer. He now works for Atmel, where he provides specialized software solutions for Atmel's AT91 family of ARM microcontrollers.