Migrating ARM7 Code to a Cortex-M3 MCU, Part 2 - Embedded.com

In Part 1 of this series, I covered the many details a programmer must address when porting code from an existing ARM7 to the Cortex-M3 core: exception vector table formatting, startup code and stack configuration, remapping RAM functions, and hardware interrupt configuration. This second part continues with a discussion of software interrupts, hand-coded assembly, fault handling, the SWP instruction, instruction timing, and optimizations.

Software interrupts
The ARM7TDMI allows software to generate an exception via the SWI instruction. This exception is typically used as an interface to system drivers or other privileged code that cannot be called directly. An example usage of the ARM-mode version of the SWI instruction is shown below:

swi   0x123456   

The ARM7TDMI responds by setting R14 to the address of the instruction after the SWI, disabling IRQ, changing to SVC mode, and jumping to the SWI vector in the exception table. ARM-mode SWI requests can be processed with this basic exception handler:

swi_exception_handler:
    stmfd sp!, {r10, lr}          ; save r10 and the return address
    ldr   r10, [r14, #-4]         ; get the SWI instruction
    bic   r10, r10, #0xff000000   ; get the SWI operand from the instruction
    ; Code to handle event based on operand in r10
    ldmia sp!, {r10, pc}^         ; return from handler

The Cortex-M3 has a similar mechanism using the SVC instruction; however, the handler is different because the NVIC automatically stacks R0-R3, R12, LR, PC, and PSR. The handler must first determine whether the Main or Process stack was used in order to access the SVC operand and any other parameters that might be passed.

svc_exception_handler:
tst lr, #4
ite eq
mrseq r0, MSP
mrsne r0, PSP
ldr r1, [r0, #24] ; stacked PC
ldrb r1, [r1, #-2] ; get the operand from the SVC instruction
; Code to handle SVC based on operand in r1
bx lr ; return from handler

Another difference between the two software interrupt implementations is that while an SWI exception handler is allowed to invoke another SWI exception, the Cortex-M3 NVIC cannot respond to an exception with the same priority as the one currently executing. An SVC issued from within the SVC handler cannot be taken and is escalated to a hard fault.
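The two operand encodings can also be compared side by side in C. This is a minimal sketch rather than code from an actual port: the struct and function names are hypothetical, and it assumes the standard eight-word Cortex-M3 exception frame, the ARM-mode SWI encoding (24-bit operand) and the 16-bit Thumb SVC encoding (opcode 0xDFxx with an 8-bit operand).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Registers stacked automatically on Cortex-M3 exception entry,
 * lowest address first (hypothetical struct name). */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, psr;
} exception_frame_t;

/* ARM-mode SWI: the operand is the low 24 bits of the 32-bit opcode. */
uint32_t swi_operand(uint32_t arm_opcode)
{
    return arm_opcode & 0x00FFFFFFu;
}

/* Thumb SVC: 16-bit opcode 0xDFxx, the operand is the low 8 bits. */
uint8_t svc_operand(uint16_t thumb_opcode)
{
    assert((thumb_opcode & 0xFF00u) == 0xDF00u);  /* must be an SVC */
    return (uint8_t)(thumb_opcode & 0x00FFu);
}

/* Byte offset of the stacked PC within the exception frame. */
size_t stacked_pc_offset(void)
{
    return offsetof(exception_frame_t, pc);
}
```

The stacked PC offset works out to 24 bytes, matching the `[r0, #24]` load in the assembly handler. Because the SVC opcode is the halfword just before the stacked PC and the core is little-endian, the operand byte sits at the stacked PC minus 2.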

Dealing with hand-coded assembly
In the 32-bit processor world, hand-coded assembly is often only used for operations that a high-level language either cannot perform directly (e.g., manipulating processor-specific registers) or is too slow. The Cortex-M3 has eliminated the need for much of the former and what remains (discussed below) can usually be encoded as inline assembly.

Code written in ARM-mode assembly for performance reasons will require either a rewrite into a high-level language or a manual translation into the Cortex-M3 Thumb/Thumb-2 instruction set. Using a high-level language is obviously easier to maintain and in many instances the compiler generates code as good as that generated by hand. However, if hand-coded assembly happens to be preferred, many ARM-mode instructions have Thumb-2 equivalents.

One frequently used ARM-mode feature is the conditional execution of an instruction to avoid the penalty of branching around it. The Cortex-M3 provides a similar capability with the “IT” (if-then) instruction, which conditionally executes the following one to four instructions based on whether a condition is true or false. (In situations with more than four instructions conditional on a single comparison, an actual branch instruction must be used.) As an example, the following ARM assembly stores R0 to the eight bytes at the address held in either R2 or R3, depending on whether R1 is equal to 0:

    cmp   r1, #0
    streq r0, [r2, #0]   ; if r1 is 0, write to first 4 bytes in r2
    streq r0, [r2, #4]   ; if r1 is 0, write to second 4 bytes in r2
    strne r0, [r3, #0]   ; if r1 is not 0, write to first 4 bytes in r3
    strne r0, [r3, #4]   ; if r1 is not 0, write to second 4 bytes in r3

The equivalent code on the Cortex-M3:

    cmp   r1, #0
    ittee eq
    streq r0, [r2, #0]   ; if r1 is 0, write to first 4 bytes in r2
    streq r0, [r2, #4]   ; if r1 is 0, write to second 4 bytes in r2
    strne r0, [r3, #0]   ; if r1 is not 0, write to first 4 bytes in r3
    strne r0, [r3, #4]   ; if r1 is not 0, write to second 4 bytes in r3

This form of the IT instruction, ittee eq , causes the two instructions following it to execute only if the 'eq' condition is true and the two instructions after that to execute only if the 'eq' condition is false. The Cortex-M3 Technical Reference Manual has more details on the usage of the IT instruction.
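To make the then/else pattern concrete, the small host-side C sketch below works out which instructions in an IT block execute for a given mnemonic. It is an illustration only: the function name `it_block_mask` is hypothetical, and it models just the mnemonic letters, not the actual instruction encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns a bitmask with bit n set if the n-th instruction after the IT
 * instruction executes, or -1 if the mnemonic is malformed.
 * mnemonic:  "it", "ite", "ittee", ... (lower case, without the condition)
 * cond_true: nonzero if the IT condition (e.g. eq) currently holds */
int it_block_mask(const char *mnemonic, int cond_true)
{
    size_t len;
    int mask = 0;

    assert(mnemonic != NULL);
    len = strlen(mnemonic);

    /* 'i' plus one to four t/e letters; the first letter must be 't'. */
    if (len < 2 || len > 5 || mnemonic[0] != 'i' || mnemonic[1] != 't')
        return -1;

    for (size_t i = 1; i < len; i++) {
        char c = mnemonic[i];
        if (c != 't' && c != 'e')
            return -1;
        /* 't' slots run when the condition holds, 'e' slots when it fails. */
        if ((c == 't') == (cond_true != 0))
            mask |= 1 << (i - 1);
    }
    return mask;
}
```

For `ittee` the result is 0x3 when the condition holds (the first two instructions run) and 0xC when it does not (the last two run).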

Some instructions can be encoded as either 16-bit Thumb or 32-bit Thumb-2. The encoding used can be selected by adding a suffix to the instruction: “.n” for 16-bit Thumb (narrow) or “.w” for 32-bit Thumb-2 (wide). If unspecified, the assembler will typically encode for 16-bit Thumb.

Fault handlers
Both the ARM7TDMI and Cortex-M3 have special fault exceptions that they trigger if a problem is encountered during a memory access or while processing an instruction. These faults usually indicate that either hardware or software has failed. Since recovery is unlikely, a typical fault handler simply halts after logging the state of the processor so that the problem can be addressed later.

An important item to log is the address of the instruction that was being executed when the fault occurred. This along with a disassembly of the code, the contents of the processor registers and possibly a portion of the stack frame is often enough information to pinpoint what went wrong. On the ARM7TDMI, the executing instruction can be found by subtracting four from the link register (LR) for Undefined Instruction and Prefetch Abort exceptions and by subtracting eight from the LR for Data Aborts. With the Cortex-M3, the program counter at the time of the fault is pushed onto the stack as with most exception events and can be extracted with the following code:

    tst   lr, #4
    ite   eq
    mrseq r0, MSP
    mrsne r0, PSP
                              ; check that r0 is a valid stack
                              ; pointer to avoid a second fault
    tst   r0, #3
    bne   skip
    ldr   r1, =STACK_ADDRESS_MIN
    cmp   r0, r1
    bmi   skip
    ldr   r1, =STACK_ADDRESS_MAX - 32
    cmp   r0, r1
    bpl   skip
    ldr   r1, [r0, #24]       ; r1 <= stacked PC
skip:
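The stack pointer check in that listing can be expressed in C as well, which is convenient when most of the fault handler is written in C. The following is a sketch under assumptions: `frame_pointer_ok` is a hypothetical name, and the stack bounds are passed in as plain numbers standing in for STACK_ADDRESS_MIN and STACK_ADDRESS_MAX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FRAME_BYTES 32u   /* eight stacked words: R0-R3, R12, LR, PC, PSR */

/* Returns true if sp is word aligned and lies in
 * [stack_min, stack_max - FRAME_BYTES), the same range the assembly accepts. */
bool frame_pointer_ok(uint32_t sp, uint32_t stack_min, uint32_t stack_max)
{
    assert(stack_max >= stack_min + FRAME_BYTES);

    if ((sp & 3u) != 0)
        return false;               /* not word aligned */
    if (sp < stack_min)
        return false;               /* below the stack region */
    if (sp >= stack_max - FRAME_BYTES)
        return false;               /* too close to the top of the stack */
    return true;
}
```

Only when this check passes is it safe to read the stacked PC from byte offset 24 of the frame.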

The Cortex-M3 has the following fault exceptions:

• Usage fault–for undefined instructions or certain unaligned accesses.

• Memory management fault–for accesses that violate the memory protection settings (for example, unprivileged code accessing privileged memory).

• Bus fault–for accessing invalid or offline memory regions.

• Hard fault–for when the above fault exceptions cannot run.

The hard fault exception has a fixed priority level (higher than any user-configurable level) and is always enabled. The other fault exceptions have a user-configurable priority level and must be enabled before being used. If a fault event occurs for a disabled fault handler, or if the handler has too low a priority to run, a hard fault will be triggered. For many basic systems, only a hard fault handler is necessary to catch software errors.
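The three configurable fault handlers are enabled through bits in the System Handler Control and State Register (SHCSR) at 0xE000ED24, a register this article does not otherwise discuss. The sketch below only computes register values so that it can be tried on a host; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SHCSR_ADDR         0xE000ED24u   /* System Handler Control and State */
#define SHCSR_MEMFAULTENA  (1u << 16)    /* enable memory management fault */
#define SHCSR_BUSFAULTENA  (1u << 17)    /* enable bus fault */
#define SHCSR_USGFAULTENA  (1u << 18)    /* enable usage fault */

/* Returns the SHCSR value with all three configurable faults enabled. */
uint32_t shcsr_with_faults_enabled(uint32_t shcsr)
{
    return shcsr | SHCSR_MEMFAULTENA | SHCSR_BUSFAULTENA | SHCSR_USGFAULTENA;
}

/* Reports whether one particular fault handler is enabled. */
bool fault_is_enabled(uint32_t shcsr, uint32_t enable_bit)
{
    assert(enable_bit == SHCSR_MEMFAULTENA ||
           enable_bit == SHCSR_BUSFAULTENA ||
           enable_bit == SHCSR_USGFAULTENA);
    return (shcsr & enable_bit) != 0;
}
```

On the target, the value would be written back through a volatile pointer, for example `*(volatile uint32_t *)SHCSR_ADDR = shcsr_with_faults_enabled(*(volatile uint32_t *)SHCSR_ADDR);`.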

The Cortex-M3 has several registers to help diagnose fault conditions:

• Usage Fault Status Register (UFSR)

• MemManage Fault Status Register (MMSR)

• Bus Fault Status Register (BFSR)

• MemManage Fault Address Register (MMAR)

• Bus Fault Address Register (BFAR)

The three status registers (UFSR, MMSR and BFSR) can all be read as a single 32-bit word called the Combined Fault Status Register (CFSR). The MMAR and BFAR registers contain the address that caused their respective faults if the MMARVALID or BFARVALID bit is set in MMSR or BFSR. The following code reads the CFSR and the appropriate Fault Address Register:

    ldr   r0, =0xE000ED28
    ldr   r1, [r0, #0]   ; r1 <= CFSR
    tst   r1, #0x80      ; MMARVALID set?
    it    ne
    ldrne r2, [r0, #12]  ; r2 <= MMAR
    tst   r1, #0x8000    ; BFARVALID set?
    it    ne
    ldrne r2, [r0, #16]  ; r2 <= BFAR

Some bus faults may not occur until several instructions have executed after the faulting instruction (for example, an STR instruction that uses the write buffer). This case will be indicated by the IMPRECISERR bit in BFSR.
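When the fault log is post-processed on a PC, or when the handler is written in C, the CFSR value can be split into its three status registers with a few shifts and masks. This is a minimal sketch: the type and function names are hypothetical, while the bit positions are the ones used in the assembly above, plus IMPRECISERR at bit 2 of BFSR (bit 10 of CFSR).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CFSR_MMARVALID    (1u << 7)    /* MMSR bit 7 */
#define CFSR_IMPRECISERR  (1u << 10)   /* BFSR bit 2 */
#define CFSR_BFARVALID    (1u << 15)   /* BFSR bit 7 */

typedef struct {
    uint8_t  mmsr;        /* CFSR bits 7:0   */
    uint8_t  bfsr;        /* CFSR bits 15:8  */
    uint16_t ufsr;        /* CFSR bits 31:16 */
    bool     mmar_valid;  /* MMAR holds the faulting address */
    bool     bfar_valid;  /* BFAR holds the faulting address */
    bool     imprecise;   /* bus fault reported after the faulting instruction */
} cfsr_fields_t;

cfsr_fields_t cfsr_decode(uint32_t cfsr)
{
    cfsr_fields_t f;

    f.mmsr = (uint8_t)(cfsr & 0xFFu);
    f.bfsr = (uint8_t)((cfsr >> 8) & 0xFFu);
    f.ufsr = (uint16_t)(cfsr >> 16);
    f.mmar_valid = (cfsr & CFSR_MMARVALID) != 0;
    f.bfar_valid = (cfsr & CFSR_BFARVALID) != 0;
    f.imprecise  = (cfsr & CFSR_IMPRECISERR) != 0;

    /* The three fields together must reassemble the original word. */
    assert((((uint32_t)f.ufsr << 16) | ((uint32_t)f.bfsr << 8) | f.mmsr) == cfsr);
    return f;
}
```

When `imprecise` is set, the stacked PC does not identify the faulting instruction and BFAR is not valid, so the log should be read with that in mind.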

The SVC instruction cannot be used in a hard fault handler. Since the SVC exception is always a lower priority than the hard fault handler, attempts to trigger it will result in a second hard fault. The Cortex-M3 responds to a double hard fault by entering a “locked” state where only a Reset, NMI, or intervention with a debugger can resume execution.

The SWP instruction
The ARM7TDMI included a SWP instruction that provided an atomic read-then-write to a memory location. A common use of SWP is in the implementation of operating system semaphores to provide mutual exclusion between tasks.

take_semaphore:
    ldr   r0, =semaphore_addr
    mov   r1, #1
    swp   r2, r1, [r0]     ; Set the semaphore to 1
    cmp   r2, #1           ; Was it already set by another task?
    beq   take_semaphore   ; Yes, try again

give_semaphore:
    ldr   r0, =semaphore_addr
    mov   r1, #0
    str   r1, [r0]         ; Write a 0 to semaphore to give it back

The Cortex-M3 does not have the SWP instruction, although the same semaphore functionality can be implemented with the load exclusive (LDREX) and store exclusive (STREX) instructions.

take_semaphore:
    ldr   r0, =semaphore_addr
    ldrex r1, [r0]
    cmp   r1, #0               ; CBNZ can only branch forward, so use CMP/BNE
    bne   take_semaphore       ; Another task has the semaphore, try again
    mov   r1, #1
    strex r2, r1, [r0]         ; Try setting the semaphore to 1
    cmp   r2, #0
    bne   take_semaphore       ; The exclusive store failed, try again

give_semaphore:
    ldr   r0, =semaphore_addr
    mov   r1, #0
    str   r1, [r0]             ; Clear the semaphore to 0
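If the semaphore code is being moved to C anyway, the same take/give behavior can be sketched with C11 atomics. This is not the original author's code: the function names are hypothetical, and it assumes a toolchain that provides <stdatomic.h>. Compilers targeting the Cortex-M3 typically implement the compare-and-exchange with an LDREX/STREX loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Semaphore convention as in the assembly: 0 = free, 1 = taken. */

/* Tries once to take the semaphore; returns true on success. */
bool semaphore_try_take(atomic_uint *sem)
{
    unsigned int expected = 0;   /* only take it if it is currently free */

    assert(sem != NULL);
    return atomic_compare_exchange_strong(sem, &expected, 1u);
}

/* Spins until the semaphore has been taken. */
void semaphore_take(atomic_uint *sem)
{
    while (!semaphore_try_take(sem)) {
        /* another task holds the semaphore, try again */
    }
}

/* Gives the semaphore back by writing 0. */
void semaphore_give(atomic_uint *sem)
{
    assert(sem != NULL);
    atomic_store(sem, 0u);
}
```

A task would call `semaphore_take(&sem)` before touching the shared resource and `semaphore_give(&sem)` afterwards, where `sem` is an `atomic_uint` initialized to 0.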

Instruction timing
The Cortex-M3 will pipeline LDR and STR instructions when possible, allowing subsequent instructions to begin executing before the previous one completes. This behavior is normally desirable because it increases overall execution speed, but it can also throw off assembly code that was tuned for precise timing.

For example, take the case of creating a pulse on a PIO pin by writing to the SoC peripheral registers associated with setting a pin high and low:

    ldr   r0, =pio_set_reg
    ldr   r1, =pio_clear_reg
    mov   r2, #1
    str   r2, [r0]   ; set pin high
    str   r2, [r1]   ; set pin low

On the Cortex-M3, this code creates a pulse whose width is two system clocks long (the execution time of the second STR instruction). A reasonable assumption would be that the addition of a NOP instruction between the two STR instructions would make the pulse one clock period longer but the pulse remains only two clocks wide because the NOP is actually executed during the second cycle of the STR instruction. Adding a second NOP instruction will lengthen the pin pulse by one clock period.

Optimizations
After the initial port is complete and the application is functioning, it makes sense to investigate the new features that the Cortex-M3 has to offer and how they might help increase the performance of the application.

The Bit Band
Past ARM instruction sets only provide accesses to memory in units of bytes (8-bit), half-words (16-bit) or words (32-bit). Modifying individual bits in memory requires three steps:

• Reading the unit of memory into a general register,

• Performing logical operations on the register to manipulate the desired bits, and

• Writing the unit of memory back out.

One major drawback to this method is that it is not atomic. If the thread is interrupted by an ISR that writes to the same memory unit, the memory will be corrupted by the resumed thread. To make this operation atomic, interrupts would need to be disabled before the memory read and then re-enabled after the memory write. That's at least five operations to atomically set or clear a single bit.

The Cortex-M3 provides a mechanism to modify individual bits in memory in a simple and atomic way. Basically, a single word access within a special 32MB portion of the SRAM and Peripheral address regions is handled as an individual bit access to a word in the first 1MB of the region. For example, writing a '1' to address 0x22000000 will set bit 0 of the word at 0x20000000.

static unsigned int x;
unsigned int *p = (unsigned int *)(((unsigned int)&x & 0xf0000000) +
                  0x02000000 + (((unsigned int)&x & 0x000ffffc) * 32));

x = 0;
p[0] = 1;              /* set bit 0 in x */
p[1] = 1;              /* set bit 1 in x */
p[31] = 1;             /* set bit 31 in x */
if (p[0]) p[30] = 1;   /* set bit 30 in x because bit 0 is set */
/* x now equals 0xc0000003 */
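The address arithmetic in that listing can be checked on a host by treating addresses as plain numbers. The helper below is a sketch with a hypothetical name, `bitband_alias`; it applies the same formula and adds four bytes per bit, since each bit of the target word maps to one word in the alias region.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the bit-band alias address for one bit of a word or byte located
 * in the first 1MB of the SRAM (0x20000000) or Peripheral (0x40000000)
 * region. bit is counted from the given address. */
uint32_t bitband_alias(uint32_t addr, unsigned int bit)
{
    uint32_t region = addr & 0xF0000000u;   /* 0x20000000 or 0x40000000 */
    uint32_t offset = addr & 0x000FFFFFu;   /* byte offset within the 1MB */

    assert(region == 0x20000000u || region == 0x40000000u);
    assert((addr & 0x0FF00000u) == 0);      /* must lie in the first 1MB */
    assert(bit < 32);

    return region + 0x02000000u + offset * 32u + bit * 4u;
}
```

For example, bit 0 of the word at 0x20000000 maps to 0x22000000, as described above, and bit 0 of the next word at 0x20000004 maps to 0x22000080.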

Conclusion
ARM's Cortex-M3 processor is a true real-time core that overcomes the real-time processing limitations of the ARM7TDMI core. Migrating ARM7 code to the Cortex-M3 is generally a straightforward process that can be done in just a few days in most situations. Developing the new exception vector table format, startup code and stack configuration, and new hardware interrupt configuration can be completed in a matter of hours for all but the most sophisticated applications. Hand-coded assembly is often best dealt with by rewriting it in C, as modern compilers generate excellent code.

The most influential factor on the porting effort is the particular Cortex-M3 device chosen as this determines whether peripheral driver code can be reused or not. Special attention to selecting a device with software-compatible peripherals will ensure a quick and painless migration to this powerful modern microcontroller core.

After receiving a BSEE from the University of Texas in 1992, Todd Hixon has spent most of his career developing hardware and software for various microcontroller-based products, eventually specializing in network device drivers for a major DSL modem manufacturer. He now works for Atmel where he provides specialized software solutions for Atmel's AT91 family of ARM microcontrollers.
