Building advanced Cortex-M3 applications

Jean J Labrosse, Lotta Frimanson and Anders Lundgren

April 08, 2009


The ARM Cortex-M3 architecture provides many improvements compared with its predecessor, the popular ARM7/9, and is designed to be particularly suitable for cost-sensitive embedded applications that require deterministic system behavior.

This article describes how developers can best utilize the advanced capabilities of the Cortex-M3 when designing embedded applications.

Comparing ARM7/9 to Cortex-M3
Cortex-M3 is a member of the Cortex-M family, one of the three ARM Cortex architectures that were introduced to the embedded marketplace in 2004, and is being integrated into low-cost embedded microcontrollers (MCUs) from an increasing number of silicon vendors.

A comparison of the main characteristics of Cortex-M3 with those of ARM7/9 is shown in Table 1 below.

Table 1: Comparison of ARM7/9 and Cortex-M3 characteristics

The Cortex-M3 improves on the ARM7/9 in most respects: a simpler stack architecture, a better interrupt controller, a higher-performance instruction set and enhanced debug capabilities, all of which can significantly affect end-product performance.

Stacking and Interrupts
The Cortex-M3 reduces both the overhead and complexity of ARM7/9 stack management by incorporating only two stacks (Figure 1, below). Tasks execute in Thread mode, using the process stack, while interrupts execute in Handler mode, using the main stack.

The task context is automatically saved on the process stack when an exception occurs, at which point the processor moves to Handler mode and the main stack becomes active. On return from the exception, the task context is restored and Thread mode is reinstated if the interrupted task remains the active task.

If, however, a new task is to be scheduled, a context switch must take place. Because part of the task context has already been saved by the hardware, this procedure is more straightforward on the Cortex-M3 and also consumes 50% fewer processor cycles.
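To make the split between hardware-saved and software-saved context concrete, the following is a minimal sketch of how an RTOS might lay out a new task's initial process stack. The register order of the hardware frame (R0-R3, R12, LR, PC, xPSR) is the one the Cortex-M3 stacks on exception entry. The function name task_stack_init, the CTX_* offsets and the choice to store R4-R11 directly below the hardware frame are assumptions made for illustration, not the code of any particular kernel, and optional 8-byte stack alignment is ignored.

```c
#include <stdint.h>

/* Word offsets from the saved stack pointer (lowest address first).
 * These names are hypothetical, chosen for this sketch only. */
enum {
    /* Saved and restored in software by the context-switch code. */
    CTX_R4 = 0, CTX_R5, CTX_R6, CTX_R7, CTX_R8, CTX_R9, CTX_R10, CTX_R11,
    /* Pushed and popped automatically by the Cortex-M3 hardware. */
    CTX_R0, CTX_R1, CTX_R2, CTX_R3, CTX_R12, CTX_LR, CTX_PC, CTX_XPSR,
    CTX_WORDS
};

#define XPSR_THUMB_BIT 0x01000000u /* T bit: the core only executes Thumb code */

/* Build an initial context so that the first exception return "resumes"
 * the task at entry_addr with arg in R0. Returns the new stack pointer. */
uint32_t *task_stack_init(uint32_t *stack_top, uint32_t entry_addr, uint32_t arg)
{
    uint32_t *sp = stack_top - CTX_WORDS; /* full-descending stack */
    int i;

    for (i = 0; i < CTX_WORDS; i++) {
        sp[i] = 0u;                       /* LR is left as a 0 placeholder here */
    }
    sp[CTX_R0]   = arg;
    sp[CTX_PC]   = entry_addr;
    sp[CTX_XPSR] = XPSR_THUMB_BIT;
    return sp;
}
```

Only the lower eight words (R4-R11) need explicit save and restore instructions in the context-switch handler; the upper eight are handled by the processor on exception entry and return.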

Figure 1: Cortex-M3 task switching

Migration between processors
The Cortex-M3 includes several integrated peripherals in addition to the core CPU. Most important of these is the Nested Vectored Interrupt Controller (NVIC), designed for low latency, efficiency and configurability.

The NVIC saves half the processor registers automatically upon interrupt, restoring them upon exit, allowing for efficient interrupt handling. It also removes the need for saving/restoring registers during back-to-back interrupts. The NVIC also integrates the SysTick, a 24-bit down-counting timer intended for RTOS use.

The NVIC and SysTick peripherals ease migration between Cortex-M3 processors, particularly when an RTOS is used: moving the RTOS to a different Cortex-M3 device simply requires a function that returns the clock frequency on which the SysTick timer is based.
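As a rough illustration of why the clock frequency is the only device-specific input, the sketch below derives a SysTick reload value from a clock frequency and a desired tick rate. The function name systick_reload_value and the 72 MHz figure used in the usage note are assumptions for this example; the 24-bit limit comes from the width of the counter described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu /* largest value a 24-bit counter can hold */

/* Compute the reload value for a tick rate of tick_hz, given the SysTick
 * input clock clk_hz. The counter runs from the reload value down to 0,
 * so one period lasts (reload + 1) clock cycles.
 * Returns 0 on success, -1 if the rate cannot be produced. */
int systick_reload_value(uint32_t clk_hz, uint32_t tick_hz, uint32_t *reload)
{
    uint32_t cycles;

    assert(reload != NULL);
    if (tick_hz == 0u) {
        return -1;
    }
    cycles = clk_hz / tick_hz;
    /* Fewer than 2 cycles would give a reload of 0 (or underflow);
     * more than 2^24 cycles does not fit the counter. */
    if (cycles < 2u || cycles - 1u > SYSTICK_MAX_RELOAD) {
        return -1;
    }
    *reload = cycles - 1u;
    return 0;
}
```

With an assumed 72 MHz clock and a 1 kHz tick, the function yields a reload value of 71999; a 1 Hz tick from the same clock is rejected because it would need more than 24 bits.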

In contrast, an RTOS port to an ARM7TDMI-S processor would require a port to the interrupt controller of the processor and a port to a hardware timer in addition to the generic ARM port. The interrupt functionality, which must be written individually for each ARM7/9 port, is provided just once for all Cortex-M3 implementations.

The sleep mode feature of the Cortex-M3 can be used to conserve power when the target application is idle. For example, with µC/OS-II the idle task calls an application-level hook that causes the processor to enter sleep mode until the next interrupt is received. Unlike most previous ARM processors, the Cortex-M3 also has a fixed memory map.
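A minimal sketch of such an idle hook is shown here. It assumes the hook is named App_TaskIdleHook, following the µC/OS-II application-hook naming convention (check the kernel version in use), and that the toolchain accepts GCC-style inline assembly. The idle_hook_entries counter and the cpu_sleep_until_interrupt helper are hypothetical additions for illustration. On builds that do not target ARMv7-M the sleep instruction is compiled out, so the logic can be exercised on a host.

```c
#include <stdint.h>

/* Illustrative counter (not part of any RTOS API): idle-hook invocations. */
volatile uint32_t idle_hook_entries;

/* Hypothetical helper: halt the core until the next interrupt arrives. */
static inline void cpu_sleep_until_interrupt(void)
{
#if defined(__ARM_ARCH_7M__) && defined(__GNUC__)
    __asm volatile ("wfi"); /* Wait For Interrupt: enter sleep mode */
#endif
    /* On other targets this is deliberately a no-op. */
}

/* Called repeatedly by the kernel's idle task when no other task is ready. */
void App_TaskIdleHook(void)
{
    idle_hook_entries++;
    cpu_sleep_until_interrupt();
}
```

Because the idle task only runs when nothing else is ready, sleeping here costs no responsiveness: the next interrupt wakes the core and the scheduler resumes as usual.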

Instruction set
The Cortex-M3 implements ARMv7-M, using the Thumb-2 Instruction Set Architecture (ISA), a superset of the original Thumb, which includes new 16- and 32-bit instructions. Cortex-M3s always execute in a single mode (Thumb-2), unlike ARM7/9s that needed to switch between ARM/Thumb modes.

The Cortex-M3 includes 36 instructions not available on the ARM7/9, including CLZ (count leading zeros), which is particularly useful for kernel scheduling algorithms. An optimized version of the scheduling algorithm for µC/OS-II, written in assembly language using the CLZ and RBIT instructions, can find the highest-priority ready task in about 25 clock cycles. That is about twice as fast as the equivalent optimization that can be done with an ARM7/9, and 3-4 times faster than the same algorithm written in C.
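The idea behind the RBIT/CLZ combination can be sketched in portable C. This is not the µC/OS-II implementation: it assumes a single 32-bit ready mask in which bit n is set when the task at priority n is ready and priority 0 is the highest, and the names rbit32, clz32 and highest_ready_prio are hypothetical. On a Cortex-M3 the two helper functions would each collapse to a single instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for RBIT: reverse the bit order of a 32-bit word. */
static uint32_t rbit32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
}

/* Portable stand-in for CLZ: count leading zero bits (32 when x is 0). */
static uint32_t clz32(uint32_t x)
{
    uint32_t n = 0u;

    if (x == 0u) {
        return 32u;
    }
    while ((x & 0x80000000u) == 0u) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Reversing the mask moves the lowest set bit to the top, so counting
 * leading zeros then yields the index of the lowest set bit, which is
 * the highest-priority ready task under the assumptions above. */
uint32_t highest_ready_prio(uint32_t ready_mask)
{
    assert(ready_mask != 0u); /* an RTOS idle task is normally always ready */
    return clz32(rbit32(ready_mask));
}
```

For example, with tasks at priorities 3 and 5 ready (mask 0x28), the function returns 3. The lookup takes the same two steps regardless of which bits are set, which is what keeps scheduling time deterministic.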

Debug and trace
The ARM7TDMI-S cores have only two hardware watchpoints (translating to either two code breakpoints or one code breakpoint and a data breakpoint) and no live core access. Ideally, multiple breakpoints need to be activated to pinpoint a badly behaving application.

In contrast, the Cortex-M3, which contains a subset of the new ARM CoreSight debug technology, has 6 code breakpoints and 4 general-purpose watchpoints, providing enough breakpoints for most debugging scenarios. The Cortex-M3 also allows live access to the core while the application is running, making it possible to read and write memory and set or clear breakpoints on a running application.
