The ARM Cortex-M3 architecture provides many improvements compared with its predecessor, the popular ARM7/9, and is designed to be particularly suitable for cost-sensitive embedded applications that require deterministic system behavior.
It's a member of the Cortex-M family, one of the three ARM Cortex architectures that were introduced to the embedded marketplace in 2004, and is being integrated into low-cost embedded microcontrollers (MCUs) from an increasing number of silicon vendors.
A comparison of the main characteristics of Cortex-M3 with those of ARM7/9 is shown in Table 1 below. The Cortex-M3 improves on the ARM7/9 in most qualitative respects: a simpler stack architecture, a better interrupt controller, a higher-performance instruction set and enhanced debug capabilities, all of which can significantly affect end-product performance.
|Table 1. Comparison of Cortex-M3 and ARM7/9|
Stacking and Interrupts
The Cortex-M3 reduces both the overhead and complexity of ARM7/9 stack management by incorporating only two stacks. Tasks execute in Thread mode, using the process stack, while interrupts execute in Handler mode, using the main stack.
The task context is automatically saved on the process stack when an exception occurs, upon which the processor moves to Handler mode, making the main stack active. On return from the exception, the task context is restored and Thread mode re-instated if the interrupted task remains the active task.
If, however, a new task is to be scheduled, a context switch must take place (Figure 1 below). Because the task context is already saved, this procedure is more straightforward on the Cortex-M3 than on the ARM7/9 and also consumes 50% fewer processor cycles.
|Figure 1. Cortex-M3 Task Switching|
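As a rough illustration of why the switch is cheaper, the saved context can be modelled as two parts: the eight words the Cortex-M3 pushes by itself on exception entry (R0-R3, R12, LR, PC and xPSR), and the remaining registers R4-R11 that a context switch still has to save in software. The sketch below models that layout as plain C structs so it can be compiled on a host. The type and function names are hypothetical and are not taken from any particular RTOS port.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Words pushed automatically by the Cortex-M3 on exception entry,
 * listed from the lowest stack address upwards. */
typedef struct {
    uint32_t r0, r1, r2, r3;
    uint32_t r12;
    uint32_t lr;    /* link register of the interrupted code */
    uint32_t pc;    /* address at which the task resumes */
    uint32_t xpsr;  /* program status register */
} hw_frame_t;

/* Registers a context switch must still save in software
 * (hypothetical name, for illustration only). */
typedef struct {
    uint32_t r4_to_r11[8];
} sw_frame_t;

/* Complete saved context as it might sit on a task's process stack:
 * software-saved registers at the lower addresses, hardware frame above. */
typedef struct {
    sw_frame_t sw;
    hw_frame_t hw;
} task_context_t;

#define XPSR_THUMB_BIT 0x01000000u  /* T bit: the core only executes Thumb code */

/* Prepare the context of a task that has never run, so that the first
 * exception return "resumes" it at its entry point with one argument in R0. */
static void task_context_init(task_context_t *ctx, uint32_t entry, uint32_t arg)
{
    memset(ctx, 0, sizeof *ctx);
    ctx->hw.r0   = arg;
    ctx->hw.pc   = entry & ~(uint32_t)1u;  /* Thumb state comes from xPSR, not bit 0 */
    ctx->hw.xpsr = XPSR_THUMB_BIT;
}
```

With this layout, the scheduler only handles the 32-byte software part itself; the other 32 bytes are stacked and unstacked by the processor.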
Migration between processors
The Cortex-M3 includes several integrated peripherals in addition to the core CPU. Most important of these is the Nested Vectored Interrupt Controller (NVIC), designed for low latency, efficiency and configurability.
The NVIC saves half the processor registers automatically upon interrupt, restoring them upon exit, allowing for efficient interrupt handling. It also removes the need for saving/restoring registers during back-to-back interrupts. The NVIC also integrates the SysTick, a 24-bit down-counting timer intended for RTOS use.
The NVIC and SysTick peripherals ease migration between Cortex-M3 processors, particularly when an RTOS is used: because both are common to every Cortex-M3, moving the RTOS to a new device simply requires a function that returns the clock frequency on which the SysTick timer is based.
In contrast, an RTOS port to an ARM7TDMI-S processor would require a port to the interrupt controller of the processor and a port to a hardware timer in addition to the generic ARM port. The interrupt functionality, which must be written individually for each ARM7/9 port, is provided just once for all Cortex-M3 implementations.
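A minimal sketch of that single board-specific function, and of how a port might use it, is shown below. The function names and the 72 MHz clock are assumptions made for illustration; the only fact taken from the text is that SysTick is a 24-bit down-counter driven by a known clock. The helper computes the reload value for a requested tick rate and rejects rates that do not fit in 24 bits.

```c
#include <stdint.h>
#include <stdbool.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu  /* largest value a 24-bit counter can hold */

/* Hypothetical board support function: the one device-specific piece.
 * The 72 MHz figure is an assumed example value. */
static uint32_t bsp_cpu_clk_freq_hz(void)
{
    return 72000000u;
}

/* Compute the SysTick reload value that yields ticks_per_sec interrupts
 * per second from a clk_hz clock. Returns false when the rate is zero,
 * faster than half the clock, or too slow to fit in the 24-bit counter. */
static bool systick_reload_for(uint32_t clk_hz, uint32_t ticks_per_sec,
                               uint32_t *reload)
{
    uint32_t counts;

    if (ticks_per_sec == 0u) {
        return false;
    }
    counts = clk_hz / ticks_per_sec;       /* clock cycles per tick */
    if (counts < 2u || counts - 1u > SYSTICK_MAX_RELOAD) {
        return false;
    }
    *reload = counts - 1u;                 /* counter runs from reload down to 0 */
    return true;
}
```

On a target, the resulting value would be written to the SysTick reload register; with the assumed 72 MHz clock, a 1 kHz tick gives a reload value of 71999.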
The sleep mode feature of the Cortex-M3 can be used to conserve power when the target application is idle. For example, with µC/OS-II the idle task calls an application-level hook that causes the processor to enter sleep mode until the next interrupt is received. Unlike most previous ARM processors, the Cortex-M3 also has a fixed memory map.
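The idle hook described above might look something like the sketch below. The hook name `app_idle_hook` and the counter are hypothetical, and the wait-for-interrupt instruction is only emitted when building with a GCC-compatible compiler for ARMv7-M; on any other build the macro expands to nothing so the code can still be compiled and exercised on a host.

```c
#include <stdint.h>

/* Sleep until the next interrupt. WFI is only emitted for ARMv7-M builds
 * with a GCC-compatible compiler; elsewhere this is a no-op (assumption
 * made so the sketch stays portable). */
#if defined(__GNUC__) && defined(__ARM_ARCH_7M__)
#define CPU_WAIT_FOR_INTERRUPT()  __asm volatile ("wfi")
#else
#define CPU_WAIT_FOR_INTERRUPT()  ((void)0)
#endif

/* Number of times the idle hook has put the core to sleep (illustrative). */
static uint32_t idle_entries;

/* Hypothetical application-level hook, called by the RTOS idle task on
 * every pass through its loop. */
void app_idle_hook(void);

void app_idle_hook(void)
{
    idle_entries++;
    CPU_WAIT_FOR_INTERRUPT();   /* execution resumes here after an interrupt */
}
```

Because the idle task only runs when nothing else is ready, the core sleeps for exactly the time the application has no work to do.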
The Cortex-M3 implements ARMv7-M, using the Thumb-2 Instruction Set Architecture (ISA), a superset of the original Thumb that includes new 16- and 32-bit instructions. Cortex-M3s always execute in a single mode (Thumb-2), unlike ARM7/9s, which needed to switch between ARM and Thumb modes.
The Cortex-M3 includes 36 instructions not available on the ARM7/9, including CLZ (count leading zeros), which is particularly useful for kernel scheduling algorithms.
An optimized version of the scheduling algorithm for µC/OS-II, written in assembly language using the CLZ and RBIT instructions, can find the highest-priority ready task within about 25 clock cycles. That is about twice as fast as the equivalent optimization on an ARM7/9 and 3-4 times faster than the same algorithm written in C.
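The idea behind the CLZ/RBIT approach can be sketched in portable C. The sketch assumes a single 32-bit ready bitmap in which bit n is set when the task at priority n is ready, with lower numbers meaning higher priority; this is a simplification for illustration, not the actual µC/OS-II ready list. Reversing the bits and counting leading zeros then yields the index of the lowest set bit. The two helper functions are software stand-ins for the instructions, which on a Cortex-M3 each execute as a single instruction.

```c
#include <stdint.h>

/* Software stand-in for CLZ: number of leading zero bits (32 for zero). */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0u;

    if (x == 0u) {
        return 32u;
    }
    while ((x & 0x80000000u) == 0u) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Software stand-in for RBIT: reverse the order of all 32 bits. */
static uint32_t rbit32(uint32_t x)
{
    uint32_t r = 0u;
    unsigned i;

    for (i = 0u; i < 32u; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Return the numerically lowest (highest-priority) ready priority,
 * or 32 when no task is ready. Assumes bit n of ready_mask represents
 * priority n. */
static unsigned highest_ready_prio(uint32_t ready_mask)
{
    return clz32(rbit32(ready_mask));
}
```

For example, with priorities 3 and 5 ready (`ready_mask == 0x28`), `highest_ready_prio()` returns 3 without any loop over the task list.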
Debug and trace
The ARM7TDMI-S cores have only two hardware watchpoints, which translates to either two code breakpoints or one code breakpoint and one data breakpoint, and no live core access. Ideally, multiple breakpoints need to be active at once to pinpoint a badly behaving application.
In contrast, the Cortex-M3, which contains a subset of the new ARM CoreSight debug technology, has 6 code breakpoints and 4 general-purpose watchpoints, providing enough breakpoints for most debugging scenarios (Figure 2 below).
The Cortex-M3 also allows live access to the core when the application is running, making it possible to read and write memory and set/clear breakpoints on a running application.
|Figure 2. Cortex-M3 debug architecture|
Debug functional units
There are five main functional units that implement the Cortex-M3 debug logic:
* DWT (Data Watchpoint and Trace) – provides a set of functions that collect information from the system buses and generate events for the ITM/ETM units. These functions are: four independent watchpoints; a program counter sampler; interrupt trace; and CPU statistics.
* ETM (Embedded Trace Macrocell) – an optional unit that provides high bandwidth instruction trace data over a dedicated 4-bit high speed trace bus using a special hardware probe such as IAR J-Trace for Cortex-M3.
* FPB (Flash Patch and Breakpoint) – implements the logic for 6 code breakpoints, and contains logic to patch 8 instructions.
* ITM (Instrumentation Trace Macrocell) – the formatter for events originating from the DWT. ITM packets from 32 I/O ports can be picked up by the debugger in real time, and can carry internal status and statistics from RTOS kernels, TCP/IP stacks and other middleware.
* DAP (Debug Access Port) – receives data from the other units, then combines and routes the information to the available debug ports: JTAG, SWD (Serial Wire Debug), SWO (Serial Wire Output) and the trace port.
SWD is the preferred debug interface for the Cortex-M3, but taking full advantage of its features requires a probe with full SWD/SWO support, such as IAR J-Link version 7 or later, which can run SWO at speeds of up to 6 Mb/s.
Using Cortex-M3 debug features
The Cortex-M3 debug controller enables debuggers such as C-SPY in IAR Embedded Workbench to provide enhanced functionality, simply by using a JTAG probe such as IAR J-Link that supports SWO data.
The function profiler, shown in Figure 3 below, helps find the functions where most time is spent during execution, which are the parts to focus on when optimizing code. In a system without trace capabilities, this would have required the debugger to set breakpoints at each entry to and exit from every function.
|Figure 3. Function profiler window can be used to determine where most time is spent during execution (Courtesy of IAR Systems)|
DWT allows the debugger to sample the PC and provide statistical profiling. A PC sampling rate of around 10,000 samples per second is good enough to find CPU intensive functions in the application, although not to provide a precise profile. The timing information for each function in an application can then be displayed in different formats in IAR Embedded Workbench while the application is running.
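The statistical approach can be illustrated with a small host-side sketch: each sampled PC value is attributed to the function whose address range contains it, and the hit counts approximate where the CPU spends its time. The data structure, function name and addresses are hypothetical; this is not how any particular debugger implements its profiler.

```c
#include <stdint.h>
#include <stddef.h>

/* Address range of one function plus its sample counter (illustrative). */
typedef struct {
    const char *name;
    uint32_t    start;  /* first address of the function */
    uint32_t    end;    /* one past the last address */
    uint32_t    hits;   /* PC samples that landed inside the range */
} func_range_t;

/* Attribute every sampled PC to the function containing it.
 * Returns the number of samples that matched no known function. */
static size_t profile_samples(func_range_t *funcs, size_t nfuncs,
                              const uint32_t *pcs, size_t npcs)
{
    size_t unmatched = 0u;
    size_t i, j;

    for (i = 0u; i < npcs; i++) {
        for (j = 0u; j < nfuncs; j++) {
            if (pcs[i] >= funcs[j].start && pcs[i] < funcs[j].end) {
                funcs[j].hits++;
                break;
            }
        }
        if (j == nfuncs) {
            unmatched++;    /* sample fell outside every listed range */
        }
    }
    return unmatched;
}
```

Dividing each function's hit count by the total number of samples gives its estimated share of execution time, which is why more samples give a more trustworthy profile.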
The statistical trace information received via SWO can also be used to provide information about the number of times each instruction in the code has been executed.
For example, in IAR Embedded Workbench the instruction profiling information is displayed in the Disassembly window (Figure 4 below), where the leftmost column shows the number of times each instruction has been executed.
|Figure 4: Disassembly window shows instruction profiles which can be useful in deriving statistical trace information. (Courtesy of IAR Systems)|
The four watchpoints in the DWT module can be used to log accesses to up to four different memory locations or areas, including time information. The log helps in debugging the application and in deciding which data to place in more efficient memory, making the application program more efficient.
By using the DWT module to trigger an ITM packet for each interrupt activity, the debugger can present logs and graphs of the interrupt activity in the system, helping for example to locate which interrupts can be fine-tuned to make execution faster. Figure 5 below shows the Interrupt Log window in IAR Embedded Workbench. A condensed summary for each interrupt source is also available.
|Figure 5: Interrupt Log graphical window showing interrupt activity on a time scale for each interrupt source|
The ITM module also offers a non-intrusive printf() function that reduces the overhead of this mechanism to around 100 µs, compared with a couple of hundred ms when using the traditional breakpoint-driven semihosting method.
The increased number of breakpoints in the Cortex-M3 means that data breakpoints, which are useful for tracking down bugs that involve corrupted variables or data, can be used at the same time as code breakpoints are active.
The DAP in Cortex-M3 allows for full access to the core buses during application execution, enabling the debugger to allow live memory reads and writes, and to implement live watch on application variables. IAR Embedded Workbench uses this feature in the Live Watch window and the Memory window.
Using the Cortex-M3 processor with reliable and reusable components such as commercial RTOSes, file systems and USB stacks helps the application developer minimise time to market. Off-the-shelf software components can save many man-years of software development, allowing a project to be completed in a matter of a few days.
The Cortex-M3 processor has been specifically designed for cost sensitive embedded applications. The new features aim to make software on Cortex-M3s more efficient, and also make it easier to migrate from one controller to another, or to port an RTOS to a new platform.
The instruction set includes helpful new instructions, such as CLZ, that can improve assembly implementations of common algorithms. The processor also provides a sleep mode that it can enter when idle, to be woken when an interrupt occurs.
Finally, the debug controller makes developing and testing software easier. Not only is the architecture sensible, stable and efficient, but its designers also aimed to provide a developer-friendly platform, with a sophisticated debug system, six flash breakpoints that are immensely helpful during testing and development, and powerful trace features that allow greater real-time visibility into application operation.
Jean Labrosse is President of Micrium, a regular speaker at the Embedded Systems Conferences and a member of the conference's Advisory Board. Jean is the author of two books, MicroC/OS-II, The Real-Time Kernel and Embedded Systems Building Blocks, Complete and Ready-to-Use Modules in C, and has written numerous articles for magazines. He has an MSEE and has been designing embedded systems for many years.
Anders Lundgren has been with IAR Systems since 1997. He currently works as product manager for the IAR Embedded Workbench for ARM. He received an M.S. in Computer Science from the University of Uppsala, Sweden, in 1986.
Lotta Frimanson received a Master of Science degree in Engineering Physics from Uppsala University, Sweden, in 1989. She has worked at IAR Systems as a product manager since 1999. Prior to this she had 10 years of experience in embedded systems programming.
 Advantages of the Cortex-M3, Brian Nagel, IQ Magazine, Volume 7, Number 4, 2008
 µC/OS-II, The Real-Time Kernel, 2nd Edition, Jean J Labrosse, CMP Books, 2002