The ARM Cortex-M3 architecture provides many improvements compared with its predecessor, the popular ARM7/9, and is designed to be particularly suitable for cost-sensitive embedded applications that require deterministic system behavior.
This article describes how developers can best utilize the advanced capabilities of the Cortex-M3 when designing embedded applications.
Comparing ARM7/9 to Cortex-M3
Cortex-M3 is a member of the Cortex-M family, one of the three ARM Cortex architectures that were introduced to the embedded marketplace in 2004, and is being integrated into low-cost embedded microcontrollers (MCUs) from an increasing number of silicon vendors.
A comparison of the main characteristics of Cortex-M3 with those of ARM7/9 is shown in Table 1 below.
|Table 1: Comparison of ARM7/9 and Cortex-M3 characteristics|
The Cortex-M3 improves on the ARM7/9 in most qualitative respects: a simpler stack architecture, a better interrupt controller, and a higher-performance instruction set, as well as enhanced debug capabilities, all of which can significantly affect end-product performance.
Stacking and Interrupts
The Cortex-M3 reduces both the overhead and complexity of ARM7/9 stack management by incorporating only two stacks (Figure 1, below). Tasks execute in Thread mode, using the process stack, while interrupts execute in Handler mode, using the main stack.
The task context is automatically saved on the process stack when an exception occurs, upon which the processor moves to Handler mode, making the main stack active. On return from the exception, the task context is restored and Thread mode reinstated if the interrupted task remains the active task.
If, however, a new task is to be scheduled, a context switch must take place. Because the task context is already saved, this procedure is more straightforward with the Cortex-M3 and also consumes 50% fewer processor cycles than on the ARM7/9.
|Figure 1: Cortex-M3 task switching|
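To make the automatic context save more concrete, the sketch below models the eight registers that the Cortex-M3 pushes onto the active stack on exception entry, together with the remaining eight (R4 to R11) that a context switch still has to save in software. The type and function names are hypothetical and chosen only for illustration; the layout ignores the optional stack-alignment word.

```c
#include <stddef.h>
#include <stdint.h>

/* Registers pushed by hardware on exception entry, listed from the
 * lowest stack address upward. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;    /* return address of the interrupted function call */
    uint32_t pc;    /* address at which the interrupted task resumes   */
    uint32_t xpsr;  /* program status register                         */
} hw_stacked_frame_t;

/* Registers that context-switch code must still save itself
 * (hypothetical layout: R4 first, R11 last). */
typedef struct {
    uint32_t r4_to_r11[8];
} sw_saved_frame_t;

/* Stack bytes consumed by one complete saved task context. */
static size_t full_context_bytes(void)
{
    return sizeof(hw_stacked_frame_t) + sizeof(sw_saved_frame_t);
}
```

Only the second structure needs explicit save and restore code in an RTOS port, which is one reason the context switch is shorter than on the ARM7/9.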
Migration between processors
The Cortex-M3 includes several integrated peripherals in addition to the core CPU. Most important of these is the Nested Vectored Interrupt Controller (NVIC), designed for low latency, efficiency and configurability.
The NVIC saves half the processor registers automatically upon interrupt, restoring them upon exit, allowing for efficient interrupt handling. It also removes the need for saving/restoring registers during back-to-back interrupts. The NVIC also integrates the SysTick, a 24-bit down-counting timer intended for RTOS use.
The NVIC and SysTick peripherals ease migration between Cortex-M3 processors, particularly when an RTOS is used, as the port simply requires a function that returns the clock frequency on which the SysTick timer is based.
In contrast, an RTOS port to an ARM7TDMI-S processor would require a port to the interrupt controller of the processor and a port to a hardware timer in addition to the generic ARM port. The interrupt functionality, which must be written individually for each ARM7/9 port, is provided just once for all Cortex-M3 implementations.
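As a rough illustration of why a clock-frequency function is all a port needs, the sketch below derives a SysTick reload value from the clock frequency and the desired tick rate, and rejects values that do not fit the 24-bit counter. The names `bsp_cpu_clk_freq` and `systick_reload_for` and the 72 MHz clock are assumptions made for this example, not part of any particular RTOS or vendor library.

```c
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu  /* SysTick is a 24-bit counter */

/* Hypothetical board-support function: the only device-specific
 * piece of the port. A 72 MHz clock is assumed here. */
static uint32_t bsp_cpu_clk_freq(void)
{
    return 72000000u;
}

/* Returns the reload value that yields ticks_per_sec interrupts per
 * second, or 0 if the request cannot be represented in 24 bits. */
static uint32_t systick_reload_for(uint32_t clk_hz, uint32_t ticks_per_sec)
{
    uint32_t counts;

    if (ticks_per_sec == 0u) {
        return 0u;
    }
    counts = clk_hz / ticks_per_sec;   /* clock cycles per tick */
    if (counts < 2u || (counts - 1u) > SYSTICK_MAX_RELOAD) {
        return 0u;
    }
    return counts - 1u;                /* counter runs from reload down to 0 */
}
```

With the assumed 72 MHz clock, a 1 kHz tick needs a reload value of 71999, while a 1 Hz tick cannot be produced directly because it would exceed the 24-bit range.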
The sleep mode feature of the Cortex-M3 can be used to conserve power when the target application is idle. For example, with µC/OS-II the idle task calls an application-level hook that causes the processor to enter sleep mode until the next interrupt is received. Unlike most previous ARM processors, the Cortex-M3 also has a fixed memory map.
The Cortex-M3 implements ARMv7-M, using the Thumb-2 Instruction Set Architecture (ISA), a superset of the original Thumb, which includes new 16- and 32-bit instructions. Cortex-M3s always execute in a single mode (Thumb-2), unlike ARM7/9s, which needed to switch between ARM and Thumb modes.
The Cortex-M3 includes 36 instructions not available on the ARM7/9, including CLZ (count leading zeros), which is particularly useful for kernel scheduling algorithms. An optimized version of the scheduling algorithm for µC/OS-II, written in assembly language using the CLZ and RBIT instructions, can find the highest-priority ready task within about 25 clock cycles: about twice as fast as the equivalent optimization on an ARM7/9, and 3-4 times faster than the same algorithm written in C.
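The idea behind the CLZ/RBIT optimization can be sketched in portable C. The example assumes a simplified ready list held in a single 32-bit word, where bit p is set when the task at priority p is ready and priority 0 is the highest; this is not the actual µC/OS-II data structure. The helpers `rbit32` and `clz32` are software stand-ins for the RBIT and CLZ instructions, each of which is a single instruction on the Cortex-M3.

```c
#include <stdint.h>

/* Software stand-in for RBIT: reverse the bit order of a 32-bit word. */
static uint32_t rbit32(uint32_t x)
{
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
}

/* Software stand-in for CLZ: count leading zero bits (32 for x == 0). */
static uint32_t clz32(uint32_t x)
{
    uint32_t n = 0u;

    if (x == 0u) {
        return 32u;
    }
    while ((x & 0x80000000u) == 0u) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Returns the lowest-numbered (highest) ready priority, or 32 if no
 * task is ready. Reversing the bitmap turns "lowest set bit" into
 * "number of leading zeros". */
static uint32_t highest_ready_prio(uint32_t ready_bitmap)
{
    return clz32(rbit32(ready_bitmap));
}
```

On the target, the two helpers collapse into two instructions, so the lookup takes a fixed, small number of cycles regardless of which priorities are ready.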
Debug and trace
The ARM7TDMI-S cores have only two hardware watchpoints (translating to either two code breakpoints or one code breakpoint and one data breakpoint) and no live core access. Ideally, multiple breakpoints need to be activated to pinpoint a badly behaving application.
In contrast, the Cortex-M3, which contains a subset of the new ARM CoreSight debug technology, has 6 code breakpoints and 4 general-purpose watchpoints, providing enough breakpoints for most debugging scenarios. The Cortex-M3 also allows live access to the core when the application is running, making it possible to read and write memory and set/clear breakpoints on a running application.
Debug functional units
There are five main functional units that implement the Cortex-M3 debug logic (Figure 2, below):
* DWT (Data Watchpoint and Trace): provides a set of functions that collect information from the system buses and generate events to the ITM/ETM units. These functions are four independent watchpoints, a program counter sampler, interrupt trace, and CPU statistics.
* ETM (Embedded Trace Macrocell): an optional unit that provides high-bandwidth instruction trace data over a dedicated 4-bit high-speed trace bus, using a special hardware probe such as IAR J-Trace for Cortex-M3.
* FPB (Flash Patch and Breakpoint): implements the logic for 6 code breakpoints, and contains logic to patch 8 instructions.
* ITM (Instrumentation Trace Macrocell): the formatter for events originating from the DWT. ITM packets from 32 I/O ports can be picked up by the debugger in real time, and can transmit internal status and statistics on RTOS kernels, TCP/IP stacks and other middleware software.
* DAP (Debug Access Port): receives data from the other units and combines and routes the information to the available debug ports: JTAG, SWD (Serial Wire Debug), SWO (Serial Wire Output), and the trace port.
SWD is the preferred debug interface when debugging with Cortex-M3, but taking full advantage of its features requires a probe with full SWD/SWO support, such as IAR J-Link version 7 or later, which is capable of running at SWO speeds of up to 6 Mb/s.
|Figure 2: Cortex-M3 debug architecture|
Using Cortex-M3 debug features
The Cortex-M3 debug controller enables debuggers such as C-SPY in IAR Embedded Workbench to provide enhanced functionality, simply by using a JTAG probe such as IAR J-Link that supports SWO data.
The function profiler, shown in Figure 3 below, helps find the functions where most time is spent during execution, which are the parts to focus on when spending time and effort on optimizing code. In a system without trace capabilities, this would have required the debugger to set breakpoints at each entry and exit from functions.
|Figure 3: Function profiler window in IAR Embedded Workbench|
DWT allows the debugger to sample the PC and provide statistical profiling. A PC sampling rate of around 10,000 samples per second is good enough to find CPU-intensive functions in the application, although not to provide a precise profile.
The timing information for each function in an application can then be displayed in different formats in IAR Embedded Workbench while the application is running.
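Statistical profiling from PC samples amounts to building a histogram: each sampled address is attributed to the function whose address range contains it. The sketch below shows one simple way a host-side tool could do this; it is not how C-SPY is implemented, and the structure, function names, and addresses used with it are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per function, as a debugger might derive from symbol data. */
typedef struct {
    const char *name;
    uint32_t    start;  /* first address of the function      */
    uint32_t    end;    /* one past the last address          */
    uint32_t    hits;   /* PC samples that fell in this range */
} func_range_t;

/* Attributes each PC sample to a function and returns the number of
 * samples that matched no known function. */
static size_t profile_samples(func_range_t *funcs, size_t nfuncs,
                              const uint32_t *pcs, size_t npcs)
{
    size_t unmatched = 0;

    for (size_t i = 0; i < npcs; i++) {
        size_t j;
        for (j = 0; j < nfuncs; j++) {
            if (pcs[i] >= funcs[j].start && pcs[i] < funcs[j].end) {
                funcs[j].hits++;
                break;
            }
        }
        if (j == nfuncs) {
            unmatched++;
        }
    }
    return unmatched;
}

/* Share of all samples attributed to one function, in whole percent. */
static uint32_t percent_of_samples(uint32_t hits, size_t total)
{
    return total ? (uint32_t)(((uint64_t)hits * 100u) / total) : 0u;
}
```

Because the result is a sample count rather than an exact cycle count, short or rarely executed functions may be missed entirely, which is why the profile identifies hot spots but is not precise.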
The statistical trace information received via SWO can also be used to provide information about the number of times each instruction in the code has been executed.
For example, in IAR Embedded Workbench, the instruction profiling information is displayed in the Disassembly window (Figure 4 below): the leftmost column shows the number of times each instruction has been executed.
|Figure 4: Disassembly window in IAR Embedded Workbench showing instruction profiling|
The four watchpoints in the DWT module can be used to log accesses to up to four different memory locations or areas, including time information. This helps to place that data in more efficient memory, making the application more efficient, and also helps with debugging.
By using the DWT module to trigger an ITM packet for each interrupt activity, the debugger can present logs and graphs of the interrupt activity in the system, helping, for example, to locate which interrupts can be fine-tuned to make execution faster. Figure 5 below shows the Interrupt Log window in IAR Embedded Workbench. A condensed summary for each interrupt source is also available.
|Figure 5: Interrupt Log graphical window showing interrupt activity on a time scale for each interrupt source|
The ITM module also offers a non-intrusive printf() function that reduces the overhead for this mechanism to around 100 microseconds, compared to a couple of hundred milliseconds when using the traditional breakpoint-driven semihosting method.
The increased number of breakpoints in the Cortex-M3 means that data breakpoints, which are useful for tracking down bugs that involve corrupted variables or data, can be used at the same time as code breakpoints are active.
The DAP in Cortex-M3 allows for full access to the core buses during application execution, enabling the debugger to allow live memory reads and writes, and to implement live watch on application variables. IAR Embedded Workbench uses this feature in the Live Watch window and the Memory window.
Using the Cortex-M3 processor with reliable and reusable components such as commercial RTOSes, file systems and USB stacks helps the application developer minimize time to market.
Using off-the-shelf software components can save many man-years of software development, allowing a project to be completed in a matter of days.
The Cortex-M3 processor has been specifically designed for cost-sensitive embedded applications. The new features aim to make software on Cortex-M3s more efficient, and also make it easier to migrate from one controller to another, or to port an RTOS to a new platform.
The instruction set includes helpful new instructions, such as CLZ, that can improve assembly implementations of common algorithms. The processor also provides a sleep mode that it can enter when idle, to be awoken when an interrupt occurs.
Finally, the debug controller makes developing and testing software easier. Not only is the architecture sensible, stable and efficient, but its designers also aimed to provide a developer-friendly platform, with a sophisticated debug system and six code breakpoints that are immensely helpful during testing and development, and powerful trace features that allow greater real-time visibility into application operation.
Jean Labrosse is President of Micrium, a provider of high-quality embedded software solutions. Mr. Labrosse is a regular speaker at the Embedded Systems Conferences and serves on the Advisory Board of the conference. He is the author of two books, MicroC/OS-II, The Real-Time Kernel and Embedded Systems Building Blocks, Complete and Ready-to-Use Modules in C, and has written numerous articles for magazines. He has an MSEE and has been designing embedded systems for many years.
Anders Lundgren has been with IAR Systems since 1997. He currently works as product manager for the IAR Embedded Workbench for ARM. During his first years with IAR Systems he worked with compiler development and as project manager for compiler and debugger projects. Prior to joining IAR Systems, Mr. Lundgren worked with space science instruments at the European Space Agency and spent one year at the space science laboratory at the University of California, Berkeley. He received an M.S. in Computer Science from the University of Uppsala, Sweden in 1986.
Lotta Frimanson received a Master of Science degree in Engineering Physics from Uppsala University, Sweden in 1989. She has worked at IAR Systems as a product manager since 1999. Prior to this she had 10 years of experience in embedded systems programming.