Basics of porting C-code to and between ARM CPUs: the Cortex-M1 and Cortex-M0 - Embedded.com

In Part 2 in a series, Joseph Yiu, author of “The definitive guide to the ARM Cortex-M0,” describes the differences between the Cortex-M1 and the Cortex-M0 and how to port your software code base between them.

Both the Cortex-M1 and the Cortex-M0 are based on the ARM architecture v6-M, so the differences between the Cortex-M1 and the Cortex-M0 are relatively small.

Instruction Set. In the Cortex-M1 processor, the WFI, WFE and SEV instructions are executed as NOPs. There is no sleep feature on current implementations of the Cortex-M1 processor.

SVC instruction support is optional in the Cortex-M1 (it depends on a design configuration parameter set by the FPGA designer), whereas in the Cortex-M0 processor the SVC instruction is always available.

NVIC. SVC and PendSV exceptions are optional in the Cortex-M1 processor. They are always present in the Cortex-M0. Interrupt latency also differs between the two processors. Some optimizations related to interrupt latency (e.g., zero jitter) are not available on current implementations of the Cortex-M1 processor.

System-Level Features. The Cortex-M1 has Tightly Coupled Memory (TCM) support to allow memory blocks in the FPGA to connect to the Cortex-M1 directly for high-speed access, whereas the Cortex-M0 processor has various low-power support features like WIC (Wakeup Interrupt Controller).

There are also a number of differences in the configuration options between the two processors. These options are only available for FPGA designers (for Cortex-M1 users) or ASIC designers (for Cortex-M0 microcontroller vendors).

For example, with the Cortex-M1 processor you can include both the serial wire debug and the JTAG debug interface, whereas Cortex-M0 microcontrollers normally only support either the serial wire or the JTAG debug interface.

Porting between the Cortex-M0 and -M1
In general, software porting between the Cortex-M0 and the Cortex-M1 is extremely easy. Apart from peripheral programming model differences, there are few required changes.

Because both processors are based on the same instruction set and the same architecture version, the same software code can often be used directly when porting from one processor to the other. The only exception is when the software code uses sleep features. Because the Cortex-M1 does not support sleep mode, application code that relies on WFI or WFE to halt the processor might need to be modified.
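One way to keep a single code base for both processors is to avoid relying on WFI actually halting the core. The sketch below is a minimal illustration, assuming a GCC-compatible compiler; `TARGET_HAS_SLEEP`, `cpu_idle` and `wait_for_flag` are hypothetical names and are not part of CMSIS. Because the loop rechecks the flag after every wake-up, it behaves the same whether WFI sleeps (Cortex-M0) or executes as a NOP (Cortex-M1).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical build switch: set to 0 when building for a Cortex-M1 target. */
#ifndef TARGET_HAS_SLEEP
#define TARGET_HAS_SLEEP 1
#endif

/* Enter sleep if the target supports it; otherwise do nothing.
 * On a non-ARM host build this compiles to an empty function. */
static inline void cpu_idle(void)
{
#if TARGET_HAS_SLEEP && defined(__GNUC__) && defined(__arm__)
    __asm volatile ("wfi");   /* sleeps on Cortex-M0, acts as a NOP on Cortex-M1 */
#endif
}

/* Block until an interrupt handler sets *flag, then clear it.
 * The flag is rechecked in a loop, so correctness does not depend
 * on WFI having actually stopped the processor. */
void wait_for_flag(volatile uint32_t *flag)
{
    assert(flag != 0);
    while (*flag == 0u) {
        cpu_idle();
    }
    *flag = 0u;
}
```

Code written as `__WFI(); handle_event();` with no flag check is the pattern most likely to misbehave on the Cortex-M1. A production version of this loop would also need to consider an interrupt arriving between the flag check and the WFI.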

There is also a small chance that the software needs minor adjustment because of execution timing differences.

At the time of writing, no CMSIS software package is available for the Cortex-M1. However, you can use the Cortex-M0 CMSIS files for Cortex-M1 programming, because both processors are based on the same version of the ARMv6-M architecture.

Differences between the Cortex-M3 and -M0
The Cortex-M3 processor is based on the ARMv7-M architecture. It supports many more 32-bit Thumb instructions and a number of extra system features.

The performance of the Cortex-M3 is also higher than that of the Cortex-M0. These factors make the Cortex-M3 very attractive for demanding applications in the automotive and industrial control areas.

Programmer's Model. The ARMv7-M architecture is a superset of the ARMv6-M architecture. So it provides all the features available in the ARMv6-M. The Cortex-M3 processor also provides various additional features.

In the programmer's model, the Cortex-M3 has an extra nonprivileged mode (user Thread mode) that can be used when the processor is not executing exception handlers. In user Thread mode, access to the processor configuration registers (e.g., NVIC, SysTick) is restricted, and an optional memory protection unit (MPU) can be used to block programs running in user threads from accessing certain memory regions (Figure 21.3 below).

Apart from the extra operation mode, the Cortex-M3 also has additional interrupt masking registers. The BASEPRI register allows interrupts of a certain priority level or lower to be blocked, and the FAULTMASK register provides additional fault management features.

Figure 21.3: Programmer's model differences between the Cortex-M0/-M3.
The CONTROL register in the Cortex-M3 also has an additional bit (bit[0]) to select whether the thread should be in privileged or user Thread mode.

The xPSR in the Cortex-M3 also has a number of additional bits to allow an interrupted multiple load/store instruction to be resumed from the interrupted transfer and to allow an instruction sequence (up to four instructions) to be conditionally executed.

NVIC and Exceptions. The NVIC in the Cortex-M3 supports up to 240 interrupts. The number of priority levels is also configurable by the chip designers, from 8 levels to 256 levels (in most cases 8 levels to 32 levels).

The priority level settings can also be configured into preemption priority (for nested interrupt) and subpriority (used when multiple interrupts of the same preempt priority are happening at the same time) by software.
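To illustrate how the split works: in ARMv7-M, the PRIGROUP setting selects a bit position in the 8-bit priority value. Bits above that position form the preemption priority, and the bits at or below it form the subpriority. The helper below is a host-side sketch of that arithmetic only; `split_priority` is a hypothetical name, not a CMSIS function, and it ignores the fact that a given chip may not implement the low-order priority bits.

```c
#include <assert.h>
#include <stdint.h>

/* Split an 8-bit priority value into its preemption priority and
 * subpriority fields for a given PRIGROUP setting (0 to 7).
 * Subpriority occupies bits [prigroup:0]; preemption priority
 * occupies bits [7:prigroup+1]. */
void split_priority(uint8_t priority, unsigned prigroup,
                    unsigned *preempt, unsigned *sub)
{
    assert(prigroup <= 7u);
    assert(preempt != 0 && sub != 0);

    unsigned sub_bits = prigroup + 1u;          /* 1 to 8 subpriority bits */
    *sub     = priority & ((1u << sub_bits) - 1u);
    *preempt = (unsigned)priority >> sub_bits;  /* 0 when all 8 bits are subpriority */
}
```

For example, with PRIGROUP set to 5, a priority value of 0xA0 yields a preemption priority of 2 (bits [7:6]) and a subpriority field of 0x20 (bits [5:0]).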

One of the major differences between the NVIC in the Cortex-M3 and Cortex-M0 is that most of the NVIC registers in the Cortex-M3 can be accessed using word, half word, or byte transfers. With the Cortex-M0, the NVIC must be accessed using a word transfer.

For example, if an interrupt priority register needs to be updated on the Cortex-M0, you need to read the whole word (which holds the priority-level settings for four interrupts), modify 1 byte, and then write the word back.

In the Cortex-M3, this can be carried out using just a single byte-size write to the priority-level register. For users of the CMSIS device driver library, this difference does not cause a software porting issue, as the CMSIS NVIC access function names are the same and the functions use the correct access method for the processor.
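The two access styles can be sketched as follows. This is an illustration only: the function names are hypothetical, and the register block is passed in as a pointer so that the logic can be exercised on an ordinary array as well as on the real priority registers. In practice the CMSIS function `NVIC_SetPriority` hides this difference.

```c
#include <assert.h>
#include <stdint.h>

/* Word-access update, as required on the Cortex-M0.
 * ipr points to the first interrupt priority register; each 32-bit
 * register holds the priority bytes of four interrupts. */
void set_priority_word_access(volatile uint32_t *ipr, unsigned irq,
                              uint8_t priority)
{
    assert(ipr != 0);
    unsigned shift = (irq & 3u) * 8u;         /* byte lane within the word */
    uint32_t word  = ipr[irq >> 2];           /* read the whole word */
    word &= ~((uint32_t)0xFFu << shift);      /* clear the 1 byte being changed */
    word |= (uint32_t)priority << shift;      /* insert the new priority */
    ipr[irq >> 2] = word;                     /* write the whole word back */
}

/* Byte-access update, possible on the Cortex-M3.
 * Assumes a little-endian memory system. */
void set_priority_byte_access(volatile uint32_t *ipr, unsigned irq,
                              uint8_t priority)
{
    assert(ipr != 0);
    ((volatile uint8_t *)ipr)[irq] = priority; /* single byte-size write */
}
```

Note that the word-access version is a read-modify-write sequence, so code that can be interrupted by a handler touching the same register may need to mask interrupts around it.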

The NVIC in the Cortex-M3 also supports dynamic changing of priority levels, in contrast to the Cortex-M0, where the priority level of an interrupt should not be changed after it is enabled. The Cortex-M3 has additional fault handlers with programmable priority levels.

These allow embedded systems to be protected by two levels of fault exception handlers (Figure 21.4 below). When used together with the memory protection unit in the Cortex-M3, robust systems can be built for applications that require high reliability.


Figure 21.4: Multiple levels of fault handling in the Cortex-M3.
The NVIC in the Cortex-M3 also supports the following features:

Vector Table Offset Register. The vector table can be relocated to another address in the CODE memory region or the SRAM memory region.

Software Trigger Interrupt Register. Apart from using NVIC Interrupt Pending Set Register, the pending status of interrupts can be set using this register.

Interrupt Active Status Register. The active status of each interrupt can be determined by software.

There are also additional fault status registers for indicating the causes of fault exceptions and the fault addresses, and an additional exception called the debug monitor for debug purposes.

Instruction Set. In addition to the Thumb instructions supported in the Cortex-M0 processor, the Cortex-M3 also supports a number of additional 16-bit and 32-bit Thumb instructions. These include the following:

1- Signed and unsigned divide instructions (SDIV and UDIV)

2- Compare and branch if zero (CBZ), compare and branch if not zero (CBNZ)

3- IF-THEN (IT) instruction, allowing up to four subsequent instructions to be conditionally executed based on the status in APSR.

4- Multiply and accumulate instructions for 32-bit and 64-bit results.

5- Count leading zero (CLZ)

6- Bit field processing instructions for bit order reversing, bit field insert, bit field clear, and bit field extract

7- Table branch instructions (commonly used for the switch statement in C)

8- Saturation operation instructions

9- Exclusive accesses for multiprocessor environments

10- Additional instructions that allow high registers (R8 and above) to be used in data processing, memory accesses, and branches

These additional instructions allow faster processing of complex data like floating point values. They also allow the Cortex-M3 to be used in audio signal processing applications and real-time control systems.

System-Level Features
The Cortex-M3 includes a number of system-level features that are not available on the Cortex-M0. These include the following:

• Memory protection unit (MPU). A memory access monitoring unit that provides eight memory regions. Each memory region can be defined with different locations and size, as well as different memory access permissions and access behavior. If an access violation is found, the access is blocked and a fault exception is triggered. The OS can use the MPU to ensure each task can only access permitted memory space to increase system reliability.

• Unaligned memory accesses. In the Cortex-M0, all data transfer operations must be aligned. This means a word-size data transfer must have an address value divisible by 4, and a half-word data transfer must occur at an even address. The Cortex-M3 processor allows many memory access instructions to generate unaligned transfers. On the Cortex-M0 processor, access to unaligned data has to be carried out by multiple instructions.

• Bit-band regions. The Cortex-M3 has two bit-addressable memory regions called the bit-band regions. The first bit-band region is the first 1 MB of the SRAM region, and the second one is the first 1 MB of the peripheral region. Using another memory address range called the bit-band alias, each bit in a bit-band region can be individually accessed and modified.

• Exclusive accesses. The Cortex-M3 supports exclusive accesses, which are used to handle shared data such as semaphores in multiprocessor systems. The processor bus interface supports additional signals for connecting to an exclusive access monitor unit on the bus system.
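The bit-band alias mapping described in the list above can be sketched as a small address calculation. The base addresses used here (0x20000000 and 0x40000000 for the bit-band regions, 0x22000000 and 0x42000000 for their aliases) come from the ARMv7-M memory map rather than from this article, and `bitband_alias` is a hypothetical helper name. Each bit in a bit-band region maps to one 32-bit word in the alias range.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BITBAND_BASE    0x20000000u
#define SRAM_ALIAS_BASE      0x22000000u
#define PERIPH_BITBAND_BASE  0x40000000u
#define PERIPH_ALIAS_BASE    0x42000000u
#define BITBAND_REGION_SIZE  0x00100000u   /* 1 MB */

/* Return the bit-band alias address for bit `bit` (0 to 7) of the byte
 * at `byte_addr`, or 0 if the address is outside both bit-band regions. */
uint32_t bitband_alias(uint32_t byte_addr, unsigned bit)
{
    assert(bit < 8u);

    uint32_t region_base;
    uint32_t alias_base;

    /* Unsigned subtraction wraps for addresses below the base, so a
     * single comparison checks both ends of each 1 MB region. */
    if (byte_addr - SRAM_BITBAND_BASE < BITBAND_REGION_SIZE) {
        region_base = SRAM_BITBAND_BASE;
        alias_base  = SRAM_ALIAS_BASE;
    } else if (byte_addr - PERIPH_BITBAND_BASE < BITBAND_REGION_SIZE) {
        region_base = PERIPH_BITBAND_BASE;
        alias_base  = PERIPH_ALIAS_BASE;
    } else {
        return 0u;
    }

    /* Each byte spans 32 alias bytes; each bit spans one 4-byte word. */
    return alias_base + (byte_addr - region_base) * 32u + bit * 4u;
}
```

On a Cortex-M3, writing 1 or 0 to the returned address through a `volatile uint32_t *` sets or clears just that one bit.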

Debug Features
The Cortex-M3 provides additional breakpoints and data watchpoints in its debug system. The breakpoint unit can also be used to remap instruction or literal data accesses from the original address (e.g., mask ROM) to a different location in the SRAM region. This allows nonerasable program memories to be patched with a small programmable memory (Table 21.5 below).

Table 21.5: Debug and Trace Feature Comparison
In addition to the standard debug features, the Cortex-M3 also has trace features. The optional Embedded Trace Macrocell (ETM) allows information about instruction execution to be captured so that the instruction execution sequence can be reconstructed on debugging hosts.

The Data Watchpoint and Trace (DWT) unit can be used to generate trace for watched data variables or accesses to memory ranges. The DWT can also be used to generate event trace, which shows information about exception entry and exit.

The trace data can be captured using a trace port analyzer such as the ARM RealView-Trace unit or an in-circuit debugger such as the Keil ULINKPro.

The Cortex-M3 processor also supports software-generated trace through a unit called the Instrumentation Trace Macrocell (ITM). The ITM provides 32 message channels and allows software to generate text messages or data output.

Porting between Cortex-M0 and Cortex-M3
Although there are a number of differences between the Cortex-M0 (ARMv6-M) and the Cortex-M3 (ARMv7-M), porting software between the two processors is usually easy.

Because the ARMv7-M supports all features in the ARMv6-M, applications developed for the Cortex-M0 can work on the Cortex-M3 directly, apart from changes that result from their peripheral differences (Figure 21.5 below).

Figure 21.5: Compatibility between the Cortex-M0 and Cortex-M3 processors.
Normally, when porting an application from the Cortex-M0 to the Cortex-M3, you only need to change the device driver library, change the peripheral access code, and update the software for system features like clock speed, sleep modes, and the like.

Porting software from the Cortex-M3 to the Cortex-M0 might require more effort. Apart from switching the device driver library, you also need to consider the following areas:

• NVIC and SCB (System Control Block) registers in the Cortex-M0 can only be accessed in word-size transfers. If any program code accesses these registers in byte-size or half-word transfers, that code needs to be modified. If the NVIC and SCB are accessed by using CMSIS functions, switching the CMSIS-compliant device driver to the Cortex-M0 version should automatically handle these differences.

• Some registers in the NVIC and the SCB in the Cortex-M3 are not available in the Cortex-M0. These include the Interrupt Active Status Register, the Software Trigger Interrupt Register, the Vector Table Offset Register, and some of the fault status registers.

• The bit-band feature in the Cortex-M3 is not available in the Cortex-M0. If the bit-band alias access is used, it needs to be converted to use normal memory accesses and handle bit extract or bit modification by software.

• If the application contains assembly code or embedded assembly code, the assembly code might require modification because some of the instructions are not available on the Cortex-M0. For C application code, some instructions such as hardware divide are not available in the Cortex-M0. In this case, the compiler will automatically call the C library to handle the divide operation.

• Unaligned data transfer is not available on the Cortex-M0.

• Some instructions available in the Cortex-M3 (e.g., exclusive accesses, bit field processing) are not available on the Cortex-M0.
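Two of the items in the list above, bit-band access and unaligned data transfers, can be replaced by short software routines when moving to the Cortex-M0. The sketch below is one possible approach; the function names are hypothetical. The bit helpers use ordinary read-modify-write accesses, and the unaligned load goes through `memcpy` so that the compiler generates only transfers that are legal on the target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Replacement for a bit-band alias write: set or clear one bit
 * of a 32-bit register or variable using normal memory accesses. */
void reg_write_bit(volatile uint32_t *reg, unsigned bit, unsigned value)
{
    assert(reg != 0 && bit < 32u);
    if (value) {
        *reg |= (1u << bit);
    } else {
        *reg &= ~(1u << bit);
    }
}

/* Replacement for a bit-band alias read: extract one bit. */
unsigned reg_read_bit(const volatile uint32_t *reg, unsigned bit)
{
    assert(reg != 0 && bit < 32u);
    return (unsigned)((*reg >> bit) & 1u);
}

/* Read a 32-bit value from an address that may not be word aligned. */
uint32_t load_u32_unaligned(const void *p)
{
    uint32_t value;
    assert(p != 0);
    memcpy(&value, p, sizeof value);
    return value;
}
```

Unlike a bit-band write, `reg_write_bit` is not atomic. If an interrupt handler can modify the same word, the caller may need to disable interrupts around the call.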

Some Cortex-M0 microcontrollers support a memory remapping feature. Applications that use the vector table relocation feature on the Cortex-M3 might be able to use the memory remapping feature to handle vector table relocation.

Applications that require the user Thread mode or the MPU feature cannot be ported to the Cortex-M0 because these features are not supported in the Cortex-M0.

Porting between Cortex-M0 and Cortex-M4
The Cortex-M4 processor is based on the same architecture as that used for the Cortex-M3. It is similar to the Cortex-M3 in many aspects: it has the same Harvard bus architecture, approximately the same performance in terms of Dhrystone DMIPS/MHz, the same exception types, and so on.

Compared to the Cortex-M3, the Cortex-M4 has additional instructions such as single instruction, multiple data (SIMD) instructions, saturation arithmetic instructions, data packing and extraction instructions, and optional single precision floating point instructions if a floating point unit is implemented.

The floating point support in the Cortex-M4 is optional; therefore, not all Cortex-M4 microcontrollers will support this feature. If the floating point unit is implemented, the processor has an additional floating point register bank and additional registers, as well as extra bit fields in the xPSR and CONTROL special registers (Figure 21.6 below). The floating point unit can be turned on or off by software to reduce power consumption.

Figure 21.6: Programmer's model of the Cortex-M4 with a floating point unit.
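As an illustration of turning the floating point unit on and off in software: in the ARMv7-M architecture with the floating point extension, access to the unit is controlled by the CP10 and CP11 fields (bits [23:20]) of the Coprocessor Access Control Register, which is located at address 0xE000ED88. These details come from the architecture documentation rather than from this article. The sketch below takes the register address as a parameter so the bit manipulation can be checked on a host; `fpu_enable` and `fpu_disable` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* CP10 and CP11 access fields, both set to 0b11 (full access). */
#define CPACR_CP10_CP11_FULL_ACCESS  (0xFu << 20)

/* Grant full access to the floating point unit.
 * On a real Cortex-M4, cpacr would be (volatile uint32_t *)0xE000ED88
 * and the write would normally be followed by DSB and ISB barriers
 * before the first floating point instruction executes. */
void fpu_enable(volatile uint32_t *cpacr)
{
    assert(cpacr != 0);
    *cpacr |= CPACR_CP10_CP11_FULL_ACCESS;
}

/* Deny access to the floating point unit again. */
void fpu_disable(volatile uint32_t *cpacr)
{
    assert(cpacr != 0);
    *cpacr &= ~CPACR_CP10_CP11_FULL_ACCESS;
}
```

Code ported from a Cortex-M4 to a Cortex-M0 has no such register to program, so a startup routine like this would simply be removed and the floating point arithmetic left to the C library.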
Apart from these additional instructions, the system features of the Cortex-M4 are similar to those of the Cortex-M3 processor.

Therefore, the techniques for porting software between the Cortex-M0 and the Cortex-M3 processors can also be used on porting software between the Cortex-M0 and Cortex-M4 processors.

However, because of the differences in the nature of the two processors, some applications developed for the Cortex-M4 processor (e.g., high-end audio processing or industrial applications that require floating point operations) are unsuitable for the Cortex-M0 processor.

Joseph Yiu, author of “The definitive guide to the ARM Cortex-M0,” is a staff engineer at ARM Ltd., Cambridge, UK.

To read Part 1, go to ARM 7TDMI and Cortex-M0.
Next in Part 3: From 8-/16-bit MCUs to the Cortex-M0

Used with permission from Newnes, a division of Elsevier. Copyright 2011, from “The definitive guide to the ARM Cortex-M0,” by Joseph Yiu. For more information about this title and other similar books, please visit www.elsevierdirect.com.
