Achieving MPU security

Introduction

Encryption, authentication, and other security methods work fine to protect data and program updates passing through the Internet. That is, unless one end can easily be hacked to steal secret keys and possibly implant malware for future activation. Then, unbeknownst to the system operators, confidential information is being stolen daily and possible major service disruptions lie ahead.

A large number of Cortex-M MCU-based products have been shipped since the Cortex-M architecture was introduced in 2005. Many of these products are connected to the Internet. Many new products are currently under development using Cortex-M MCUs, and due to the financial incentives of the IoT, an even larger percentage of them will be connected to the Internet. In the vast majority of cases, these embedded devices have little or no protection against hacking.

Most Cortex-M MCUs, both in the field and in development, have Memory Protection Units (MPUs). However, because of tight delivery schedules and the difficulty of using the Cortex-M MPU, these MPUs are either under-used or not used at all. A further impediment for systems with limited memory is the apparently large waste of memory caused by the MPU's requirements that regions be powers of two in size and aligned on boundaries equal to their size.

Yet for these MCUs, the MPU and the SVC instruction are the only means of achieving acceptable security. Therefore, I set out a year and a half ago to determine if the problems with the MPU could be overcome and if it were possible to devise a practical way to upgrade post- and late-development projects, as well as new projects to use MPU security. I have found that it is practical to do this and MPU-Plus has been developed to ease this process.

Most fielded Cortex-M systems use the ARMv7-M architecture. The ARMv8-M architecture, which was announced over a year ago, offers better security protection. Unfortunately, it is being adopted slowly by processor vendors and nearly all new MCUs still use the ARMv7-M architecture. Hence, the latter will be with us for a long time to come. Consequently, this article presents a step-by-step process for porting existing systems to the ARMv7-M MPU.

Port

I have recently completed porting a substantial amount of middleware to unprivileged mode (umode) partitions utilizing the Cortex-M MPU. The code ported consists of:

  • File demo.

  • File system.

  • USB mass storage class driver.

  • USB host stack.

  • Synopsys host controller driver.

All told, this amounts to about 20,000 lines of code – not a minor port.

The lessons I have learned from this port are:

  • It works!

  • It can be done by someone who is not the author of the code.

I have some familiarity with the above code, but I am not its author. Furthermore, it was created several years ago and is far from ideal for porting to an MPU. The importance of this port is that it demonstrates feasibility to improve the security of late- and post-project systems that use Cortex-M processors with MPUs.

With regard to late-project systems: While doing some pre-security planning is definitely a good idea, security really does add another dimension of complexity to development projects. Realistically such projects are hard-pressed to meet functional design goals on schedule. Adding security is likely to result in wasted design effort and missed schedules. I realize that this heretical thinking is worthy of banishment to a dark corner of cyberspace, but it is pragmatic. I think it is best to add security during manufacturing ramp-up – i.e. wait until there is something worth protecting.

With regard to post-project systems, I am sure there are many managers and engineers who are deeply concerned about the vulnerability of their systems in the field – especially if these systems are connected to the Hacker’s Highway (aka the Internet). There is hope here, too, as long as these products incorporate a Cortex-M MPU and their software can be updated.

Security Facts

Before discussing the step-by-step porting process to increase security, we need to recognize a few facts:

  • No security is perfect.

  • The ultimate goal is to be more secure than the potential payoff for a hacker.

  • Security improvement is an incremental process – it may take many passes and many years to achieve the desired level.

Improved security is primarily achieved by putting all vulnerable and untrusted code into umode and putting keys, security software (e.g. crypto, authentication, boot, and update code), and mission-critical code into privileged mode (pmode). The MPU prevents umode code (ucode) from going outside of its assigned regions and restricts how the regions can be accessed (e.g. XN = execute never). Hacked ucode cannot subvert pmode code (pcode), steal pmode data (pdata), execute from data regions, overflow stacks, or perform other hacker tricks. It is boxed in.

Partitions

Partitioning the code is the first step. Figure 1 shows what we might call a “first pass” partitioning – i.e. not too ambitious. In this case, there are just five partitions: application and middleware, which run in unprivileged mode (umode), and initialization, system services, and mission-critical code, which run in privileged mode (pmode). Each partition includes one or more tasks.


Figure 1: Partitions (Source Micro Digital)

Initialization code runs before normal task operation begins. It typically runs with the MPU off, or with the MPU on and the Background Region (BR) enabled. Hence, this code can access anything and do anything. This is acceptable because it is part of secure boot, which theoretically cannot be hacked. Secure boot is outside the scope of this paper.

System services include exception handlers, the RTOS, and security software with its secret keys. Mission-critical software is typically a small amount of software that does the main job. Both are trusted software. (Of course, mission-critical code and keys are the babies we are trying to protect.) pmode code is protected from umode code by MPU regions backed up by the privileged processor state and a SoftWare Interrupt (SWI) API to system services that is implemented with the SVC instruction. The heavy line in Figure 1 represents the barrier between umode and pmode.
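The idea of a narrow, numbered gateway into system services can be sketched on a host machine as follows. This is only an illustration: the service functions, the table, and the error code are hypothetical and are not taken from SMX or MPU-Plus. On a real Cortex-M target the service number would be supplied through the SVC instruction and handled in the SVC exception handler rather than through a direct function call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature for a privileged service reachable from umode. */
typedef int32_t (*SVC_FN)(uint32_t arg);

/* Hypothetical services standing in for RTOS calls. */
static int32_t svc_get_ticks(uint32_t arg) { (void)arg; return 1234; }
static int32_t svc_double(uint32_t arg)    { return (int32_t)(arg * 2u); }

/* Only services listed here are reachable through the gateway. */
static const SVC_FN svc_table[] = { svc_get_ticks, svc_double };

#define SVC_COUNT       (sizeof svc_table / sizeof svc_table[0])
#define SVC_ERR_BAD_ID  (-1)   /* hypothetical error code */

/* Gateway: validate the service number before calling privileged code. */
static int32_t svc_dispatch(uint32_t id, uint32_t arg)
{
    if (id >= SVC_COUNT) {
        return SVC_ERR_BAD_ID;      /* unknown service: refuse the request */
    }
    assert(svc_table[id] != NULL);
    return svc_table[id](arg);
}
```

The point of the sketch is that unprivileged code never calls a privileged function by address; it can only name a service by number, and the gateway decides whether that number is valid.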

Malware that penetrates either of the umode partitions cannot gain access to the pmode partitions if effective hardware-enforced separation has been achieved.

MPU

The MPU has 8 slots, each of which may contain a region. Figure 2 shows typical MPU structures for pmode and umode.


Figure 2: MPU Structures (Source Micro Digital)

Region 7 is reserved for the task stack region, which is managed by the scheduler. This region allows detecting task stack overflows immediately and prevents code execution from the stack. The two sys regions in pmode allow turning BR off in order to isolate ptasks. This is necessary for the step-by-step conversion process from ptasks to utasks, as presented below.

The other regions are loaded from each task’s Memory Protection Array (MPA), discussed next. The task_code and task_data regions are specific to a task or to its partition. The pcom_code and pcom_data regions are common between two or more tasks or regions, while in pmode, and ucom_code and ucom_data are common between two or more tasks or regions, while in umode. This is explained more fully in the step-by-step process discussion, below.

The Command and Status Register (CSR) regions are I/O regions. The syn_csr and ur1_csr regions (Synopsys USB and UART1) in umode are contained in the single apb0 csr region in pmode, because pmode does not have enough free MPU slots for separate regions. Since the umode MPU has more slots available, it is possible to offer better security in umode.

MPAs


Figure 3: MPA Per Task (Source Micro Digital)

As shown in Figure 3, each task has its own Memory Protection Array (MPA). The MPAs are in the same order as the TCBs in the task control table. Each MPA is a replica of the dynamic portion of the MPU when its associated task is running. An MPA is loaded into the MPU by the scheduler when its task is dispatched.
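The loading step can be sketched as a host-side simulation. The register behavior modeled here follows the ARMv7-M MPU (writing RBAR with its VALID bit set selects the slot named in the low four bits), but the type and function names (MPA_ENTRY, SIM_MPU, mpa_load) are hypothetical and are not the SMX scheduler's actual code.

```c
#include <assert.h>
#include <stdint.h>

#define MPU_SLOTS    8u          /* the MPU described here has 8 slots   */
#define RBAR_VALID   (1u << 4)   /* ARMv7-M RBAR.VALID bit               */
#define RBAR_REGION  0xFu        /* ARMv7-M RBAR.REGION field (bits 3:0) */

/* One MPA entry mirrors one MPU slot. */
typedef struct {
    uint32_t rbar;
    uint32_t rasr;
} MPA_ENTRY;

/* Host-side stand-in for the MPU registers so the sketch runs off-target. */
typedef struct {
    uint32_t  rnr;               /* currently selected slot */
    MPA_ENTRY slot[MPU_SLOTS];
} SIM_MPU;

/* Model of an RBAR write: VALID set means "also select slot REGION". */
static void sim_write_rbar(SIM_MPU *mpu, uint32_t value)
{
    if (value & RBAR_VALID) {
        mpu->rnr = value & RBAR_REGION;
    }
    assert(mpu->rnr < MPU_SLOTS);
    mpu->slot[mpu->rnr].rbar = value & ~(RBAR_VALID | RBAR_REGION);
}

/* Model of a RASR write: applies to the currently selected slot. */
static void sim_write_rasr(SIM_MPU *mpu, uint32_t value)
{
    mpu->slot[mpu->rnr].rasr = value;
}

/* What a scheduler might do at dispatch: copy the task's MPA into the MPU. */
static void mpa_load(SIM_MPU *mpu, const MPA_ENTRY *mpa, uint32_t count)
{
    assert(count <= MPU_SLOTS);
    for (uint32_t i = 0; i < count; i++) {
        sim_write_rbar(mpu, mpa[i].rbar);   /* selects slot and sets base */
        sim_write_rasr(mpu, mpa[i].rasr);   /* sets size and attributes   */
    }
}
```

Because each rbar value carries its own slot number and valid bit, the loop needs no separate slot-select write, which keeps the task switch short.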

Also, as shown in Figure 3, MPA templates determine the contents of MPAs. A template may be shared among MPAs. This would tend to be the case for tasks in the same partition, although they need not have the same templates and may just share a few regions.

Each MPA is an array of structures consisting of two 32-bit fields named rbar and rasr. These are exact copies of the MPU RBAR and RASR registers in each MPU slot, except that the valid bit is set in rbar, but not in RBAR. The following is an example of a template:

const MPA mpa_tmplt_usbh =
{
   {RA("ucom_data") | V | 0, RW_DATA  | N57 | RSIC(udsz)   | EN},
   {RA("ucom_code") | V | 1, UCODE    | N7  | RSIC(ucsz)   | EN},
   {RA("usbh_data") | V | 2, RW_DATA        | RSIC(usbdsz) | EN},
   {RA("usbh_code") | V | 3, UCODE    | N67 | RSIC(usbcsz) | EN},
   {RA("syn_csr")   | V | 4, RW_DATA        | RSIC(synsz)  | EN},
   {RA("ur1_csr")   | V | 5, RW_DATA        | RSIC(ur1sz)  | EN},
};

This corresponds to the umode MPU structure shown in Figure 2. There is a similar template for pmode.
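The definitions of the template macros (RA, V, RSIC, N7, EN, and so on) are not shown in this article. As a rough guide to what such a template entry has to encode, the following sketch builds rbar and rasr words from the ARMv7-M register layout. All macro and function names here are hypothetical, and the memory-type bits (TEX, S, C, B) are left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-M MPU field positions. */
#define RBAR_VALID      (1u << 4)                         /* RBAR.VALID       */
#define RASR_ENABLE     (1u << 0)                         /* RASR.ENABLE      */
#define RASR_XN         (1u << 28)                        /* execute never    */
#define RASR_SRD(mask)  (((uint32_t)(mask) & 0xFFu) << 8) /* subregion disable */
#define RASR_AP(ap)     (((uint32_t)(ap) & 0x7u) << 24)   /* access permission */

#define AP_RW_RW  0x3u   /* read/write in both pmode and umode */
#define AP_RO_RO  0x6u   /* read-only in both pmode and umode  */

/* RASR.SIZE holds log2(region size) - 1, in bits 5:1. */
static uint32_t rasr_size_field(uint32_t region_size)
{
    /* Regions must be a power of two and at least 32 bytes. */
    assert(region_size >= 32u && (region_size & (region_size - 1u)) == 0u);
    uint32_t log2 = 0u;
    while ((1u << log2) < region_size) {
        log2++;
    }
    return (log2 - 1u) << 1;
}

/* Base address plus valid bit plus slot number, as stored in an MPA. */
static uint32_t make_rbar(uint32_t base, uint32_t slot)
{
    assert((base & 0x1Fu) == 0u && slot < 8u);
    return base | RBAR_VALID | slot;
}

/* Attributes, disabled subregions, size, and enable bit. */
static uint32_t make_rasr(uint32_t region_size, uint32_t ap,
                          int execute_never, uint32_t srd_mask)
{
    return (execute_never ? RASR_XN : 0u)
         | RASR_AP(ap)
         | RASR_SRD(srd_mask)
         | rasr_size_field(region_size)
         | RASR_ENABLE;
}
```

For example, a read-only, executable 0x8000-byte code region in slot 1 with subregion 7 disabled, at the made-up base address 0x08010000, would be built with make_rbar(0x08010000, 1) and make_rasr(0x8000, AP_RO_RO, 0, 0x80).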

Creating Regions

Templates consist of regions. So how do we create regions? One approach is to start by defining sections in the C source code modules of the application. For example, in each C module containing code for a specific region, start the code[1] with:

#pragma default_function_attributes = @ ".ucom_code"
// Place all ucom functions here.
#pragma default_function_attributes =

“.ucom_code” is a section name that identifies a section containing code common to utasks. For code that is specific to taskA, use a section name such as “.taskA_code”. The “.” ahead of the section name is chosen for consistency with standard compiler section names, such as .text, .bss, etc. Any number of functions can be enclosed between the pragmas. Also, the above structure can be repeated in other modules and all of the functions will be combined into the single .ucom_code section by the linker.

For data use:

#pragma default_variable_attributes = @ ".ucom_data"
// Place all ucom data here.
#pragma default_variable_attributes =

As with code, any number of variables can be enclosed between the pragmas, and the above structure can be repeated in other modules to create a single .ucom_data section containing all of the ucom variables. This, of course, can also be done for a section containing data for a single task, such as .taskA_data.

In the ILINK linker command file (.icf), for the .ucom_code section use the following:

define exported symbol ucsz = 0x8000;
…
define block ucom_code with size = 0x7000, alignment = ucsz {ro section .ucom_code};
…
place in ROM   {block ucom_code, …};

(0x8000 vs. 0x7000 is explained in the next section.) If desired, additional sections can be placed in the ucom_code block, and they can be placed in a fixed order. In this example, only the .ucom_code section is placed in the ucom_code block. (Note that the section and block names differ by a “.”.)

In the C file, above the point where the template is defined, place:

#pragma section="ucom_code"
extern u32 ucsz;

Now, ucom_code and ucsz can be used in the MPA template definition, as shown in the previous section. Thus, linker blocks become MPA regions. This is convenient because their size, alignment, order, and other characteristics can be easily controlled.

Linker blocks

As noted, blocks in the linker command file (.icf) become regions in MPA templates and ultimately MPU regions. Using linker blocks guards against assigning wrong sizes and/or alignments. If a block is too small for the section(s) it contains, the linker will complain. Conversely, the linker map file will show if the linker block is too large.

To assign size and alignment to a linker block in the .icf, we distinguish between three sizes: region size, nominal size, and actual size. Suppose, for example, that the actual size of ucom_code is 0x6B16. Then the region size must be the next larger power of two = 0x8000[2]. Divide the region size by 8 to get the subregion size = 0x1000. Now find N such that it is the largest value for which 0x8000 – N*0x1000 >= 0x6B16. In this case N = 1, thus the nominal size = 0x7000. So ucsz = 0x8000 and the nominal block size = 0x7000.
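The arithmetic above is easy to get wrong by hand, so it can help to script it. The following is a minimal sketch; the REGION_PLAN structure and plan_region function are hypothetical helpers, not part of MPU-Plus. It assumes the disabled subregions are always the highest-numbered ones, as in the example, and it starts at 256 bytes because smaller ARMv7-M regions do not support subregions.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t region_size;     /* smallest power of two >= actual size      */
    uint32_t subregion_size;  /* region_size / 8                           */
    uint32_t n_disabled;      /* N: unused subregions at the top of region */
    uint32_t nominal_size;    /* region_size - N * subregion_size          */
    uint32_t srd_mask;        /* subregion-disable bits for the top N      */
} REGION_PLAN;

static REGION_PLAN plan_region(uint32_t actual_size)
{
    assert(actual_size > 0u && actual_size <= 0x80000000u);

    REGION_PLAN p;
    p.region_size = 256u;                 /* smallest size with subregions */
    while (p.region_size < actual_size) {
        p.region_size <<= 1;
    }
    p.subregion_size = p.region_size / 8u;

    /* Largest N for which region_size - N * subregion_size >= actual_size. */
    p.n_disabled   = (p.region_size - actual_size) / p.subregion_size;
    p.nominal_size = p.region_size - p.n_disabled * p.subregion_size;

    /* Set one bit per disabled subregion, starting from subregion 7. */
    p.srd_mask = (0xFFu << (8u - p.n_disabled)) & 0xFFu;
    return p;
}
```

For the ucom_code example, plan_region(0x6B16) yields a region size of 0x8000, a subregion size of 0x1000, N = 1, a nominal size of 0x7000, and a disable mask of 0x80 (subregion 7 only).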

The last step is to place N7 (subregion 7 off) in the rasr field of the ucom_code region in the MPA template, as shown in mpa_tmplt_usbh above. If we forget this, the usbh task will be able to access whatever the linker puts in the 0x1000 bytes after the ucom_code region. On the other hand, if we put N67 by mistake (subregions 6 and 7 off) we will get Memory Management Faults (MMFs) when the code attempts accesses in subregion 6.
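The effect of getting the subregion bits right or wrong can be checked with a small host-side model of a single MPU region. The function name region_allows and the base address used in the usage note are hypothetical. The model looks at one region only and ignores permissions, overlapping regions, and the background region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if addr falls in an enabled subregion of the given region. */
static bool region_allows(uint32_t base, uint32_t size,
                          uint32_t srd_mask, uint32_t addr)
{
    /* Power-of-two size, large enough for subregions, aligned to its size. */
    assert(size >= 256u && (size & (size - 1u)) == 0u);
    assert((base & (size - 1u)) == 0u);

    uint32_t offset = addr - base;       /* wraps to a huge value if addr < base */
    if (offset >= size) {
        return false;                    /* outside the region entirely */
    }
    uint32_t subregion = offset / (size / 8u);
    return ((srd_mask >> subregion) & 1u) == 0u;   /* bit set means disabled */
}
```

With a 0x8000-byte region at 0x08010000, an access at offset 0x6B00 (subregion 6) is allowed with mask 0x80 (N7) but rejected with mask 0xC0 (N67), which is the Memory Management Fault case described above. An access at offset 0x7000 is allowed only if subregion 7 is mistakenly left enabled (mask 0x00).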

At last we are ready to start conversion!

Part two of this two-part series will discuss the details of this conversion process.

NOTES:

[1] All code shown in this paper is based upon the IAR EWARM® tool suite and the Micro Digital SMX® RTOS with MPU‑Plus®.

[2] It is much easier and less error-prone to work with hexadecimal numbers. It also helps to have a hexadecimal calculator handy.


Ralph Moore graduated with a degree in Physics from Caltech. He spent his early career in computer research. Then he moved into mainframe design in the 1960s and became a consultant in the early 1970s. He taught himself programming and became a microprocessor programmer. He founded Micro Digital in 1975, and many years of successful consulting led to architecting and developing the smx kernel in 1987. For many years he managed the business and sales, but in recent years he has been focused almost solely on development of the smx multitasking kernel v4.
