Achieving MPU security
Encryption, authentication, and other security methods work fine to protect data and program updates passing through the Internet. That is, unless one end can easily be hacked to steal secret keys and possibly implant malware for future activation. Then, unbeknownst to the system operators, confidential information is being stolen daily and possible major service disruptions lie ahead.
A large number of Cortex-M MCU-based products have been shipped since the Cortex-M architecture was introduced in 2005. Many of these products are connected to the Internet. Many new products are currently under development using Cortex-M MCUs, and due to the financial incentives of the IoT, an even larger percentage of them will be connected to the Internet. In the vast majority of cases, these embedded devices have little or no protection against hacking.
Most Cortex-M MCUs, both in the field and in development, have Memory Protection Units (MPUs). However, because of a combination of tight schedules to deliver products and difficulty using the Cortex-M MPU, these MPUs are either under-used or not used at all. The MPU requires that each region be a power of two in size and be aligned on a boundary equal to its size. The apparent large waste of memory that results has been an additional impediment to adoption by systems with limited memories.
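The size and alignment rules can be made concrete with a small calculation. The following is a minimal sketch, not part of MPU-Plus; the function names are hypothetical, and the 32-byte minimum region size is the Cortex-v7M limit.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest legal region size (a power of two, at least 32 bytes)
   that can hold `bytes` of code or data. */
uint32_t mpu_region_size(uint32_t bytes)
{
    uint32_t size = 32u;
    assert(bytes > 0u && bytes <= 0x80000000u);
    while (size < bytes)
        size <<= 1;
    return size;
}

/* A region base address must be a multiple of the region size. */
int mpu_base_is_aligned(uint32_t base, uint32_t size)
{
    return (base & (size - 1u)) == 0u;
}

/* Bytes left unused when `bytes` occupy a single region. */
uint32_t mpu_region_waste(uint32_t bytes)
{
    return mpu_region_size(bytes) - bytes;
}
```

For example, 5000 bytes of task data need an 8192-byte region that starts on an 8192-byte boundary, leaving 3192 bytes unused unless something else is placed there.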
Yet for these MCUs, the MPU and the SVC instruction are the only means of achieving acceptable security. Therefore, I set out a year and a half ago to determine if the problems with the MPU could be overcome and if it were possible to devise a practical way to upgrade post- and late-development projects, as well as new projects, to use MPU security. I have found that it is practical to do this, and MPU-Plus has been developed to ease this process.
Most existing Cortex-M embedded systems use the Armv7-M architecture (called Cortex-v7M in this article). The Armv8-M architecture (Cortex-v8M), which was announced over a year ago, offers better security protection. Unfortunately, processor vendors are adopting it slowly, and nearly all new MCUs still use Cortex-v7M. Hence, the latter will be with us for a long time to come. Consequently, this article presents a step-by-step process for porting existing systems to the Cortex-v7M MPU.
I have recently completed porting a substantial amount of middleware to unprivileged mode (umode) partitions utilizing the Cortex-M MPU. The code ported consists of:
USB mass storage class driver.
USB host stack.
Synopsys host controller driver.
All told, this amounts to about 20,000 lines of code – not a minor port.
The lessons I have learned from this port are:
It can be done by someone who is not the author of the code.
I have some familiarity with the above code, but I am not its author. Furthermore, it was created several years ago and is far from ideal for porting to an MPU. The importance of this port is that it demonstrates that it is feasible to improve the security of late- and post-project systems that use Cortex-M processors with MPUs.
With regard to late-project systems: While doing some pre-security planning is definitely a good idea, security really does add another dimension of complexity to development projects. Realistically such projects are hard-pressed to meet functional design goals on schedule. Adding security is likely to result in wasted design effort and missed schedules. I realize that this heretical thinking is worthy of banishment to a dark corner of cyberspace, but it is pragmatic. I think it is best to add security during manufacturing ramp-up – i.e. wait until there is something worth protecting.
With regard to post-project systems, I am sure there are many managers and engineers who are deeply concerned about the vulnerability of their systems in the field – especially if these systems are connected to the Hacker’s Highway (aka the Internet). There is hope here, too, as long as these products incorporate a Cortex-M MPU and their software can be updated.
Before discussing the step-by-step porting process to increase security, we need to recognize a few facts:
No security is perfect.
The ultimate goal is to make breaking in cost a hacker more than the potential payoff is worth.
Security improvement is an incremental process – it may take many passes and many years to achieve the desired level.
Improved security is primarily achieved by putting all vulnerable and untrusted code into umode and putting keys, security software (e.g. crypto, authentication, boot, and update code), and mission-critical code into privileged mode (pmode). The MPU prevents umode code (ucode) from going outside of its assigned regions and restricts how the regions can be accessed (e.g. XN = execute never). Hacked ucode cannot subvert pmode code (pcode), steal pmode data (pdata), execute from data regions, overflow stacks, or perform other hacker tricks: it is boxed in.
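To show how such access restrictions are expressed, here is a minimal sketch that builds a value for the Cortex-v7M MPU Region Attribute and Size Register (RASR). The bit positions and access-permission codes follow the architecture definition; the macro and function names are hypothetical, and the memory-type bits (TEX, S, C, B) are left at zero for brevity, which a real system would not do.

```c
#include <assert.h>
#include <stdint.h>

/* RASR fields */
#define RASR_ENABLE    (1u << 0)
#define RASR_SIZE(n)   (((uint32_t)(n) & 0x1Fu) << 1)  /* region is 2^(n+1) bytes */
#define RASR_AP(ap)    (((uint32_t)(ap) & 0x7u) << 24)
#define RASR_XN        (1u << 28)                      /* execute never */

/* Access-permission (AP) encodings */
#define AP_PRIV_RW_USER_NONE  0x1u  /* pmode read/write, umode no access */
#define AP_PRIV_RW_USER_RO    0x2u  /* pmode read/write, umode read-only */
#define AP_FULL_ACCESS        0x3u  /* read/write in both modes */
#define AP_PRIV_RO_USER_RO    0x6u  /* read-only in both modes */

/* Build an enabled RASR value for a power-of-two region size. */
uint32_t mpu_rasr(uint32_t size_bytes, uint32_t ap, int execute_never)
{
    uint32_t power = 0u;
    assert(size_bytes >= 32u && (size_bytes & (size_bytes - 1u)) == 0u);
    while ((1u << power) < size_bytes)
        power++;
    return RASR_SIZE(power - 1u) | RASR_AP(ap)
         | (execute_never ? RASR_XN : 0u) | RASR_ENABLE;
}
```

A 4 KB umode data region would typically be read/write with XN set, so that injected data cannot be executed, while a code region would be read-only and executable.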
Partitioning the code is the first step. Figure 1 shows what we might call a “first pass” partitioning – i.e. not too ambitious. In this case, there are just five partitions: application and middleware, which run in unprivileged mode (umode), and initialization, system services, and mission-critical code, which run in privileged mode (pmode). Each partition includes one or more tasks.
Figure 1: Partitions (Source Micro Digital)
Initialization code runs before normal task operation begins. It typically runs with the MPU off or with it on and the Background Region (BR) enabled. Hence, this code can access anything and do anything. This is ok because it is part of secure boot, which theoretically cannot be hacked. Secure boot is outside the scope of this paper.
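The three MPU states mentioned here map onto bits of the Cortex-v7M MPU control register (MPU_CTRL), where PRIVDEFENA enables the background region for privileged code. The following sketch only computes the register value; the enum and function names are hypothetical, and writing the value to hardware is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* MPU_CTRL bits */
#define MPU_CTRL_ENABLE      (1u << 0)  /* MPU on */
#define MPU_CTRL_HFNMIENA    (1u << 1)  /* MPU stays on in HardFault/NMI handlers */
#define MPU_CTRL_PRIVDEFENA  (1u << 2)  /* background region for pmode */

typedef enum { MPU_OFF, MPU_ON_WITH_BR, MPU_ON_NO_BR } mpu_mode_t;

/* Value to write to MPU_CTRL for each operating mode. */
uint32_t mpu_ctrl_value(mpu_mode_t mode)
{
    switch (mode) {
    case MPU_OFF:        return 0u;
    case MPU_ON_WITH_BR: return MPU_CTRL_ENABLE | MPU_CTRL_PRIVDEFENA;
    case MPU_ON_NO_BR:   return MPU_CTRL_ENABLE;
    default:             assert(!"unknown mpu_mode_t"); return 0u;
    }
}
```

With BR on, pmode code can reach any address not covered by an MPU region; with BR off, even pmode code is confined to the regions loaded in the MPU.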
System services include exception handlers, the RTOS, and security software with its secret keys. Mission-critical software is typically a small amount of software that does the main job. Both are trusted software. (Of course, mission-critical code and keys are the babies we are trying to protect.) pmode code is protected from umode code by MPU regions backed up by the privileged processor state and a software interrupt (SWI) API to system services that is implemented with the SVC instruction. The heavy line in Figure 1 represents the barrier between umode and pmode.
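The idea behind an SVC-based API is that every umode request passes through one gate, which validates the request before any privileged code runs. The sketch below simulates that gate on a host machine and is not the MPU-Plus implementation: the service functions, table, and error code are hypothetical. On a real target, the SVC handler would fetch the 16-bit Thumb SVC instruction (encoded as 0xDF00 plus an 8-bit service number) from just before the stacked return address.

```c
#include <assert.h>
#include <stdint.h>

#define SVC_ERR_BAD_ID  (-1)

typedef int32_t (*svc_fn)(uint32_t a0, uint32_t a1);

static uint32_t tick_count = 42u;  /* stand-in for privileged RTOS state */
static uint32_t last_delay = 0u;

static int32_t sys_get_ticks(uint32_t a0, uint32_t a1)
{
    (void)a0; (void)a1;
    return (int32_t)tick_count;
}

static int32_t sys_delay(uint32_t a0, uint32_t a1)
{
    (void)a1;
    last_delay = a0;  /* a real service would suspend the calling task */
    return 0;
}

/* Only services listed here are reachable from umode. */
static const svc_fn svc_table[] = { sys_get_ticks, sys_delay };

/* Decode the service number from the SVC instruction and dispatch it. */
int32_t svc_dispatch(uint16_t svc_insn, uint32_t a0, uint32_t a1)
{
    uint32_t id = svc_insn & 0xFFu;
    assert((svc_insn & 0xFF00u) == 0xDF00u);  /* must be a Thumb SVC */
    if (id >= sizeof svc_table / sizeof svc_table[0])
        return SVC_ERR_BAD_ID;                /* reject unknown services */
    return svc_table[id](a0, a1);
}
```

Because the table is the only route into pmode, a service that should not be callable from umode is simply left out of it.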
Malware that penetrates either of the umode partitions cannot gain access to the pmode partitions if effective hardware-enforced separation has been achieved.
The MPU has 8 slots, each of which may contain a region. Figure 2 shows typical MPU structures for pmode and umode.
Figure 2: MPU Structures (Source Micro Digital)
Region 7 is reserved for the task stack region, which is managed by the scheduler. This region allows detecting task stack overflows immediately and prevents code execution from the stack. The two sys regions in pmode allow turning BR off in order to isolate ptasks. This is necessary for the step-by-step conversion process from ptasks to utasks, as presented below.
The other regions are loaded from each task’s Memory Protection Array (MPA), discussed next. The task_code and task_data regions are specific to a task or to its partition. The pcom_code and pcom_data regions are common to two or more tasks or regions in pmode, and the ucom_code and ucom_data regions are common to two or more tasks or regions in umode. This is explained more fully in the step-by-step process discussion, below.
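The relationship between a task’s MPA and the eight MPU slots can be sketched as follows. This is an illustration under stated assumptions, not the MPU-Plus data structures: the type and function names are hypothetical, the MPA is assumed to cover slots 0 through 6, and a plain array stands in for the MPU registers so that the code runs on a host. On hardware, each slot would be written through the MPU region number, base address, and attribute registers.

```c
#include <assert.h>
#include <stdint.h>

#define MPU_SLOTS   8u
#define STACK_SLOT  7u                 /* reserved for the task stack region */
#define MPA_SIZE    (MPU_SLOTS - 1u)   /* slots 0..6 come from the MPA */

typedef struct {
    uint32_t rbar;                     /* region base address */
    uint32_t rasr;                     /* region attributes and size */
} mpu_region_t;

typedef struct {
    mpu_region_t mpa[MPA_SIZE];        /* the task's Memory Protection Array */
    uint32_t stack_base;               /* maintained by the scheduler */
    uint32_t stack_rasr;
} task_t;

mpu_region_t sim_mpu[MPU_SLOTS];       /* host stand-in for the MPU slots */

/* Called on every task switch: load the incoming task's regions. */
void mpu_load_task(const task_t *t)
{
    uint32_t i;
    assert(t != 0);
    for (i = 0u; i < MPA_SIZE; i++)
        sim_mpu[i] = t->mpa[i];
    sim_mpu[STACK_SLOT].rbar = t->stack_base;
    sim_mpu[STACK_SLOT].rasr = t->stack_rasr;
}
```

Since the whole MPU is reloaded on each task switch, a task can reach only the regions named in its own MPA plus its own stack.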
The Command and Status Register (CSR) regions are I/O regions. The syn_csr and ur1_csr regions (Synopsys USB and UART1) in umode are both covered by the single apb0 csr region in pmode, because the pmode MPU structure has too few free slots for separate regions. Since the umode MPU structure has more free slots, it is possible to offer better security in umode.