A step-by-step process for achieving MPU security
Solving MMFs may consist, in many cases, of just moving taskA-specific code and data into taskA_code and taskA_data regions, respectively. Assigning regions to tasks varies from task to task – even within the same partition. Some tasks may not need all of the code and data regions, but may need other regions, such as I/O regions. Standard C library calls are also likely to show up. Their libraries can be included in pcom_code in the linker command file.
SB_MPA_SIZE can be increased up to 5 for pmode. If this is not enough, it may be necessary to merge one or more regions into larger regions. For example, in mpa_tmplt_usbh for pmode (see Figure 2), the UART1 and Synopsys regions have been merged into a region called apb0_csr. This region encompasses all APB0 peripherals, which is undesirable. However, this move will be reversed in the next step.
The final steps are to set SMX_CFG_UMODE to 1 and to move taskA into umode. This is done by setting its umode flag, as follows:
#if SMX_CFG_UMODE
smx_TaskSet(taskA, SMX_ST_UMODE, 1);
#endif
This normally follows the smx_TaskSet() shown in step 4. Now, when taskA is dispatched, sb_PendSV_Handler() sets the processor CONTROL register to 3 just ahead of starting the task. This causes the processor to run in unprivileged thread mode using the task’s stack. When a task is resumed, the EXC_RETURN value (0xFFFFFFED) also causes this.
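To make the CONTROL value concrete, the following minimal sketch decodes the two Cortex-M CONTROL bits involved: bit 0 (nPRIV) selects unprivileged thread mode and bit 1 (SPSEL) selects the process stack. The helper names are hypothetical and are not part of smx. The sketch also assumes that ptasks run on the process stack with privilege retained, which the text above does not state.

```c
#include <assert.h>   /* for the checks shown in the usage note */
#include <stdbool.h>
#include <stdint.h>

/* Cortex-M CONTROL register bits (ARMv7-M). */
#define CONTROL_NPRIV (1u << 0) /* 1 = thread mode runs unprivileged   */
#define CONTROL_SPSEL (1u << 1) /* 1 = thread mode uses the PSP stack  */

/* Hypothetical helper, not an smx function: the CONTROL value a
 * dispatcher might load for a task, given the task's umode flag.
 * Assumption: every task uses the process stack; only utasks drop
 * privilege. */
static uint32_t control_for_task(bool umode)
{
    uint32_t control = CONTROL_SPSEL;
    if (umode) {
        control |= CONTROL_NPRIV;
    }
    return control;
}

/* True if the CONTROL value selects unprivileged thread mode. */
static bool control_is_unprivileged(uint32_t control)
{
    return (control & CONTROL_NPRIV) != 0u;
}

/* True if the CONTROL value selects the process stack pointer. */
static bool control_uses_psp(uint32_t control)
{
    return (control & CONTROL_SPSEL) != 0u;
}
```

Under these assumptions, `assert(control_for_task(true) == 3u)` holds, matching the value given above, and both `control_is_unprivileged(3u)` and `control_uses_psp(3u)` are true.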
In addition, add:
#if SMX_CFG_UMODE
#include "xapiu.h"
#endif
ahead of taskA code that calls system services in umode. This forces the SWI API to be used by taskA for these calls. taskA functions that run only in pmode (e.g. initialization) should be moved to the tops of taskA modules or put into their own module(s). In the former case, #include "xapiu.h" is placed after them and it is effective to the end of the module; in the latter case it should not be placed in the module at all. #include "xapiu.h" also is not needed ahead of functions that do not make any calls listed in it.
If a function that makes a listed call is missed, the call will cause an MMF when the function runs in umode, so this is easy to find and fix.
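The placement rule can be illustrated with a small host-side sketch. The contents of xapiu.h are not shown in this article, so the sketch assumes that such a header works by macro-remapping each service name to a shell, which would explain why it takes effect only from the point of inclusion to the end of the module. All names below (sem_signal_direct, sem_signal_shell, taskA_init, taskA_main) are hypothetical stand-ins, not smx APIs.

```c
#include <assert.h>   /* for the check shown in the usage note */

/* Hypothetical stand-ins: counters record which variant was called. */
static int direct_calls;
static int shell_calls;

/* Stand-in for a direct service call, usable only in pmode. */
static void sem_signal_direct(void) { direct_calls++; }

/* Stand-in for the SWI shell of the same service, usable in umode. */
static void sem_signal_shell(void) { shell_calls++; }

/* pmode-only code sits at the top of the module, ahead of the remap,
 * so it still reaches the direct call. */
static void taskA_init(void)
{
    sem_signal_direct();
}

/* Assumed effect of including a remapping header at this point:
 * from here to the end of the module, the service name resolves to
 * the shell instead. */
#define sem_signal_direct sem_signal_shell

/* umode code sits below the remap; this call expands to the shell. */
static void taskA_main(void)
{
    sem_signal_direct();
}
```

After calling taskA_init() and then taskA_main() once each, `assert(direct_calls == 1 && shell_calls == 1)` holds: the same source-level call reaches different implementations depending on whether it appears above or below the remap.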
Before actually running taskA, mpa_tmplt_taskA must be changed. The taskA_code and taskA_data regions should stay the same. However, pcom_code and pcom_data must be replaced with ucom_code and ucom_data. At the very least, smx and system service call functions must be replaced with the smx and system service shells. However, there is likely to be much more than this to do.
As shown in mpa_tmplt_usbh, in Figure 2, the syn_csr and ur1_csr regions have been split apart. This significantly improves security by preventing the usbh and fs tasks from accessing all of the other peripherals on the APB0 bus, as was possible in step 4. Also, SB_MPA_SIZE must be increased to 6. The BR flag in MPU_CTRL is now automatically set to 1 for utasks in order to handle interrupts and exceptions, so sys_code and sys_data are no longer needed by utasks, but they are still needed by isolated ptasks. MPA can be used for an additional region, if needed, by increasing SB_MPA_SIZE to 7. Thus, there are up to 7 regions available for utasks and only up to 5 regions for isolated ptasks. The 8th region, MPU, is reserved for task stacks and is controlled by the task scheduler.
When taskA first starts running as a utask, you are likely to see both MMFs and PRIVILEGE VIOLATION errors. The latter indicate that restricted service calls are being made by taskA. This may necessitate recoding taskA so that it does not use those services, and calling them from a ptask instead. Or it may work better to split taskA into a ptask, which calls these services, and a utask, which does not. Alternatively, taskA could start as a ptask, make all of the restricted service calls, then restart itself as a utask. (It must restart itself so that the PendSV_Handler() will change CONTROL to 0x3.)
Restricting system services for utasks is important because if malware in a utask can make calls such as deleting another task or shutting down the system, security obviously is not good. However, it may be necessary to permit these weaknesses in the first pass in order to avoid excessive recoding. It may help that different versions of xapiu.h can be applied to different partitions. Thus, relatively good code might be allowed more liberties than not so good code.
If converting a ptask to a utask results in it sharing code or data with another ptask, there is likely to be a problem. Potential solutions are:
Putting all tasks into one partition and converting the whole partition, at once, instead of task by task. This is ok if all of the ptasks are destined to be utasks.
Defining a common region accessible by both tasks. This might be acceptable for common code such as C libraries, but is not acceptable for pdata.
Passing global data via messages or pipes.
Splitting taskA into a ptask and a utask, where the ptask shares pcode and pdata with other ptasks and the utask does not.
Starting a task as a ptask to perform pfunctions required for task initialization, then switching it to a utask that performs only ufunctions.
Replicating common routines using different names.
Restructuring tasks often involves minimal code change – the complex code often resides in subroutines called from tasks and they may not be impacted.
Size and Performance Impact
MPU-Plus adds about 2000 bytes of code to an application. It adds 8 * SB_MPA_SIZE * NUM_TASKS bytes of RAM, and 8 * SB_MPA_SIZE bytes of ROM per MPA template, to the application. For the example port, wasted code space was only 14.5% and wasted data space was an even smaller 2.5%. Since this is a large amount of code that was not tailored to the MPU, these are probably typical numbers. If the wasted code space is still too much, using smaller regions will help (the data space has smaller regions than the code space). There are also other techniques that can be applied.
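The RAM and ROM formulas above can be checked with a short calculation. The sketch below assumes that the 8 bytes per region correspond to one pair of 32-bit words (on ARMv7-M, a region is programmed through the RBAR and RASR registers); the type and function names are hypothetical, and the task and template counts in the usage note are illustrative, not taken from the example port.

```c
#include <assert.h>   /* for the size check below */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of one MPA entry: two 32-bit words, 8 bytes. */
typedef struct {
    uint32_t rbar; /* region base address word   */
    uint32_t rasr; /* region attribute/size word */
} mpa_region_t;

/* RAM used by per-task MPAs: 8 * SB_MPA_SIZE * NUM_TASKS bytes. */
static size_t mpa_ram_bytes(size_t mpa_size, size_t num_tasks)
{
    assert(sizeof(mpa_region_t) == 8u);
    return sizeof(mpa_region_t) * mpa_size * num_tasks;
}

/* ROM used by MPA templates: 8 * SB_MPA_SIZE bytes per template. */
static size_t mpa_rom_bytes(size_t mpa_size, size_t num_templates)
{
    assert(sizeof(mpa_region_t) == 8u);
    return sizeof(mpa_region_t) * mpa_size * num_templates;
}
```

For instance, with SB_MPA_SIZE set to 6, a hypothetical system of 20 tasks would need mpa_ram_bytes(6, 20) = 960 bytes of RAM, and 4 templates would need mpa_rom_bytes(6, 4) = 192 bytes of ROM.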
Since I/O regions are directly accessible in umode and since code runs as fast in umode as in pmode, performance reduction is primarily due to overhead on system services and task switching. Execution overhead for typical system services, such as smx_SemTest() or smx_SemSignal(), is about 100%. Percent overhead is less for longer system services, since the overhead is a fixed amount of time per system service. Overhead for task starting is about 25%, due to loading the task’s MPA into the MPU and to other MPU conditional code. It is about half of this for task resumption and negligible for task suspension or stopping.
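Because the overhead is a fixed cost per call, its percentage shrinks as the service itself gets longer. The following sketch shows that arithmetic only; the function name is hypothetical and the cycle counts in the usage note are illustrative placeholders, not measurements.

```c
#include <assert.h>   /* guards against division by zero */

/* Percent overhead of a fixed per-call cost relative to the time the
 * service takes without that cost. Both arguments use the same unit
 * (for example, processor cycles). Integer division truncates. */
static unsigned overhead_percent(unsigned fixed_overhead,
                                 unsigned service_time)
{
    assert(service_time > 0u);
    return (100u * fixed_overhead) / service_time;
}
```

If a short service takes 200 cycles and the fixed overhead is also 200 cycles, overhead_percent(200, 200) gives 100, in line with the figure quoted for typical services. The same 200-cycle overhead on a 1000-cycle service gives overhead_percent(200, 1000) = 20.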
The foregoing demonstrates that there is a feasible step-by-step process to incrementally improve the security of Cortex-M embedded systems that have MPUs and that this process can be performed on systems already released and by people who did not write the code. Although substantial work may be required, a clear path has been defined to achieve security objectives in a cost-effective manner. This is better than doing nothing and should be considered before disaster strikes.
 Software of Unknown Pedigree.
 MPU[N] is used herein to mean MPU slot N.
 An ISR shell is a C or assembly wrapper that calls the body of an ISR. These are typically not needed for Cortex‑M processors, but they may be used by the RTOS for generality, and they are useful here to minimize the amount of code to put into .sys_code.
 Also called the Main Stack (MS) by ARM Ltd.
Ralph Moore graduated with a degree in Physics from Caltech. He spent his early career in computer research. Then he moved into mainframe design in the '60s and became a consultant in the early '70s. He taught himself programming and became a microprocessor programmer. He founded Micro Digital in 1975, and many years of successful consulting led into architecting and developing the smx kernel in 1987. For many years he managed the business and sales, but in recent years he has been focused almost solely on development of the smx multitasking kernel v4. He can be reached by email at firstname.lastname@example.org.