Achieving full MCU partition isolation: MPU management - Embedded.com

This is the second article in a series on achieving high security for MCU-based systems. The first article covered security basics, the advantages of partitioning MCU software, parent/child tasks, utasks, and ptasks. In this article we will get into the details of MPU management. As stated in the first article, this is not a tutorial. The references at the end of the article are recommended for that.

Figure 1 shows the relationship between Task Control Blocks (TCBs), Memory Protection Arrays (MPAs), and MPA templates.


Figure 1: Template, MPA, and TCB Relationship. (Source: Author)

Note that an MPA template might apply to a single MPA and task or it might be shared between tasks and their MPAs. Normally, in the latter case, the tasks would be in the same partition. There is also a default MPA that does not require a template. It is used when no MPA has been created for a task. In the sections that follow we will discuss how to define templates.

Defining Regions

As discussed in the first article of this series, the first step in defining a partition is to define the code, data, and IO regions that the partition needs, plus certain standard regions, which are discussed below. Defining regions, in turn, starts with defining sections.

Sections

In order to define sections, it is not necessary to reorganize modules that include code or data from other partitions. For this, pragmas can be used, as follows (Note: All code and directions are for the IAR EWARM tool suite):

#pragma default_function_attributes = @ ".t2a_text"
void ttCD18_t2a(void)
{
…
}
#pragma default_function_attributes =

Similarly, for data:

#pragma default_variable_attributes = @ ".t2a_data"
u32      irq;
TCB_PTR  parent;
u32      tskctr;
#pragma default_variable_attributes =

Pragma definitions for the same section that are scattered throughout many modules are merged by the linker into a single section. If modules get too messy with pragmas everywhere, then the code and data for a partition can be grouped into modules that belong only to that partition. Command line options can then be used instead of pragmas. For example, in Options / C/C++ Compiler / Extra Options in the project file for a partition module, the following options:

--section .bss=.t2a_bss
--section .data=.t2a_data
--section .text=.t2a_text
--section .rodata=.t2a_rodata
--section .noinit=.t2a_noinit

simply rename the section names assigned by the compiler. (Be sure to override Inherited settings.) If several modules for a partition are grouped into a project file node, then the above need be put only into the options for that node. Or, instead of the above, put the following into Extra Options:

-f $PROJ_DIR$\..\..\..\CFG\t2a.xcc

and put the above command lines into a new module, t2a.xcc. 

Linker Command File

Sections are used to define region blocks in the linker command file. For example, for v7M (Note: v7M is shorthand for ARMv7-M architecture and v8M is shorthand for ARMv8-M architecture):

define exported symbol t2acsz = 0x1000;
define exported symbol t2adsz = 0x100;
…
define block t2a_code with size = t2acsz*6/8, alignment = t2acsz {ro section .t2a_text, ro section .t2a_rodata};
define block t2a_data with size = t2adsz*5/8, alignment = t2adsz {rw section .t2a_bss, rw section .t2a_data};

t2a_code and t2a_data are region blocks, which become MPU regions. Note that the region sizes are powers of two. Each region block is sized to the smallest of 5/8, 6/8, 7/8, or 1 times the region size that holds its contents, and the last 3, 2, 1, or 0 subregions, respectively, are disabled to match. Note that the region blocks are aligned on the region sizes. The region blocks are placed in ROM and SRAM, respectively, at the end of the linker command file.
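The sizing rule above can be sketched in C as follows. This is only an illustration under stated assumptions: v7m_fit and v7m_fit_t are hypothetical names that do not appear in the article, and the sketch assumes the block is placed at the start of the region so that only trailing subregions are disabled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not from the article: the result of fitting a block
   into an ARMv7-M MPU region that has eight equal subregions. */
typedef struct {
    uint32_t region_size;  /* power of two, at least 256 bytes */
    uint32_t block_size;   /* 5/8, 6/8, 7/8, or 8/8 of region_size */
    uint8_t  srd;          /* subregion disable bits; bit 7 = last subregion */
} v7m_fit_t;

v7m_fit_t v7m_fit(uint32_t need)
{
    v7m_fit_t fit;
    uint32_t size = 256;   /* subregions exist only for regions of 256 bytes or more */
    uint32_t used;

    assert(need > 0 && need <= 0x80000000u);
    while (size < need)
        size <<= 1;        /* smallest power of two that holds the block */

    used = (need + size / 8 - 1) / (size / 8);  /* subregions needed, rounded up */
    if (used < 5)
        used = 5;          /* only reachable at the 256-byte minimum region size */

    fit.region_size = size;
    fit.block_size  = used * (size / 8);
    fit.srd         = (uint8_t)(0xFFu << used);  /* disable the unused top subregions */
    return fit;
}
```

For the sizes used in this article, a need of 0xB00 bytes yields a 0x1000 region with a 0xC00 block (6/8) and the last two subregions disabled, and a need of 0xA0 bytes yields a 0x100 region with a 0xA0 block (5/8).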

For v8M, things are simpler and more efficient:

define exported symbol t2acsz = 0xB00;
define exported symbol t2adsz = 0xA0;
…
define block t2a_code with size = t2acsz, alignment = 32 {ro section .t2a_text, ro section .t2a_rodata};
define block t2a_data with size = t2adsz, alignment = 32 {rw section .t2a_bss, rw section .t2a_data};

In this case, the sizes need only be multiples of 32 (0x20) and the region blocks need only be aligned on 32, as shown. Obviously, v8M is much more memory efficient than v7M. For example, t2a code is 0x1000 * 6/8 = 0xC00 = 3072 for v7M vs 0xB00 = 2816 for v8M. However, there are techniques to improve v7M memory efficiency. These will be presented in a future part of this series.

Partition Templates

A partition template consists of all regions needed by a partition. It is defined in the code as shown in the following example, for v7M:

#pragma section = "sys_code"
#pragma section = "sys_data"
#pragma section = "t2a_code"
#pragma section = "t2a_data"
MPA mpa_tmplt_t2a =
{
   RGN(0 | RA("sys_data")| V, DATARW | SRD("sys_data") | RSI("sys_data") | EN, "sys_data"),
   RGN(1 | RA("sys_code")| V, CODE   | SRD("sys_code") | RSI("sys_code") | EN, "sys_code"), 
   RGN(2 | RA("t2a_data")| V, DATARW | SRD("t2a_data") | RSIC(t2adsz)    | EN, "t2a_data"),
   RGN(3 | RA("t2a_code")| V, CODE   | SRD("t2a_code") | RSIC(t2acsz)    | EN, "t2a_code"),
   RGN(4 | 0x40011000    | V, IOR    | (0x9  << 1)  | EN, "USART1"),
   RGN(5 | V, 0, "spare"),  /* reserved for dynamic region */
   RGN(6 | V, 0, "spare"),  /* reserved for dynamic region */
   RGN(7 | V, 0, "stack"),  /* reserved for task stack */
};

In the above, the section names, such as “t2a_code”, are defined and exported in the linker command file. RGN, RA, SRD, RSI, and RSIC are macros that generate fields in the MPU registers RBAR and RASR. The third field (e.g. “t2a_data”) allows assigning a name to each region for use during debugging. It does not go into the MPU.
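For readers unfamiliar with the v7M register layout, the following sketch shows how the RBAR and RASR words for a region can be composed. The bit positions follow the ARMv7-M MPU register definitions. The helper names (rbar_word, size_field, rasr_word) are hypothetical; the article's RGN, RA, SRD, RSI, and RSIC macros are not shown and may work differently.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions in the ARMv7-M MPU registers. */
#define RBAR_VALID    (1u << 4)              /* REGION field selects the slot */
#define RASR_ENABLE   (1u << 0)
#define RASR_SIZE(n)  ((uint32_t)(n) << 1)   /* region spans 2^(n+1) bytes */
#define RASR_SRD(m)   ((uint32_t)(m) << 8)   /* one disable bit per subregion */
#define RASR_XN       (1u << 28)             /* execute never */

/* Hypothetical helper: base address, VALID bit, and slot number in one word. */
uint32_t rbar_word(uint32_t base, uint32_t slot)
{
    assert((base & 0x1Fu) == 0 && slot < 16);
    return base | RBAR_VALID | slot;
}

/* Hypothetical helper: SIZE field value for a power-of-two region size. */
uint32_t size_field(uint32_t region_size)
{
    uint32_t n = 4;                          /* n = 4 encodes 32 bytes */
    assert(region_size >= 32 && (region_size & (region_size - 1)) == 0);
    while ((2u << n) < region_size)
        n++;
    return n;
}

/* Hypothetical helper: attribute bits, subregion disables, size, and enable. */
uint32_t rasr_word(uint32_t attrs, uint32_t srd, uint32_t region_size)
{
    return attrs | RASR_SRD(srd) | RASR_SIZE(size_field(region_size)) | RASR_ENABLE;
}
```

A 1K region gives size_field(0x400) == 9, which matches the (0x9 << 1) term in the USART1 entry of the template, and rbar_word(0x40011000, 4) yields 0x40011014.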

t2a is a ptask. Consequently, the standard regions, sys_code and sys_data, are necessary for it to obtain system services via direct function calls. sys_code contains the RTOS and other system service code, and sys_data contains the control blocks (e.g. TCBs) and the globals needed by the RTOS and other system services. Next we see t2a_data and t2a_code, which were defined above. Note that the IO region, USART1, is defined using constants. Regions 5 & 6 are spare regions that can be used as dynamic regions. Finally, region 7 is reserved for the task stack. Having two spare regions is unusual. In practice, partitions often need more than eight regions.

The corresponding partition template for v8M is:

MPA mpa_tmplt_t2a =
{
   RGN(0, RA("sys_data") | DATARW, RLA("sys_data") | AI(0) | EN, "sys_data"),
   RGN(1, RA("sys_code") | CODE,   RLA("sys_code") | AI(0) | EN, "sys_code"), 
   RGN(2, RA("t2a_data") | DATARW, RLA("t2a_data") | AI(0) | EN, "t2a_data"),
   RGN(3, RA("t2a_code") | CODE,   RLA("t2a_code") | AI(0) | EN, "t2a_code"),
   RGN(4, 0x40011000     | IOR,    0x40011FFF      | AI(1) | EN, "USART1"),
   RGN(5, 0, 0, "spare"),  /* reserved for dynamic region */
   RGN(6, 0, 0, "spare"),  /* reserved for dynamic region */
   RGN(7, 0, 0, "stack"),  /* reserved for task stack */
};

Again, this is simpler, this time because of the simpler v8M MPU register structure.
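As a rough illustration of why the v8M form is simpler, a region is described by a base word and a limit word rather than by a size code and subregion disables. The sketch below composes a limit word using the ARMv8-M MPU_RLAR field positions. rlar_word is a hypothetical helper; the article's RLA and AI macros are not shown and may handle the low bits differently.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions in the ARMv8-M MPU_RLAR register. */
#define RLAR_EN           (1u << 0)
#define RLAR_ATTRINDX(i)  ((uint32_t)(i) << 1)  /* selects one of 8 MAIR attribute sets */

/* Hypothetical helper: limit word for a region of `size` bytes at `base`. */
uint32_t rlar_word(uint32_t base, uint32_t size, uint32_t attr_index)
{
    assert((base & 0x1Fu) == 0);
    assert(size >= 32 && (size & 0x1Fu) == 0);
    assert(attr_index < 8);
    /* Only bits [31:5] of the last byte's address are stored; the hardware
       treats the low five bits of the limit address as 0x1F. */
    return ((base + size - 1u) & ~0x1Fu) | RLAR_ATTRINDX(attr_index) | RLAR_EN;
}
```

Under these assumptions, a 0xA0-byte data block at a hypothetical address of 0x20000100 needs no rounding up to a power of two: its limit word is simply 0x20000181.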

If t2a is converted to a utask, ut2a, then for v7M:

MPA mpa_tmplt_ut2a =
{
   RGN(0 | V, 0, "spare"),  /* reserved for dynamic region */
   RGN(1 | RA("svc_code") | V, CODE   | SRD("svc_code") | RSI("svc_code") | EN, "svc_code"),
   RGN(2 | RA("ut2a_data") | V, DATARW | SRD("ut2a_data") | RSI("ut2a_data") | EN, "ut2a_data"),
   RGN(3 | RA("ut2a_code") | V, CODE   | SRD("ut2a_code") | RSI("ut2a_code") | EN, "ut2a_code"),
   RGN(4 | 0x40011000      | V, IOR    | (0x9  << 1)  | EN, "USART1"),
   RGN(5 | V, 0, "spare"),  /* reserved for dynamic region */
   RGN(6 | V, 0, "spare"),  /* reserved for dynamic region */
   RGN(7 | V, 0, "stack"),  /* reserved for task stack */
};

Apart from the ut2a names, the only difference is that sys_data and sys_code have been replaced with a spare slot and svc_code. svc_code is a standard region that enables utasks to indirectly call system services via the SVC handler. The spare slot can be used for a dynamic region. Dynamic regions are discussed below. It should be noted that since the sys_data and sys_code privileged regions are not present, Background Region (BR) must be on in order to service interrupts and exceptions. (BR takes effect in pmode; it has no effect in umode.)

MPU/MPA Relationship

Figure 2 shows the relationship between the MPU and an MPA. The static slots are loaded one time, during system initialization. These are likely to contain privileged regions such as sys_code and sys_data, in order to allow ISRs and exception handlers to run without BR on. Static slots are quite likely in 16-slot MPUs, but not in 8-slot MPUs, due to the need for more active slots.


Figure 2: MPU/MPA Relationship. (Source: Author)

The active slots are loaded when a task is dispatched. The active portion of each MPA must be exactly the same size, in this case 6 slots. Note that SR in slot 7 is the task stack region. It may be loaded into both the MPA and the MPU when a task is dispatched, or it may be loaded into the MPA when the MPA is created and then loaded into the MPU when the task is dispatched. Finally, the auxiliary slots exist only in the MPA, and their number varies from MPA to MPA. (MPAs are allocated from the main heap and thus can vary in size.)

Auxiliary Slots

Auxiliary slots are used to effectively increase the number of MPU slots. They are used in two ways: (1) for protected messages (pmsgs), which will be discussed in a future paper on partition portals, and (2) for slot swapping. The latter is particularly useful for IO regions. For example, a USB host task might require access to the USB OTG, DMA, and USART controllers. For the STM32F746 processor, these are located at 0x4004 0000 to 0x4007 FFFF, 0x4002 6000 to 0x4002 63FF, and 0x4001 1000 to 0x4001 13FF, respectively. The first requires a 0x4 0000 = 256K region starting on a 256K boundary; the second and third require 0x400 = 1K regions starting on 1K boundaries. These regions can be loaded into 3 auxiliary MPA slots and, when needed, a region can be swapped into the active IO slot in both the MPA and MPU. This is illustrated in Figure 3 for two auxiliary IO slots.


Figure 3: Swapping IO Regions. (Source: Author)

If only one active IO slot is available, as shown, the alternative is to define a region from 0x4000 0000 to 0x4007 FFFF, a whopping 0x8 0000 = 512K region! It would have subregions of size 0x1 0000, so the first subregion (0x4000 0000 to 0x4000 FFFF) could be disabled. Thus the effective region would cover 0x4001 0000 to 0x4007 FFFF. This includes about 20 other peripherals such as Ethernet, GPIO, SPI, Timers, and ADCs. The USB host task should not have access to any of these; this would be a field day for a hacker who had infected the USB host partition! The small amount of time required to swap IO regions is well worth the increase in partition isolation and thus security.
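A minimal sketch of IO region swapping is shown below. It makes several assumptions that are not taken from the article: the MPU is simulated by an array so that the code can run on a host, MPU slots 0 and 1 are taken to be static, the auxiliary slot is copied (rather than exchanged) into the active slot, and all names (region_t, mpa_t, io_region_select) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical v7M-style region: the words destined for RBAR and RASR. */
typedef struct { uint32_t rbar, rasr; } region_t;

#define FIRST_ACTIVE 2u  /* assumption: MPU slots 0 and 1 are static */
#define ACTIVE_SLOTS 6u  /* active slots, as in Figure 2 */
#define AUX_SLOTS    3u  /* USB OTG, DMA, and USART regions */

typedef struct {
    region_t active[ACTIVE_SLOTS];
    region_t aux[AUX_SLOTS];
} mpa_t;

/* Stand-in for the MPU registers so the sketch can run on a host. */
static region_t mpu_sim[FIRST_ACTIVE + ACTIVE_SLOTS];

/* Copy one auxiliary region into the active IO slot of the MPA and MPU. */
void io_region_select(mpa_t *mpa, unsigned io_slot, unsigned aux_idx)
{
    assert(mpa != 0 && io_slot < ACTIVE_SLOTS && aux_idx < AUX_SLOTS);
    /* Update the MPA so the choice survives the next task dispatch. */
    mpa->active[io_slot] = mpa->aux[aux_idx];
    /* Update the (simulated) MPU so the choice takes effect now. */
    mpu_sim[FIRST_ACTIVE + io_slot] = mpa->active[io_slot];
}
```

Because the auxiliary slots keep their contents, the task can later select the USB OTG region, then the DMA region, and so on, with only one active slot ever exposing a peripheral range.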

Dynamic Slots

A template can also contain dynamic slots. A dynamic slot has a means to indicate that it is not a static slot, and it has a pointer to where the region information is stored. When the MPA is created, that information is loaded into the dynamic MPA slot. The advantage of dynamic slots is that regions of the needed size can be created on-the-fly. For example, they can be allocated from a heap. This allows adjusting to different installation or operational requirements at run time. It is important to note that MPAs, and hence dynamic slots, can be created only in pmode and thus only by trusted software. A hacker who has penetrated a umode partition cannot create dynamic slots other than via system calls to create pblocks and pmsgs, which will be discussed in a future paper.
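One possible shape for such a template slot is sketched below. It assumes that a non-NULL pointer is what marks a slot as dynamic; the article does not say how the indication is actually encoded, and tmplt_slot_t and mpa_fill are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical v7M-style region: the words destined for RBAR and RASR. */
typedef struct { uint32_t rbar, rasr; } region_t;

/* Hypothetical template slot: either carries the region words directly
   or points to where they will be stored at run time. */
typedef struct {
    const region_t *dyn;    /* non-NULL marks a dynamic slot (assumption) */
    region_t        fixed;  /* used when dyn is NULL */
} tmplt_slot_t;

/* Fill an MPA from a template; meant to run only in trusted (pmode) code. */
void mpa_fill(region_t *mpa, const tmplt_slot_t *tmplt, size_t n)
{
    size_t i;
    assert(mpa != NULL && tmplt != NULL);
    for (i = 0; i < n; i++)
        mpa[i] = tmplt[i].dyn ? *tmplt[i].dyn : tmplt[i].fixed;
}
```

The region words behind a dynamic slot would be written first, for example after a block has been allocated from a heap, and only then would the MPA be created from the template.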

Stack Regions

Ref. 1 suggests an interesting idea of using red zones to protect against stack overflows. This has the advantage that it permits stacks to be right-sized for their tasks and to be located adjacent to each other in a single region that may contain other common variables. This can save a great deal of RAM in a v7M system. The red zone is a small region (e.g. 32 bytes) loaded into the top MPU slot on a task switch; it overlays the top of the task’s stack. The red zone prohibits all accesses so that a stack overflow (from below the red zone) will cause an MMF.

Whereas this might be useful to catch some stack usage errors, it is useless for security. A hacker can easily jump over the red zone by defining a large local array in the first malware function that he has tricked the task into running. You must always assume that a hacker knows as much as or more than you do about your own code. Hence he knows which stack goes with which task. Thus he will be able to place a false return to his second malware function in whatever stack he wishes. When that task runs, he will gain control of it via a return from its stack. In addition, the red zone uses an MPU slot, so it gains nothing over a stack region, which also uses an MPU slot, as far as MPU usage is concerned.

An ideal approach is to allocate a stack from a heap when the task is created. Of course, this requires a heap that can allocate a block of the right size and alignment for the MPU. In addition, for v7M, it must set subregion disables to achieve the best greater-than-or-equal fit. An alternate approach is to use static stacks or stack pools in which the stacks are already properly sized and aligned. The heap approach offers more flexibility and efficiency, but either works fine for security. In both cases, the stack region has RW and XN (eXecute Never) attributes. XN defeats a number of hacker tricks. (The more the better!)

As a consequence of using a stack region, stack overflows and attempts to execute from the stack cause immediate MMFs. The worst a hacker can do is to wreck the stack and possibly the Task Local Storage (TLS) below it. He cannot damage any other stacks. For best security, the task stack region is put into the top slot. This is because, for v7M, the attributes of the higher-numbered slot prevail when regions overlap. This assures that XN will not be overridden. v8M does not permit region overlap (more on this later), so this is not a factor for it, but the top slot is still used for consistency.
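The v7M overlap rule can be modeled with a few lines of C. This is a simplified host-side model, not MPU driver code: it ignores subregion disables and access permissions, and rgn_t and v7m_governing_slot are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MPU_SLOTS 8

/* Simplified model of an MPU region for illustration only. */
typedef struct {
    uint32_t base, size;
    bool     enabled, xn;
} rgn_t;

/* Return the slot whose attributes govern addr on v7M, or -1 if none match.
   When regions overlap, the highest-numbered matching slot wins. */
int v7m_governing_slot(const rgn_t *mpu, uint32_t addr)
{
    int i;
    assert(mpu != 0);
    for (i = MPU_SLOTS - 1; i >= 0; i--)
        if (mpu[i].enabled && addr - mpu[i].base < mpu[i].size)
            return i;   /* unsigned wrap-around rejects addr below base */
    return -1;
}
```

With the stack region in slot 7, any address inside the stack is governed by slot 7 and its XN attribute, even if a lower-numbered region without XN happens to cover the same addresses.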

Multi-task Partition Templates

In many cases, a partition will have only one task. However, as discussed in Parent/Child Tasks in the previous part of this series, it may be desirable to have helper tasks in addition to the main task. This allows offloading some regions to helper or child tasks, thus staying within the MPU slot limit. The method for assigning partition regions to the MPAs of partition tasks is illustrated in Fig. 4.


Figure 4: Multiple MPAs from One Template. (Source: Author)

For brevity, this figure assumes a 4 slot MPU. When MPA1 is created, mask M1 selects active regions A, B, C, and D. When MPA2 is created, mask M2 selects active regions A, B, E, and F and auxiliary regions H and G. Thus the 4 slot MPU limit is met for the partition, even though the partition template has 6 active slots.
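The selection by mask can be sketched as follows. The sketch assumes that a mask has one bit per template entry and that selected entries fill the active slots first, with any remainder going to auxiliary slots; the article does not describe the mask format, so this is only one plausible reading. Region names stand in for the actual register words, and mpa_t and mpa_from_mask are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define ACTIVE_MAX 4u   /* MPU slots, as assumed in Figure 4 */
#define AUX_MAX    4u   /* assumption: room for four auxiliary regions */

typedef struct {
    char     active[ACTIVE_MAX];  /* region names stand in for register words */
    char     aux[AUX_MAX];
    unsigned n_active, n_aux;
} mpa_t;

/* Build an MPA from the template entries selected by mask (bit i = entry i).
   Returns 0 on success, or -1 if the mask selects more regions than fit. */
int mpa_from_mask(mpa_t *mpa, const char *tmplt, unsigned n, uint32_t mask)
{
    unsigned i;
    assert(mpa != 0 && tmplt != 0 && n <= 32);
    mpa->n_active = mpa->n_aux = 0;
    for (i = 0; i < n; i++) {
        if (!(mask & (1u << i)))
            continue;                       /* region not needed by this task */
        if (mpa->n_active < ACTIVE_MAX)
            mpa->active[mpa->n_active++] = tmplt[i];
        else if (mpa->n_aux < AUX_MAX)
            mpa->aux[mpa->n_aux++] = tmplt[i];
        else
            return -1;
    }
    return 0;
}
```

With a template of "ABCDEFGH", a mask of 0x0F gives active regions A, B, C, and D, and a mask of 0xF3 gives active regions A, B, E, and F with G and H in auxiliary slots, mirroring M1 and M2 in the figure.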

Conclusion

In the foregoing we have examined methods to create static and dynamic regions. In addition, we have covered creating partition templates and using them to initialize task MPAs. We also covered the MPU/MPA relationship and the definition of static, active, and auxiliary regions. Using auxiliary regions to create tight IO regions and task stack region tradeoffs have been examined, as well. This is a pretty complete review of the current state of MPU management.

In the next article in this series, we look into why multiple heaps are needed for high security. We will examine heap features that are useful in partitioned, embedded systems and look into the methods for allocating aligned blocks and region blocks from heaps.

References

  1. Jean Labrosse, “Using A Memory Protection Unit With An RTOS”, embeddedcomputing.com, May 2018.
  2. Ralph Moore, “Is Your Thing in Danger?”, http://www.smxrtos.com/articles/thingindanger.htm, March 2021.
  3. Ralph Moore, “Where’s The Gold?”, http://www.smxrtos.com/articles/wheresthegold.htm, April 2021.

Ralph Moore is a graduate of Caltech. He and a partner started Micro Digital Inc. in 1975 as one of the first microprocessor design services. In 1989 Ralph decided to get into the RTOS business and he architected the smx RTOS kernel. After 20 years of selling Micro Digital products and managing the business, he went back into product development. Currently he does the whole job from product definition, architecture, design, coding, debug, documentation, patenting, to promotion. Recent products include eheap, SecureSMX, and FRPort. Ralph has three children and six grandchildren and lives in Southern California.
