To me, achieving full partition isolation is the Holy Grail of microcontroller unit (MCU) system security, because there is very little a hacker can do from inside a partition that is fully isolated from the rest of the system. Achieving full isolation between processes using memory management units (MMUs) is relatively easy but requires power-hungry processors to achieve acceptable process switching times, and it is not appropriate at the task level anyway. Achieving full partition isolation for MCUs using memory protection units (MPUs) is possible, but difficult.
This is the first in a series of papers discussing how to achieve full partition isolation in MCU systems. Many papers have been written concerning MPUs. Ref. 1 is a particularly well-written introduction to the subject by Jean Labrosse, and I recommend that you read it ahead of this paper as an introduction to MPU concepts. Refs. 2 and 3 are also helpful. The main problem with Ref. 1 is that it does not go far enough to achieve full partition isolation. However, its contents are reviewed in a few places in this series in order to illustrate the consequences of different approaches to MPU usage and other aspects of partition isolation.
You may be a very good programmer. However, to write unhackable code, you should not assume that:
- you are smarter than the hacker, nor that
- you can hide flaws in your code from the hacker, nor that
- there is safety in small probabilities.
With regard to the last point, suppose there is a small flaw that you estimate can be exploited only one time in a million tries. Safe, right? Wrong! For the sake of discussion, suppose you have a slow processor with only a 50 MHz clock that can execute only about 10M instructions per second. Also suppose it takes about 100 instructions to attack your flaw. A million tries then cost about 100M instructions, so it will take the hacker only about 10 seconds to break in! So, you had better fix it.
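The estimate is simple arithmetic: the number of tries times the instructions per try, divided by the instruction rate. A minimal sketch in C for plugging in your own numbers follows; the function name and parameters are illustrative only and do not come from any library.

```c
/* Hypothetical helper: seconds an attacker needs to make `tries` attempts,
   given the processor's instruction rate and the cost of one attempt. */
double break_in_seconds(double instr_per_sec, double instr_per_try, double tries)
{
    return (tries * instr_per_try) / instr_per_sec;
}
```

Calling `break_in_seconds(10e6, 100.0, 1e6)` uses the inputs of the example above. A faster processor only shortens the time.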
No doubt you can write small amounts of hacker-resistant code. But to write a whole system that way is clearly impractical – it would take too long and cost too much money. The bottom line is that we need a better methodology for writing embedded system code. Presenting such a methodology is the purpose of this series of articles.
Let’s Get Started
Figure 1 shows the structure of a typical embedded system. There is no real structure and there are no partitions. If a hacker breaks in anywhere, he has access everywhere: to the keys, to critical data, to everything. This is not good.
Figure 1: Typical Embedded System Structure (Source: Author)
Figure 2 shows a solution to the problem of safely adding networking to an existing, defenseless embedded system. It is shown here in order to discuss the basics of system partitioning – see Ref. 3 for more information on the solution.
Figure 2: Secure Network Solution (Source: Author)
Above the heavy line is umode (unprivileged or user mode, take your pick). Below the heavy line is pmode (privileged or protected mode). We call the heavy line the pmode Barrier, because it is enforced by the processor, and umode code cannot break through it. As shown, the Ethernet driver plus TCP/IP stack, both of which are notoriously vulnerable to hacking, are in a umode partition. This partition sees only encrypted data passing in or out. It connects to the Network Apps partition via a tunnel portal and it obtains system services via a Software Interrupt (SWI) interface, both of which severely limit what can be done from inside this partition. Note that only the Security partition connects to the Vault, which contains the keys, jewels, and other valuable property of the system owner.
In the above diagram, the entire embedded Application code has been grouped into a single pmode partition and thus it is also protected from an intrusion via the Internet. Figure 2 is a solution for legacy systems and is not typical for new systems, which are the main subject of this series of papers.
Advantages of Isolated Partitions
Dividing embedded system software into isolated partitions has many benefits:
- Protection from hackers.
- Higher reliability and safety.
- Isolation of low-quality or unknown quality software (SOUP).
- Better plug-in modularity.
- More disciplined design.
- On the spot detection of null and wild pointers and stack and buffer overflows.
- Easier incorporation of legacy software using isolated partitions.
- Partition reboot to recover rather than a full system reboot, which could interrupt vital operations.
- Support for partial updates of one or a few partitions.
Security and reliability are two sides of the same coin. In the first, hacks are deliberate; in the second, bugs and malfunctions are accidental. However, both can damage property and risk lives. Measures that improve one tend to improve the other. Hardware enforcement of full isolation enables interchangeability of modules within a system and better module reusability in future systems. Hardware enforcement of better design practices also helps to improve system security and reliability. Partial reboots and partial updates save time. These are all good reasons for partitioning.
The Need to Isolate Code As Well As Data
There seems to be some controversy concerning whether the code of each partition should be isolated from the code of other partitions. For example, Ref. 1 states:
“Because of the fairly limited number of regions available in an MPU, regions are generally set up to prevent access to data (in RAM) and not so much to prevent access to code (in flash).”
A competitive golfer must assume that his opponent will sink the putt, whatever the length. An analogous situation exists here – you must assume that a hacker knows where each of your functions is and what it does. Using this knowledge, he can wreak havoc upon your system simply by calling your functions at inappropriate times with inappropriate parameters. If, on the other hand, he can only access code within the partition that he has penetrated, then he can only damage that partition. Therefore, you do not want a hacker to be able to execute functions in other partitions. Code must be isolated in each partition, just like data.
The need for each partition to have unique data, code, and IO regions can easily exceed the number of MPU slots in most systems. We call this the MPU overflow problem. It is a big problem that we will address in the next paper of this series. First, however, we will cover how to define partitions and how to manage the tasks within them.
Partitions typically are subsystems that perform specific functions, e.g. file systems, networking systems, etc. Application code can and should also be partitioned, of course. In a new system, we would like to see as much application, middleware, and driver code as possible put into umode partitions. (As previously noted, this is not likely to be practical for legacy systems.)
A partition must contain at least one main task. It may contain other helper tasks. When defining a new partition, it is a good idea to list all regions that the partition will need. This may reveal an MPU overflow problem. One technique to deal with MPU overflow is to divide the partition regions among the main task and one or more helper tasks. For example, a driver task might be created and assigned the IO regions along with others that it needs; a portal task might be created and assigned the portal regions along with others that it needs. This might allow the main task to fit the rest of the partition regions into the MPU for its operation.
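One way to catch MPU overflow early is to tabulate the regions a partition needs and count how many each task would have to load. The following is a minimal sketch in C, not taken from any RTOS: the region names, the three task roles, and the 8-slot limit (`MPU_SLOTS`) are all assumptions chosen for illustration.

```c
#include <stddef.h>

#define MPU_SLOTS 8u          /* assumption: an MPU with 8 region slots */

/* Hypothetical task roles within one partition, one bit each. */
#define MAIN_TASK   (1u << 0)
#define DRIVER_TASK (1u << 1)
#define PORTAL_TASK (1u << 2)
#define NUM_TASK_ROLES 3

/* One region the partition needs, tagged with the tasks that must load it. */
struct region_need {
    const char *name;
    unsigned    tasks;        /* OR of the task bits above */
};

/* Illustrative plan: ten regions in total, split among three tasks. */
const struct region_need example_plan[] = {
    { "part_code",    MAIN_TASK | DRIVER_TASK | PORTAL_TASK },
    { "part_data",    MAIN_TASK   },
    { "main_stack",   MAIN_TASK   },
    { "heap",         MAIN_TASK   },
    { "uart_io",      DRIVER_TASK },
    { "dma_io",       DRIVER_TASK },
    { "drv_data",     DRIVER_TASK },
    { "drv_stack",    DRIVER_TASK },
    { "portal_buf",   PORTAL_TASK },
    { "portal_stack", PORTAL_TASK },
};
const size_t example_plan_len = sizeof example_plan / sizeof example_plan[0];

/* Count the regions that one task (identified by its bit) must load. */
size_t regions_for(const struct region_need *list, size_t n, unsigned task_bit)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (list[i].tasks & task_bit)
            count++;
    return count;
}

/* Return 1 if every task's regions fit in the MPU, 0 if any task overflows. */
int partition_fits(const struct region_need *list, size_t n)
{
    for (int bit = 0; bit < NUM_TASK_ROLES; bit++)
        if (regions_for(list, n, 1u << bit) > MPU_SLOTS)
            return 0;
    return 1;
}
```

In this plan the partition needs ten regions, which would not fit in a single task's eight slots, but the main, driver, and portal tasks need only four, five, and three regions respectively.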
This might be the standard solution for the MPU overflow problem, except that tasks are SRAM-hungry. A typical task has a Task Control Block (TCB) of 100 bytes or so, it may require 500 bytes or more for its stack, and it may need other blocks of RAM as well. For acceptable performance, these must all come from on-chip SRAM, which unfortunately is in short supply in most MCUs. Some RTOSs offer lightweight tasks, which might be sufficient for helper tasks. For example, TCBs might be of variable size, so a task that requires very few services could have a much smaller TCB. It is also possible that, through careful design, the stack could be whittled down to perhaps as little as 200 bytes (50 words).
Another type of lightweight task is the one-shot task. This type of task does not have an infinite loop in its code. Instead, when dispatched, it receives a stack from a stack pool, it runs straight through its code, and then it stops and returns the stack to the stack pool. This is possible because it has no need to store information between runs. One-shot tasks are a good fit for helper tasks. For example, the driver task referred to above need run only when a peripheral operation is required; the portal task need run only when a portal operation is required. With careful design, it can be assured that only one task out of such a group (these two and possibly others) runs at a time. Also, tasks in different partitions can securely share stacks from the same stack pool. Consequently, many tasks may be able to share just a few stacks. Thus a large amount of SRAM can be saved.
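The borrow, run, and return cycle of a one-shot task can be sketched as follows. This is an illustration under stated assumptions, not the mechanism of any particular RTOS: the pool size, the function names, and the clearing of a stack on release are all hypothetical, and a real kernel would switch the stack pointer to the borrowed stack rather than pass it as an argument.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_STACKS 2     /* assumption: two stacks shared by many tasks */
#define STACK_WORDS 50    /* 200 bytes each */

static uint32_t stacks[POOL_STACKS][STACK_WORDS];
static int      in_use[POOL_STACKS];

/* Take a free stack from the pool; returns NULL if none is available. */
uint32_t *stack_get(void)
{
    for (int i = 0; i < POOL_STACKS; i++) {
        if (!in_use[i]) {
            in_use[i] = 1;
            return stacks[i];
        }
    }
    return NULL;
}

/* Return a stack to the pool. Assumption: it is cleared first so that the
   next borrower, possibly in another partition, cannot read leftovers. */
void stack_release(uint32_t *stk)
{
    for (int i = 0; i < POOL_STACKS; i++) {
        if (stk == stacks[i]) {
            memset(stacks[i], 0, sizeof stacks[i]);
            in_use[i] = 0;
        }
    }
}

/* Number of stacks currently available. */
int stacks_free(void)
{
    int n = 0;
    for (int i = 0; i < POOL_STACKS; i++)
        if (!in_use[i])
            n++;
    return n;
}

/* A one-shot task body: no infinite loop, runs straight through. */
typedef void (*oneshot_fn)(uint32_t *stack, void *arg);

/* Dispatch a one-shot task: borrow a stack, run to completion, return it.
   Returns 0 on success, -1 if no stack was free. */
int oneshot_run(oneshot_fn fn, void *arg)
{
    uint32_t *stk = stack_get();
    if (stk == NULL)
        return -1;
    fn(stk, arg);
    stack_release(stk);
    return 0;
}

/* Example task: touches its stack and counts how often it has run. */
void count_task(uint32_t *stack, void *arg)
{
    stack[0] = 0xA5A5A5A5u;
    (*(int *)arg)++;
}
```

Because each task returns its stack before the next one is dispatched, any number of one-shot tasks can run in turn on the same two stacks.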
There is a belief among some developers that all tasks should be created during system initialization. This might improve security, but it is not feasible in practice. For example, USB devices may be inserted or removed dynamically, and a task for each may be required by its class driver. Also, there are USB controllers that can operate in either host mode or device mode. Each mode requires its own set of tasks and code. Since tasks are SRAM-hungry, it is convenient to delete tasks for one mode and create tasks for the other mode when switching modes. Having to reboot in order to store some data on a thumb drive would surely ruin the professor’s experiment that is being run from a PC via USB! So task deletion and creation must be done in situ.
There are many other cases where not allowing dynamic task creation and deletion would create severe implementation problems. It is not even acceptable to limit task operations to pmode, since USB stacks and similar code should be running in umode. On the other hand, allowing umode partitions to create, delete, and otherwise manipulate tasks does not seem like a good idea either.
The parent/child task concept shown in Fig. 3 provides a solution to this conundrum.
Figure 3: Parent/Child Tasks (Source: Author)
The basic principle is that a child task cannot do anything that its parent cannot do. Hence, the child task inherits all limitations (e.g. interrupt access and service call permissions) from its parent. In addition, it is limited to drawing its regions from the partition regions (shown as Partition Template in Fig. 3), and it will likely have only a subset of its parent’s regions. A parent task can create or delete a child task; it can start it, stop it, and perform certain other task operations on it. A child task can also be a parent of its own child tasks. However, a task cannot perform task operations on its parent, on its siblings, or on their children. From a security point of view, child tasks are no more than extensions of their parents.
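These rules can be expressed as two small checks. The sketch below is an assumption-laden illustration in C, not the API of any RTOS: the `struct task` layout, the bitmask encoding of regions and service permissions, and the function names are all hypothetical. It also assumes that task operations are allowed only on a task's own direct children.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task record. */
struct task {
    struct task *parent;     /* NULL for a partition main task */
    uint32_t     regions;    /* bit i set: task uses template region i */
    uint32_t     svc_perms;  /* bit i set: task may make service call i */
};

/* Set up `child` as a child of `parent`.
   Regions must come from the partition template; asking for any other
   region fails with -1. Service permissions are masked by the parent's,
   so the child can never gain a permission its parent lacks. */
int task_create_child(struct task *child, struct task *parent,
                      uint32_t template_regions,
                      uint32_t want_regions, uint32_t want_perms)
{
    if (want_regions & ~template_regions)
        return -1;
    child->parent    = parent;
    child->regions   = want_regions;
    child->svc_perms = want_perms & parent->svc_perms;
    return 0;
}

/* Return 1 if `actor` may start, stop, or delete `target`, else 0.
   Assumption: only a task's own direct children qualify. */
int task_op_allowed(const struct task *actor, const struct task *target)
{
    return actor != NULL && target != NULL && target->parent == actor;
}
```

With these checks, a child that requests every service permission still ends up with no more than its parent has, and a task that tries to operate on its parent or a sibling is refused.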
It should be noted that partition main tasks are normally created in pmode and initially run in pmode because it is easier that way to initialize their partitions. Then main tasks restart themselves in umode and possibly go on to spawn child tasks and to complete the partition initialization in umode.
ptasks vs utasks
utasks provide a higher level of security than ptasks because the MPU cannot be changed from umode. In pmode, a hacker is only one instruction away from turning off the MPU and taking control of the system. However, ptasks have the same reliability protection as utasks, and it is not always possible to implement mission-critical functions in umode due to its lower performance and its restrictions. In particular, all ptasks have sys_code and sys_data regions, which allow them to make direct, unfiltered calls for system services. As shown earlier in Figure 2, utasks must use the SWI mechanism for system services, which is much slower and more restrictive (services that could cause system damage are not permitted). In addition, interrupts can be neither disabled nor enabled from umode, yet some low-level code needs to do exactly that.
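The difference between the two call paths can be sketched as follows. This is only a model in plain C under stated assumptions: the service names, the `struct task_ctx` layout, and the permission bitmask are hypothetical, and the SWI is represented by an ordinary function call, whereas on real hardware it is a processor exception that switches into pmode.

```c
#include <stdint.h>

/* Hypothetical system services. */
enum svc_id {
    SVC_SEM_SIGNAL,
    SVC_MSG_SEND,
    SVC_TASK_DELETE,
    SVC_IRQ_DISABLE,
    SVC_COUNT
};

#define SVC_DENIED (-1)

/* Hypothetical per-task context. */
struct task_ctx {
    int      is_umode;    /* 1 = utask, 0 = ptask */
    uint32_t svc_perms;   /* bit i set: utask may request service i */
};

/* Stand-in for the real service; returns the id so callers can see it ran. */
static int do_service(enum svc_id id)
{
    return (int)id;
}

/* utask path: the SWI handler filters each request before servicing it. */
int swi_handler(const struct task_ctx *t, enum svc_id id)
{
    if ((unsigned)id >= SVC_COUNT)
        return SVC_DENIED;               /* unknown service */
    if (!(t->svc_perms & (1u << id)))
        return SVC_DENIED;               /* not permitted for this task */
    return do_service(id);
}

/* ptasks call the service directly and unfiltered; utasks go through SWI. */
int service_call(const struct task_ctx *t, enum svc_id id)
{
    return t->is_umode ? swi_handler(t, id) : do_service(id);
}
```

A utask whose mask lacks `SVC_IRQ_DISABLE` gets `SVC_DENIED` back, while the same request from a ptask goes straight through, which is why a penetrated ptask is so much more dangerous.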
Therefore, the importance of ptasks should not be overlooked. In fact, if all vulnerable partitions have been put into umode, then the pmode barrier should provide adequate protection from hacking for ptasks.
We have examined the need for and the advantages of partitioning embedded system software. In addition, we have explored the uses of pmode and umode and the relationship of tasks to partitions. However, this is only part of the story. In the next article in this series, we examine strategies and techniques to effectively manage MPUs and present solutions to the MPU overflow problem.
References
1. Jean Labrosse, “Using A Memory Protection Unit With An RTOS”, embeddedcomputing.com, May 2018.
2. Ralph Moore, “Is Your Thing in Danger?”, http://www.smxrtos.com/articles/thingindanger.htm, March 2021.
3. Ralph Moore, “Where’s The Gold?”, http://www.smxrtos.com/articles/wheresthegold.htm, April 2021.
Ralph Moore is a graduate of Caltech. He and a partner started Micro Digital Inc. in 1975 as one of the first microprocessor design services. In 1989 Ralph decided to get into the RTOS business and he architected the smx RTOS kernel. After 20 years of selling Micro Digital products and managing the business, he went back into product development. Currently he does the whole job from product definition, architecture, design, coding, debug, documentation, patenting, to promotion. Recent products include eheap, SecureSMX, and FRPort. Ralph has three children and six grandchildren and lives in Southern California.