OTA updates for Embedded Linux, part 1 – Fundamentals and implementation
The need for updates
Once an Embedded Linux product leaves the lab and enters the real world, the question of how to update the device becomes important.
Updates are not always necessary, but it’s hard to think of any software in which bugs are never discovered. Even if your own software is perfect, a device that communicates over a network or the internet using open-source libraries may still need security updates.
Take the example of CVE-2014-0160 (Heartbleed). This vulnerability affected the OpenSSL cryptography library and, by extension, two-thirds of websites on the net. Even now, three years later, there are many Embedded Linux devices running an unpatched version of OpenSSL, wide open to attack.
Block vs file updates
When talking about updating Linux, you might see “block” and “file” update systems mentioned. A block update replaces an entire partition at a time by writing straight to the block device, whereas a file update replaces individual files. You may be familiar with file update systems from Desktop or Server Linux (“sudo apt-get upgrade”, for example).
In Embedded Linux, block-based updates are the way to go, due to their atomicity and the fact that an entire filesystem image is normally the output of an Embedded Linux build system. We expect the storage space on each embedded device to be constant for a particular product, so we create the same size partition each time. This type of update goes hand in hand with having some sort of fall-back or recovery image.
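As a rough illustration of what “writing straight to the block device” involves, the following Python sketch copies a filesystem image onto a partition in chunks, flushes it to storage, and then reads it back to compare checksums. The function names and the chunk size are assumptions made for this example, and a real updater would also need to check that the image fits the partition and that the read-back is not simply served from cache.

```python
import hashlib
import os

CHUNK_SIZE = 1024 * 1024  # copy in 1 MiB chunks to keep memory use small


def sha256_of(path, length):
    """Return the SHA-256 hex digest of the first `length` bytes of `path`."""
    digest = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(CHUNK_SIZE, remaining))
            if not chunk:
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()


def write_image(image_path, device_path):
    """Copy a filesystem image onto a partition, then verify it by reading back."""
    image_size = os.path.getsize(image_path)
    # "r+b" opens the existing device node for writing without truncating it.
    with open(image_path, "rb") as src, open(device_path, "r+b") as dst:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            dst.write(chunk)
        dst.flush()
        os.fsync(dst.fileno())  # ask the kernel to push the data to storage
    # Compare what is now on the device with the image we intended to write.
    if sha256_of(device_path, image_size) != sha256_of(image_path, image_size):
        raise IOError("verification failed: device contents differ from image")
    return image_size
```

On a device this might be called as write_image("rootfs.img", "/dev/mmcblk0p3"), where both the image name and the partition path are placeholders for whatever the product actually uses.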
Recovery in event of failure
We never want the device to be left in an unusable state (for example, if power is lost part-way through an update). We can solve this by ensuring that it is always possible to “fall back” to another partition should anything in the update process go wrong.
Figure 1. Recovery in event of a failure - fall back options (Source: ByteSnap)
Above you can see two possible implementations of a fallback mode in case of a power outage. On the left-hand side, the bootloader boots a rescue partition, which then boots into a main partition. On the right-hand side, the bootloader boots one of two partitions based on a switch.
The bootloader should implement some method of determining whether a boot has been successful. If it has not, the bootloader should fall back to the rescue partition (left-hand diagram) or to the previous working partition (right-hand diagram).
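One common way to decide whether a boot has been successful is to count boot attempts and have the running system reset the counter once it is healthy. The sketch below models that logic in Python for the dual-partition layout (right-hand diagram). It is only a model of the decision, not bootloader code: the dictionary stands in for persistent storage, and the slot names, field names and attempt limit are all assumptions made for illustration.

```python
MAX_ATTEMPTS = 3  # hypothetical limit on unconfirmed boots before falling back


def choose_slot(env):
    """Decide which partition to boot, updating the boot state in `env`.

    `env` is a dict standing in for the bootloader's persistent storage:
      active   - slot currently selected by the switch ("a" or "b")
      attempts - boots tried since the last confirmed-good boot
    """
    other = {"a": "b", "b": "a"}
    if env["attempts"] >= MAX_ATTEMPTS:
        # Too many boots without confirmation: reverse the switch.
        env["active"] = other[env["active"]]
        env["attempts"] = 0
    env["attempts"] += 1
    return env["active"]


def confirm_boot(env):
    """Called by the running system once it considers itself healthy."""
    env["attempts"] = 0
```

With this model, a freshly updated slot that never calls confirm_boot is tried three times and then abandoned in favour of the other slot. For the rescue layout (left-hand diagram), the same counter could select the rescue partition instead of the other slot.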
The rescue method (left-hand) allows more space to be given to the main partition, whereas the dual-rootfs method (right-hand) requires the space to be split more or less evenly between the two partitions. If space is not an issue, the dual-rootfs method is recommended, simply because it results in less downtime. Updating via the rescue method requires two reboots: one into the rescue partition and another back into the main partition. The dual-rootfs method requires only one reboot, because the new image can be written to the inactive partition at any time while the device is running.
What you can’t update safely in these systems is the bootloader (or indeed the rescue partition). If you want to be able to update the bootloader too, you would need two separate bootloader partitions and some sort of Board Management Controller to implement the logic of switching between the two.
Figure 2. Recovery in event of a failure – Board Management Controller (Source: ByteSnap)
This is of course a complicated solution, requiring an extra microcontroller, a new set of firmware, and a more complicated hardware design. It is used in some devices, for example those which contain a separate Intelligent Platform Management Interface (IPMI) controller. In most cases you should instead aim to build a bootloader that is functional and small in scope, and that therefore doesn’t need to be updated.
U-Boot environment variables
U-Boot implements a non-volatile “environment” in which variables can be stored. These can even be accessed from Linux in various ways, depending on how the environment is stored, as detailed on elinux.org.
This is the most obvious way of implementing the “switch” described above. It can also be used to store information about previous boot successes or failures, so that if a boot fails the switch can be reversed and a working partition restored.
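One way to reach the environment from Linux is through the fw_printenv and fw_setenv utilities that ship with U-Boot's tools, which need a configuration file describing where the environment is stored. The Python sketch below wraps them to read the environment and flip the switch. The variable name active_slot and the slot values "a" and "b" are hypothetical, chosen only for this example, and the sketch assumes the tools are installed and configured on the device.

```python
import subprocess


def parse_env(text):
    """Parse `name=value` lines, as printed by fw_printenv, into a dict."""
    env = {}
    for line in text.splitlines():
        # Split on the first "=" only, since values may contain "=" themselves.
        name, sep, value = line.partition("=")
        if sep:  # ignore lines without an "="
            env[name] = value
    return env


def read_env():
    """Read the whole U-Boot environment from Linux (requires fw_printenv)."""
    result = subprocess.run(
        ["fw_printenv"], check=True, capture_output=True, text=True
    )
    return parse_env(result.stdout)


def set_var(name, value):
    """Write one variable to the U-Boot environment (requires fw_setenv)."""
    subprocess.run(["fw_setenv", name, str(value)], check=True)


def switch_slot():
    """Flip the hypothetical `active_slot` variable between "a" and "b"."""
    env = read_env()
    new_slot = "b" if env.get("active_slot") == "a" else "a"
    set_var("active_slot", new_slot)
    return new_slot
```

An updater would typically call switch_slot() only after the new image has been written and verified, so that a failure before that point leaves the switch pointing at the working partition.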
Figure 3. U-Boot environment variables (Source: ByteSnap)