OTA updates for Embedded Linux, part 1 – Fundamentals and implementation - Embedded.com

The need for updates

Once an Embedded Linux product leaves the lab and enters the real world, the question of how to update the device becomes important.

Updates are not always necessary, but it’s hard to think of any software in which bugs are not discovered at some point. Even if your own software is perfect, security updates may become a necessity if the device communicates over a network or the internet using any open-source libraries.

Take the example of CVE-2014-0160 (Heartbleed). This vulnerability affected the OpenSSL cryptography library and, by extension, potentially two-thirds of the websites on the internet. Even now, three years later, there are many Embedded Linux devices running an unpatched version of OpenSSL, wide open to attack.

Block vs file updates

When talking about updating Linux, you might see “block” and “file” update systems being mentioned. A block update replaces an entire partition at a time by writing straight to the block device, whereas a file update replaces individual files. You may be familiar with file update systems from desktop or server Linux (“sudo apt-get upgrade”, for example).

In Embedded Linux, block-based updates are the way to go, both because of their atomicity and because an entire filesystem image is normally the output of an Embedded Linux build system. We expect the storage space on each embedded device to be constant for a particular product, so we create the same size partition each time. This type of update goes hand in hand with having some sort of fall-back or recovery image.
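To make the idea of a block update concrete, here is a minimal sketch in Python of writing a filesystem image straight onto a partition and then reading it back to check it. The function names, the chunk size and the idea of comparing SHA-256 hashes are my own assumptions for illustration; a production updater would also need to handle errors, power loss and partition sizes.

```python
import hashlib
import os

CHUNK = 1024 * 1024  # copy in 1 MiB pieces (arbitrary choice)


def write_image(image_path, target_path, chunk_size=CHUNK):
    """Copy image_path over the start of target_path (e.g. a block device).

    Returns (bytes_written, sha256_hex_of_written_data).
    """
    digest = hashlib.sha256()
    written = 0
    # "r+b" opens the existing device/file for writing without truncating it.
    with open(image_path, "rb") as src, open(target_path, "r+b") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            digest.update(chunk)
            written += len(chunk)
        dst.flush()
        os.fsync(dst.fileno())  # ask the kernel to push the data to storage
    return written, digest.hexdigest()


def verify_image(target_path, length, expected_sha256, chunk_size=CHUNK):
    """Read back the first `length` bytes and compare their SHA-256 hash."""
    digest = hashlib.sha256()
    remaining = length
    with open(target_path, "rb") as dev:
        while remaining > 0:
            chunk = dev.read(min(chunk_size, remaining))
            if not chunk:
                break  # target is shorter than the image
            digest.update(chunk)
            remaining -= len(chunk)
    return remaining == 0 and digest.hexdigest() == expected_sha256
```

On a device, target_path would be the partition that is not currently in use (for example a path such as /dev/mmcblk0p3, which is only a placeholder here). Note that a read-back performed immediately after the write may be served from the page cache rather than from the storage itself.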

Recovery in event of failure

We never want the device to be left in an unusable state (if, for example, a power outage happens during an update). We can solve this by ensuring that it is always possible to “fall back” to another partition should anything in the update process go wrong.

Figure 1. Recovery in event of a failure – fall back options (Source: ByteSnap)

Above you can see two possible implementations of a fallback mode in case of a power outage. On the left-hand side, the bootloader boots a rescue partition, which then boots into a main partition. On the right-hand side, the bootloader boots one of two partitions based on a switch.

The bootloader should implement some method of determining whether a boot has been successful. If it hasn’t, the bootloader should return to the rescue partition (left-hand diagram) or to the previous working partition (right-hand diagram).

The rescue method (left-hand) allows more space to be given to the main partition, whereas the dual-rootfs method (right-hand) requires the space to be split more or less evenly between the two partitions. If space is not an issue, the dual-rootfs method is recommended, simply because it results in less downtime. Updating via the rescue method requires two reboots: one into the rescue partition and then another back into the main partition. The dual-rootfs method requires only one reboot, as the update can be written to the inactive partition at any time while the system is running.

What you can’t update safely in these systems is the bootloader (or indeed the rescue partition). If you want to be able to update the bootloader too, you would need two separate bootloader partitions and some sort of Board Management Controller to implement the logic of switching between the two.

Figure 2. Recovery in event of a failure – Board Management Controller (Source: ByteSnap)

This is of course a complicated solution, requiring an extra microcontroller, a new set of firmware, and a more complicated hardware design. It is used in some devices, for example those which contain a separate Intelligent Platform Management Interface (IPMI) controller. You should therefore aim to build a bootloader that is functional and small in scope, so that it doesn’t need to be updated.

U-Boot environment variables

U-Boot implements a non-volatile “environment” in which variables can be stored. These can even be accessed from Linux, in various ways that depend on how the environment is stored, as detailed on elinux.org.

This is the most obvious way to implement the “switch” described above. It can also be used to store information about previous boot successes or failures, so that in the event of a failure to boot the switch can be reversed and a working partition restored.
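One common way to reach the environment from Linux is through the fw_printenv and fw_setenv utilities that accompany U-Boot, which find the environment using /etc/fw_env.config. The sketch below wraps them in Python to flip the switch between two root filesystems. The variable name active_slot and the slot names “a” and “b” are hypothetical; your boot script would have to read the same variable to decide which partition to boot.

```python
import subprocess


def read_env(name, run=subprocess.run):
    """Return the value of one U-Boot variable (fw_printenv -n prints the value only)."""
    result = run(["fw_printenv", "-n", name],
                 capture_output=True, text=True, check=True)
    return result.stdout.strip()


def other_slot(slot):
    """Return the name of the slot that is not `slot` (hypothetical names 'a'/'b')."""
    return "b" if slot == "a" else "a"


def switch_slot(run=subprocess.run):
    """Point the hypothetical 'active_slot' variable at the other root filesystem."""
    target = other_slot(read_env("active_slot", run))
    run(["fw_setenv", "active_slot", target], check=True)
    return target
```

The run parameter exists only so that the command runner can be replaced when trying the logic out on a development machine that has no U-Boot environment.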

Figure 3. U-Boot environment variables (Source: ByteSnap)

Setting up the watchdog

Your processor’s hardware watchdog should be set up by U-Boot (CONFIG_WATCHDOG) and then maintained by Linux once boot has completed. This will cause a reset if the entire system hangs.
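On the Linux side, the watchdog is normally exposed as /dev/watchdog: opening the device starts the countdown and every write restarts it. The following sketch shows one way a user-space process might maintain it. The healthy callback and the five-second interval are assumptions for illustration; the interval must be shorter than your hardware’s timeout.

```python
import time


def pet_watchdog(healthy, device="/dev/watchdog", interval=5.0, sleep=time.sleep):
    """Keep the watchdog alive for as long as healthy() returns True.

    Returns the number of times the watchdog was kicked.
    """
    kicks = 0
    # Unbuffered, so every write reaches the driver immediately.
    with open(device, "wb", buffering=0) as wd:
        while healthy():
            wd.write(b"\0")  # any write restarts the countdown
            kicks += 1
            sleep(interval)
    # The device is closed here without the "magic close" character 'V',
    # so the timer keeps running and the hardware resets the system.
    return kicks
```

Leaving the loop when the health check fails is deliberate in this sketch: the process stops kicking the watchdog and lets the reset happen.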

Checking for boot failures

Once your mission-critical application is running, it should set a variable in the U-Boot environment to signal that boot has finished. On the next boot, U-Boot can then check whether this variable was set and take action if booting failed (sometimes only after several failures in a row).

The exact architecture of this will depend on your application and product; you will want to customise this a little bit to suit your needs. You will want to determine all possible failure modes and implement recovery for all of them.
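As an illustration of the decision involved, the sketch below models the check in Python, with a dictionary standing in for the U-Boot environment. On a real device this logic would live in the bootloader’s boot script, not in Python. The variable names (boot_ok, failed_boots, active_slot) and the limit of three failures are hypothetical.

```python
BOOT_LIMIT = 3  # hypothetical: consecutive failures tolerated before falling back


def choose_slot(env, limit=BOOT_LIMIT):
    """Model of the check made at the start of every boot.

    `env` is a dict standing in for the U-Boot environment and is
    updated in place. Returns the slot that should be booted.
    """
    if env.get("boot_ok") == "1":
        env["failed_boots"] = "0"  # the previous boot completed
    else:
        env["failed_boots"] = str(int(env.get("failed_boots", "0")) + 1)

    if int(env["failed_boots"]) >= limit:
        # Too many failures in a row: reverse the switch.
        env["active_slot"] = "b" if env["active_slot"] == "a" else "a"
        env["failed_boots"] = "0"

    env["boot_ok"] = "0"  # the application must set this again after booting
    return env["active_slot"]


def mark_boot_ok(env):
    """Called by the application once it is up and running."""
    env["boot_ok"] = "1"
```

Starting from an environment in which an update has just switched to slot “b” and boot_ok is still “1” from the last good boot, three boots in a row that never reach mark_boot_ok cause the fourth call to choose_slot to return “a” again.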

Implementing the update

The update should come as a single, cryptographically signed file. The signature, made with a private key, proves that the file originates from you, the manufacturer. The system then just needs to unpack the file and run a script inside it that performs the update itself. The script writes over the partition that is to be updated, flips any switches that are needed, and reboots. This should happen as quickly as possible to minimise downtime.

Securing the update

We want to make sure that the update file we give to the device is from us, the manufacturer, and not from someone else. To achieve this, the update file is signed with a private key held by the manufacturer. The corresponding public key is held on the device, which uses it to verify any update file it is asked to install. If the file is not deemed to be valid, the update is refused.
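As a minimal sketch of the check on the device, the example below shells out to the openssl command-line tool to verify a detached SHA-256 signature. Using a detached signature file, and the file names shown, are assumptions made for illustration; off-the-shelf update systems each have their own signature formats.

```python
import subprocess

# On the manufacturer's side, the signature could be produced with:
#   openssl dgst -sha256 -sign private.pem -out update.sig update.img


def build_verify_command(public_key, signature, update_file):
    """Return the openssl command line that checks a detached signature."""
    return ["openssl", "dgst", "-sha256",
            "-verify", public_key,
            "-signature", signature,
            update_file]


def is_authentic(public_key, signature, update_file, run=subprocess.run):
    """Return True only if openssl reports that the signature is valid."""
    result = run(build_verify_command(public_key, signature, update_file),
                 capture_output=True)
    # openssl exits with status 0 when verification succeeds.
    return result.returncode == 0
```

The update script should call something like is_authentic before touching any partition, and abort if it returns False.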

Getting the update

How the update arrives is another matter. There are four possibilities here:

  • The most obvious and simple one is that the update is applied by an engineer who has a root login on the device. They run the update script and the device is updated. This is fraught with security concerns, and is probably only suitable for systems in development, or systems that are used in engineering or industrial environments.

  • A physical medium containing the required update (a USB stick, for example) is inserted into the device. The software on the board automatically detects and installs it, either via a polling daemon or a udev rule.

  • Uploading an update file to an individual machine via some method (a web application for example).

  • Over-the-air updates, as described in the next section.

Over-the-air updates

Over-the-air (OTA) updates generally refer to updates delivered to devices from a central server through a secure channel. The term is usually applied to IoT devices, mobile phones, automobile ECUs, and so on. In this article, I’ll be describing a type of update that can work on any device connected to the internet, whether via Wi-Fi (over-the-air), Ethernet (over-copper), or some other link.

Checking for an update

The first thing to do is to check for an update. A daemon process running on the device can send a request to a pre-determined server, providing its current version and hardware version. The server can then, based on that information, send a signed update file to the device for installation if necessary, or report that no update is available or needed.
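A sketch of the device side of this exchange is shown below, using only the Python standard library. The query parameter names and the JSON reply format (an update_available flag plus a url field pointing at the signed file) are assumptions for illustration; your server’s protocol may look quite different.

```python
import json
import urllib.parse
import urllib.request


def build_check_url(base_url, current_version, hardware_version):
    """Build the request that tells the server what the device is running."""
    query = urllib.parse.urlencode(
        {"version": current_version, "hardware": hardware_version})
    return f"{base_url}?{query}"


def parse_reply(body):
    """Return the download URL of the update, or None if none is offered.

    Assumes a JSON reply such as:
      {"update_available": true, "url": "https://..."}
    """
    reply = json.loads(body)
    if reply.get("update_available"):
        return reply["url"]
    return None


def check_for_update(base_url, current_version, hardware_version, timeout=30):
    """Ask the server whether an update exists for this device."""
    url = build_check_url(base_url, current_version, hardware_version)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_reply(response.read().decode("utf-8"))
```

A daemon would call check_for_update periodically, download whatever it returns, and verify the signature before installing. The server address should use HTTPS so that the exchange itself is protected.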

Complexity can be added here in many ways, from only providing updates to a subset of devices based on various criteria, to full encryption of the update files, to reporting update status or other information back to the central server.

Off-the-shelf vs in-house solution

There are many off-the-shelf update mechanisms that can be integrated with your Embedded Linux system, saving you from re-inventing the wheel as described above. A comparison of some of them is maintained by the Yocto Project.

These can take some time and effort to integrate with your current embedded Linux build system, but it may be less work than developing a custom method in-house, and it may be more robust as some of these projects have had many hundreds of hours put into them.

Reasons why you might not want an off-the-shelf solution:

  • You wish to customize things to your board at every level

  • You may have security concerns in taking in a large codebase that is still relatively new, and not so widely used or recognised

At ByteSnap Design, we provide full hardware and software solutions for a wide variety of customers. We have both created custom in-house update systems and integrated off-the-shelf update systems such as Mender, on a variety of chipsets such as the NXP i.MX and TI OMAP ranges. In the next article, we'll compare some available off-the-shelf update systems.

Ville Baillie is a software engineer at ByteSnap Design, an award-winning software and hardware consultancy. He is a Physics graduate of Warwick University and now specialises in developing device drivers for Embedded Linux applications.
