Updating firmware reliably
Many devices still treat firmware updates as exceptional events, to be performed only in exceptional circumstances and only by advanced users or qualified personnel. However, especially for connected devices, keeping up to date is becoming ever more important. Keeping a device secure requires being able to update the software running on it quickly and reliably, because vulnerabilities will be discovered.
But nobody likes bricked devices. Sure, a bricked device is perfectly secure, but that is hardly a consolation. Therefore, when performing firmware updates, especially ones delivered over the air and in the background, there is one overarching concern: reliability. At every step of the process there must be safety mechanisms that allow the device to be recovered with no or minimal intervention from the end user (a manual reboot is fine, an RMA for reflashing is not).
These are the principles we set out for ourselves when designing a firmware update mechanism for Mongoose Firmware. In this article we will talk about performing reliable firmware updates in general and consider the particular implementation used on the TI CC3200 (the same approach applies to the ESP8266).
The TI CC3200 is an unusual device in that it does not have any on-chip flash memory. Code and data sections are loaded from an external SPI flash into SRAM and executed from there. The SPI flash chip is formatted to contain a rudimentary file system. The size of the flash chip can be up to 64 Mb, but the most popular size is 8 Mb - as seen on both the LAUNCHXL dev board and in the CC3200MOD, the module offered by TI.
At first glance, code not being executed from flash directly makes things easier - files on the SPI filesystem can be deleted and rewritten without interfering with the code being executed, so code could be updated while it is running. This, however, is extremely unsafe - if the process were to fail for any reason (interrupted network connection or sudden loss of power), the device would not be able to boot at all, let alone roll back the update. Thus, the first order of business is creating a mechanism for maintaining an alternate image of the firmware, with the ability to boot from it.
The solution is a boot loader: a small piece of code that rarely, if ever, changes. Its role is to determine which of the firmware images to load and execute. This is usually specified by a configuration block found at a predetermined location. Since the CC3200 does not allow direct access to flash, this will need to be a separate file rather than a flash sector. The boot config specifies which image should be loaded - see figure 1.
Figure 1. (Source: Cesanta)
There is one more consideration, however: if the boot config is stored in only one location, it is susceptible to failure during updates, which are usually performed as a read-erase-write operation: a reboot after the erase and before the write is complete could render the device unbootable. The time between the two is short, but we set out to make our update process safe at all points, so we have to deal with it. We do it by using two config files with versioning, or sequencing. A sequencer is a monotonically decreasing number, so of the two files the one with the smaller sequencer is more recent - in figure 2, config 1 is selected as active because it has the smaller sequencer. When writing a new config file, we always use the currently inactive (older) slot, and it will not become newer until it is fully written - an erased config is older than any valid one because erased NOR flash is filled with all 1s.
Figure 2. (Source: Cesanta)
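To make the slot selection concrete, here is a minimal sketch in C of the two-slot scheme described above. It is an illustration under assumptions, not the actual boot loader code: the struct layout, the field and function names and the simulated flash (a plain array in RAM) are all hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Erased NOR flash reads back as all 1s, so an erased slot carries the
 * largest possible sequencer and always loses to a valid config. */
#define SEQ_ERASED UINT64_MAX

/* Illustrative layout only, not the real boot config format. */
struct boot_cfg {
  uint64_t seq;        /* sequencer: smaller value = more recent */
  uint32_t flags;
  char image_file[32]; /* e.g. "c_hello.bin.0" */
};

/* Returns the index (0 or 1) of the active config,
 * or -1 if both slots are erased. */
int pick_active_cfg(const struct boot_cfg cfgs[2]) {
  if (cfgs[0].seq == SEQ_ERASED && cfgs[1].seq == SEQ_ERASED) return -1;
  return (cfgs[1].seq < cfgs[0].seq) ? 1 : 0;
}

/* Writes a new config into the currently inactive slot.
 * Returns the slot used, or -1 if the sequencer is exhausted. */
int write_new_cfg(struct boot_cfg cfgs[2], const char *image_file) {
  int active = pick_active_cfg(cfgs);
  int target = (active == 0) ? 1 : 0; /* slot 0 when nothing is valid yet */
  uint64_t prev = (active < 0) ? SEQ_ERASED : cfgs[active].seq;
  if (prev == 0) return -1;

  struct boot_cfg *c = &cfgs[target];
  memset(c, 0xff, sizeof(*c)); /* erase: slot is now older than any valid one */
  c->flags = 0;
  snprintf(c->image_file, sizeof(c->image_file), "%s", image_file);
  c->seq = prev - 1; /* written last: only now does the slot become newest */
  return target;
}
```

If power is lost anywhere between the erase and the final sequencer write, pick_active_cfg still returns the old slot, which is the property the two-file scheme is after.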
Next we consider rollbacks. What if the firmware is bad and does not work? For example, it may just reboot, or hang and be rebooted by the watchdog timer. For this reason the first boot after an update is unconfirmed: the boot config of a newly flashed firmware image has the “first boot” flag set. Figure 2 illustrates the state right after an update was applied: it was downloaded and flashed to slot 1, and boot config 1 was written with the “first boot” flag set. At an early stage of the boot process, a boot config with “first boot” set is erased and the boot process continues. If at any point the system reboots, the bad config that led to the failed boot will not be there and a rollback will occur - the system will boot from the previous, good configuration (cfg 0 and image 0 in figure 2). If the boot is successful and the update is committed, the “first boot” flag is removed from the config and from then on it will always be used as the active firmware.
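The first-boot logic can be sketched as a small simulation. This is a sketch under assumptions rather than the real implementation: the flag value, the struct fields and the way the commit step restores the config (from a copy kept in RAM) are hypothetical choices made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SEQ_ERASED UINT64_MAX
#define FLAG_FIRST_BOOT 0x1u /* hypothetical bit assignment */

struct boot_cfg {
  uint64_t seq;   /* smaller = more recent */
  uint32_t flags;
  int image_slot; /* which firmware image to load */
};

struct device {
  struct boot_cfg cfgs[2]; /* persistent: simulated flash */
  /* Volatile state, lost on every reboot: */
  bool uncommitted;
  int pending_idx;
  struct boot_cfg pending;
};

/* Simulates one boot. Returns the image slot that gets executed,
 * or -1 if no valid config exists. */
int boot(struct device *d) {
  d->uncommitted = false; /* RAM contents do not survive a reboot */
  int idx = (d->cfgs[1].seq < d->cfgs[0].seq) ? 1 : 0;
  if (d->cfgs[idx].seq == SEQ_ERASED) return -1; /* both slots erased */

  struct boot_cfg cfg = d->cfgs[idx];
  if (cfg.flags & FLAG_FIRST_BOOT) {
    /* Unconfirmed update: remember it in RAM and erase it from flash,
     * so that any reboot before commit falls back to the other slot. */
    d->pending = cfg;
    d->pending_idx = idx;
    d->uncommitted = true;
    memset(&d->cfgs[idx], 0xff, sizeof(d->cfgs[idx]));
  }
  return cfg.image_slot;
}

/* Called once the new firmware is known to be good. */
void commit(struct device *d) {
  if (!d->uncommitted) return;
  d->pending.flags &= ~FLAG_FIRST_BOOT;
  d->cfgs[d->pending_idx] = d->pending; /* write config back without the flag */
  d->uncommitted = false;
}
```

Calling boot twice in a row without commit models a crash during the first boot: the second call returns the old image slot, which is the rollback.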
However, what if the firmware initializes successfully but fails to function at a higher level, e.g. fails to establish a network connection or perform some similar higher-level function? The point is, it may not always be possible for the firmware itself to tell whether it is OK, and an external confirmation is desirable. Mongoose IoT firmware supports this by way of a commit timeout: a kind of watchdog timer that will automatically revert an update if it is not explicitly committed within a certain time. The commit can be done in multiple ways: by explicit invocation of an API function from user code, by an external process sending an HTTP request to the device’s /update/commit URI, or by having the firmware poll a commit URL after the update - a successful response tells the device to commit the firmware. We found this very useful when recovering devices from bad updates - so convenient, in fact, that we are able to do a significant part of our development on devices deployed in the field, via OTA with delayed commits.
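A commit timeout of this kind can be modelled with very little state. The sketch below is an assumption-laden illustration, not the Mongoose IoT API: the type and function names are hypothetical, and time is passed in as a millisecond tick counter so that the logic can be exercised without hardware.

```c
#include <stdbool.h>
#include <stdint.h>

enum commit_status { COMMIT_PENDING, COMMIT_DONE, COMMIT_REVERT };

struct commit_timer {
  uint32_t started_ms; /* tick count when the updated firmware booted */
  uint32_t timeout_ms; /* how long to wait for an explicit commit */
  bool committed;
};

void commit_timer_start(struct commit_timer *t, uint32_t now_ms,
                        uint32_t timeout_ms) {
  t->started_ms = now_ms;
  t->timeout_ms = timeout_ms;
  t->committed = false;
}

/* Invoked by whichever commit path fires first: user code, an HTTP
 * request to the commit URI, or a successful poll of a commit URL. */
void commit_timer_commit(struct commit_timer *t) {
  t->committed = true;
}

/* Called periodically. COMMIT_REVERT means the update was never
 * confirmed in time and the device should reboot into the old image. */
enum commit_status commit_timer_poll(const struct commit_timer *t,
                                     uint32_t now_ms) {
  if (t->committed) return COMMIT_DONE;
  /* Unsigned subtraction stays correct when the tick counter wraps. */
  if ((uint32_t) (now_ms - t->started_ms) >= t->timeout_ms) {
    return COMMIT_REVERT;
  }
  return COMMIT_PENDING;
}
```

Combined with the first-boot flag, reverting needs nothing more than a reboot: the unconfirmed config has already been erased, so the boot loader picks the previous one.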
And now, let’s consider update delivery. There are multiple approaches, of course. We chose and implemented two in Mongoose Firmware: push-based delivery via HTTP POST, and poll-based delivery, where the device checks a specific URL for updates at regular intervals. The former is best suited for cases where the lowest latency is needed and the device is directly accessible (e.g. during development); the latter is best suited for production, when a fleet of devices needs to be updated. In the latter case, the server responding to update requests can be configured to perform targeted updates and staged rollouts.
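As an illustration of how a server might decide which polling devices are offered an update during a staged rollout, here is one possible sketch. It is not how the Mongoose Cloud does it: the function names are hypothetical, and hashing the device ID with 32-bit FNV-1a into one of 100 buckets is simply a convenient assumption that gives every device a stable position in the rollout.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t fnv1a(const char *s) {
  uint32_t h = 2166136261u;
  while (*s) {
    h ^= (uint8_t) *s++;
    h *= 16777619u;
  }
  return h;
}

/* Maps a device ID (e.g. its MAC address) to a stable bucket in 0..99. */
unsigned rollout_bucket(const char *device_id) {
  return fnv1a(device_id) % 100u;
}

/* True if this device should be offered the update when the rollout
 * has reached `percent` percent of the fleet. */
bool in_rollout(const char *device_id, unsigned percent) {
  return rollout_bucket(device_id) < percent;
}
```

Because the bucket depends only on the device ID, raising the percentage only ever adds devices: one that was offered the update at 10% is still offered it at 50%.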
Let me show you how this works in practice. We will use our “Hello, world!” example.
So, let’s build and run it to establish the baseline (you will need to register on the Mongoose Cloud to get your own username and password). Note: in the following examples, for simplicity, we are using HTTP. A real setup should use HTTPS with proper certificate validation (which is supported).
rojer@nbt:~/cesanta/mongoose-iot/fw/examples/c_hello master$ miot build --arch cc3200
Connecting to http://cloud.mongoose-iot.com, user cesanta
Uploading sources (1734 bytes)
Success, built c_hello/cc3200 version 1.0 (20161114-113755/???).
Firmware saved to build/fw.zip
rojer@nbt:~/cesanta/mongoose-iot/fw/examples/c_hello master$ miot flash && miot console
Loaded c_hello/cc3200 version 1.0 (20161114-113755/???)
Connecting to boot loader..
Main boot loader v220.127.116.11
cc3200_init c_hello 1.0 (20161114-113755/???)
cc3200_init Mongoose IoT Firmware 2016111411 (20161114-113755/master@e2a4f704)
cc3200_init RAM: 122260 total, 109044 free
start_nwp NWP v18.104.22.168 started, host driver v22.214.171.124
cc3200_init Boot cfg 0: 0xfffffffffffffffe, 0x0, c_hello.bin.0 @ 0x20000000, spiffs.img.0 (2)
fs_mount_idx Mounting spiffs.img.0.0 0xfffffffffffffffe
mg_sys_config_init MAC: F4B85E49A7B3
mg_sys_config_init WDT: 15 seconds
clubby_channel_uart 20025edc UART0
mg_wifi_setup_ap AP Mongoose_49A7B3 configured
mg_sys_config_init_http HTTP server started on 
Hello, world! (3)
Hey, a file!
cc3200_init Init done, RAM: 104664 free
mg_wifi_on_change_cb WiFi: ready, IP 192.168.4.1
A few key things in the log above:
- The “Main boot loader” line is the output of the boot loader. It’s terse, but if boot fails, it is possible to tell at what stage.
- The line marked (2) logs the contents of the boot config used to boot this firmware: sequencer, flags, image name, load address and SPIFFS container image name (not covered here, see our blog post on the subject).
- The line marked (3) is the output of our example’s mg_app_init function. After that you see the output of the timer callback, which should be accompanied by blinking of the red LED (on the LAUNCHXL board).
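To show how the fields of the boot config line map onto the description above, here is a small sketch that pulls them apart with sscanf. The struct and function names are hypothetical, and the format string is inferred from the single log line shown here rather than from any documented log format.

```c
#include <stdio.h>

/* Fields of a "Boot cfg" log line, in the order they are printed. */
struct boot_cfg_log {
  int idx;                /* which of the two boot configs was used */
  unsigned long long seq; /* sequencer */
  unsigned int flags;
  char image[64];         /* firmware image file name */
  unsigned long load_addr;
  char fs_container[64];  /* SPIFFS container image name */
};

/* Returns 1 if the line was parsed into *out, 0 otherwise. */
int parse_boot_cfg_line(const char *line, struct boot_cfg_log *out) {
  int n = sscanf(line,
                 "cc3200_init Boot cfg %d: %llx, %x, %63s @ %lx, %63s",
                 &out->idx, &out->seq, &out->flags, out->image,
                 &out->load_addr, out->fs_container);
  return n == 6;
}
```

Applied to the line from the log, this yields sequencer 0xfffffffffffffffe (one below the all-1s value of erased flash), flags 0x0, image c_hello.bin.0 loaded at 0x20000000 and file system container spiffs.img.0.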
By default, the board starts up in AP mode. To make our lives easier, let’s make it join a WiFi network instead:
$ miot config-set wifi.ap.enable=false wifi.sta.enable=true wifi.sta.ssid=Cesanta wifi.sta.pass=*** && miot console
Setting new configuration...
mg_wifi_connect Connecting to Cesanta
mg_wifi_on_change_cb WiFi: ready, IP 192.168.1.33
The device rebooted and is now connected to the network. Now let’s build a new firmware and push an update. Keeping the console attached, switch to a different window. Make a change to src/main.c to print something distinctive on the console. I added a simple counter and modified the statements in the timer callback to print it. Then update the version in miot.yml to 1.1 and build.
$ miot build --arch cc3200
Connecting to http://cloud.mongoose-iot.com, user cesanta
Uploading sources (1756 bytes)
Success, built c_hello/cc3200 version 1.1 (20161114-135843/???).
Firmware saved to build/fw.zip
So, there we have our v1.1. Now instead of flashing directly, let’s perform an update.
The configuration page at http://192.168.1.33/ has a firmware update form at the bottom: select build/fw.zip and press “upload”.