A new Compute Through Power Loss (CTPL) tool suite and library developed at Texas Instruments’ Kilby Labs enables checkpointing on resource-limited 16-bit MCUs. When an MCU’s operation is interrupted by a power loss or error condition, the tool’s save-and-restore algorithm returns the MCU’s hardware and software settings to their state before the interruption. The tool was designed for TI’s ultra-low-power MSP430FR6972 MCU family, which has 2 KB of SRAM and 64 KB of non-volatile ferroelectric random access memory (FRAM). CTPL makes instantaneous wakeup possible, with intelligent system-state restoration, after an application unexpectedly loses power.
Checkpointing involves taking regular snapshots of the system state and storing them in internal or external non-volatile storage, typically a combination of flash memory and battery-backed SRAM. When an error or a loss of voltage or current occurs, the system can return to the state it was in when the last checkpoint was made. If checkpoints are sufficiently frequent, the amount of re-computation required may be small enough for an MCU to complete execution before shutdown occurs.
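The checkpoint-and-rollback idea described above can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions: it is written in Python rather than MCU firmware, it does not use the CTPL API, and the names `CheckpointStore`, `run_sum`, and `fail_at` are hypothetical. The store object stands in for non-volatile memory, and a simulated power loss discards everything except the last saved snapshot.

```python
import copy


class CheckpointStore:
    """Hypothetical stand-in for non-volatile storage such as FRAM."""

    def __init__(self):
        self._snapshot = None

    def save(self, state):
        # Deep-copy so later changes to the live state cannot alter the snapshot.
        self._snapshot = copy.deepcopy(state)

    def restore(self):
        # Return a copy of the last snapshot, or None if none was ever taken.
        return copy.deepcopy(self._snapshot)


def run_sum(samples, store, checkpoint_every, fail_at=None):
    """Sum `samples`, saving a checkpoint every `checkpoint_every` items.

    `fail_at` simulates a power loss just before the item at that index is
    processed. Returns the final state, or None if power was lost.
    """
    # Resume from the last checkpoint if one exists, otherwise start fresh.
    state = store.restore() or {"index": 0, "total": 0}
    while state["index"] < len(samples):
        if fail_at is not None and state["index"] == fail_at:
            return None  # volatile state is lost; only the store survives
        state["total"] += samples[state["index"]]
        state["index"] += 1
        if state["index"] % checkpoint_every == 0:
            store.save(state)
    return state
```

In this sketch, a run that loses power at item 7 with a checkpoint every 3 items resumes from item 6 on the next call, so only one item is recomputed. A shorter checkpoint interval reduces the rework after a failure at the cost of more frequent writes to the store.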
Configuration of CTPL checkpointing on the MSP430FR6989 with a LaunchPad Development Kit.
While checkpointing and rollback are widely used in desktop and other large-system applications, they have been used less frequently in embedded real-time systems, and then only in mission-critical applications where such interruptions could be disastrous. However, in the wirelessly connected Internet of Things sensor environment, battery and energy-harvesting limitations make most applications mission critical, and the ability to set checkpoints is becoming more important.
In a typical middleware approach to this problem, installed software dynamically monitors each MCU resource based on its current health, adjusting the kind of information it collects and how often it collects it according to the perceived risk of the component failing. A working resource might send only an occasional “heartbeat,” which lets the management middleware know it is alive and well. When an application fails, its heartbeat stops, and the middleware initiates a failover using the last good checkpoint.
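A minimal sketch of the heartbeat scheme described above follows. It is an illustration only, not any particular middleware product: the class `HeartbeatMonitor`, its methods, and the resource names are hypothetical, and timestamps are passed in explicitly so the behavior is deterministic.

```python
class HeartbeatMonitor:
    """Hypothetical monitor that tracks heartbeats and last good checkpoints."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}    # resource name -> time of last heartbeat
        self.checkpoints = {}  # resource name -> last good checkpoint

    def heartbeat(self, name, now, checkpoint=None):
        # Record that the resource is alive; optionally store a new checkpoint.
        self.last_seen[name] = now
        if checkpoint is not None:
            self.checkpoints[name] = checkpoint

    def failed(self, now):
        # A resource is considered failed once its heartbeat is overdue.
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )

    def failover(self, now):
        # Map each failed resource to the checkpoint it should restart from.
        return {name: self.checkpoints.get(name) for name in self.failed(now)}
```

For example, if `sensor_a` keeps reporting and `sensor_b` goes silent, `failover()` returns only `sensor_b` along with the last checkpoint it reported, which is the state a replacement would start from.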
Alternatively, some operating systems offer hooks into the kernel to monitor system parameters such as CPU usage, number of processes running, and available memory, allowing a developer to set thresholds for each parameter. Typically, monitoring takes place as an independent task that queries the operating system regularly to check the various thresholds in place. When a threshold is exceeded, the monitoring task triggers an event in the kernel.
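The threshold-monitoring approach can be sketched in the same way. The code below assumes nothing about any real kernel interface: `check_thresholds`, `monitor_task`, the parameter names, and the `(low, high)` limit format are all hypothetical, and the operating-system query is represented by a caller-supplied function.

```python
def check_thresholds(readings, limits):
    """Return the names of parameters that fall outside their limits.

    `limits` maps a parameter name to a (low, high) pair; either bound may
    be None to leave that side unchecked.
    """
    violated = []
    for name, (low, high) in sorted(limits.items()):
        value = readings.get(name)
        if value is None:
            continue  # parameter not reported in this sample
        too_low = low is not None and value < low
        too_high = high is not None and value > high
        if too_low or too_high:
            violated.append(name)
    return violated


def monitor_task(sample_fn, limits, on_exceeded, polls):
    """Query `sample_fn` `polls` times and raise an event per violation."""
    for _ in range(polls):
        for name in check_thresholds(sample_fn(), limits):
            on_exceeded(name)
```

Here `on_exceeded` plays the role of the event the monitoring task triggers in the kernel. An upper bound suits parameters such as CPU usage or process count, while a lower bound suits available memory.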
The problem with such approaches is that they are too heavyweight for the resource-limited designs typical of wireless sensor applications. While both approaches allow the developer to scale things down to a single type of interruption, such as power loss, and to a very few timing constraints, doing so is like using an elephant gun to shoot a mouse. What is needed is a narrowly focused checkpointing scheme that does not require a lot of development time or eat up system resources, and that is confined to a small, specific set of failure conditions. TI’s CTPL library was designed to meet this need.