Hitless I/O: Overcoming challenges in high availability systems
High availability systems such as servers, communication gateways and base stations need to be continuously operational. Once these systems are installed in the field, feature enhancements and bug fixes are delivered through software upgrades. As a result, they are designed so that their functionality can be updated without interrupting normal operation. The Programmable Logic Devices (PLDs) used in them are therefore commonly required to support in-system design updates. Design convenience and excellent performance at a lower cost make PLDs well suited as board hardware management devices in these systems, where they manage on-board DC-DC converters, monitor and control critical signals, aggregate serial communications, and perform other housekeeping functions.
The Indispensable PLD
A PLD consists of a number of programmable function units. These units are configured and interconnected to implement board-specific hardware management functions. Typically, a software design tool converts a given logic function, such as a board hardware management design, into a PLD-specific configuration bit stream, which configures the programmable function units and interconnects them. The configuration bit stream is stored in the PLD’s on-chip configuration Flash memory. When the board is powered on, the contents of the configuration Flash memory are automatically transferred to the on-chip configuration SRAM, which in turn configures the programmable function units to perform the desired hardware management task. To update the hardware management functionality, a different bit stream is loaded into the configuration Flash memory in the background, at any time, without interrupting the hardware management functions performed by the PLD. To transfer the newly stored configuration from Flash memory to the on-chip SRAM, however, the board must be power cycled, which interrupts the system’s normal operation (Fig. 1).
Figure 1: Most PLDs must be power cycled to reconfigure (Source: Lattice Semiconductor)
Maintaining steady outputs during configuration
High availability systems cannot tolerate the interruption caused by power cycling. Because the PLD I/Os are used to enable the DC-DC converters and to control reset signals for the main ASICs and CPUs on the board, the outputs of the PLD must not toggle during reconfiguration. Holding the outputs steady during PLD reconfiguration presents several challenges.
Figure 2: PLD reconfiguration steps using MachXO2/MachXO3 without Hitless I/O (Source: Lattice Semiconductor)
Lattice’s MachXO2 and MachXO3 PLD series include features intended for zero-downtime updates (Fig. 2). First, the PLD undergoes a “background update”, in which new configuration data is loaded into its configuration Flash memory via JTAG, SPI or I2C. Once the upload is complete, a “TransFR” command is issued to move the new PLD image from configuration Flash memory to the PLD’s configuration SRAM. Invoking the “TransFR” command also triggers a “Leave Alone” function, which ensures that all I/Os are held at their last known values during the transfer. Lastly, during the “logic initialization” step, the state machines in the new image restart the power management and reset distribution functions. On their own, these steps are not enough: the restart turns the supplies off, forcing the board through a power cycle.
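The sequence above can be traced with a small behavioral model. The sketch below is written in Python rather than an HDL, and it is only an illustration: the class `PldModel`, its method names, the image labels and the pin names are all hypothetical. It also assumes that the restarted state machine drives every output low, as it would at the start of a power-up sequence.

```python
class PldModel:
    """Toy model of the update flow; names and reset values are assumptions."""

    def __init__(self, image, pins):
        self.flash = image        # contents of the configuration Flash
        self.sram = image         # active configuration in SRAM
        self.pins = dict(pins)    # values driven on the output pins
        self.log = []

    def background_update(self, new_image):
        # Flash is rewritten while the old SRAM image keeps running.
        self.flash = new_image
        self._record("background update")

    def transfr(self):
        # Flash -> SRAM; "Leave Alone" holds every pin at its last value.
        self.sram = self.flash
        self._record("TransFR")

    def logic_initialization(self):
        # Assumption: the restarted state machine drives all outputs low.
        self.pins = {name: 0 for name in self.pins}
        self._record("logic initialization")

    def _record(self, step):
        self.log.append((step, self.sram, dict(self.pins)))


pld = PldModel("v1", {"dcdc_enable": 1, "cpu_reset_n": 1})
pld.background_update("v2")
pld.transfr()
pld.logic_initialization()
for step, active_image, pins in pld.log:
    print(f"{step:22} image={active_image} pins={pins}")
```

In this model the pins survive the first two steps and are lost only in the third, which is the step that Hitless I/O addresses.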
To support zero-downtime updates, the PLD must be able to hold the outputs controlling the supplies and other logic control signals unchanged while the state machines created by the new image are initializing. After the new state machines have initialized, they should take over control of the supplies and other logic signals.
To hold the critical I/Os unchanged during the initialization process, ‘Hitless I/O’ elements are added to the user design. As shown in Figure 3, this involves adding a MUX-Latch to every critical output. The MUX-Latch holds the output at its last known value while the state machine initializes and hands control of the output back to the state machine once initialization is complete. The circuit uses a separate “Hitless_IO_Enable” input to distinguish a normal (power-on) start-up from a start-up after a reconfiguration event, which prevents the critical outputs from being locked during a normal power-on sequence.
Figure 3 illustrates the role of Hitless I/O in the state machine’s initialization process, after the new configuration is loaded into the MachXO2/MachXO3 device configuration SRAM.
Figure 3: Hitless I/O holding the critical I/O states in their last known state during initialization (Source: Lattice Semiconductor)
A MUX-Latch is added to every output that needs to remain unchanged. It holds the output at its current value as long as its MUX control input is at “0”. This means that the DC-DC converter remains “on” (if it was previously on) regardless of the state machine output status. When the control input is at logic “1”, the DC-DC converter is controlled directly by the state machine. The state machine drives the MUX control input through the ‘Normal Operation’ node. An external input signal, ‘Hitless_IO_Enable’, is added to the design to differentiate between a normal “power on” configuration (when the DC-DC converter enables follow the state machine during its initialization) and a hitless reconfiguration (when the DC-DC converter enables are left unchanged during state machine initialization).
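The hold-or-pass behavior of a single MUX-Latch can be sketched as follows. This is a behavioral model in Python, not synthesizable logic, and the class and argument names are hypothetical. The article does not give the exact gating between ‘Normal Operation’ and ‘Hitless_IO_Enable’; the sketch assumes the state machine is passed through when ‘Normal Operation’ is 1 or when ‘Hitless_IO_Enable’ is 0.

```python
from dataclasses import dataclass


@dataclass
class LatchMux:
    """Behavioral model of one hitless output (illustrative only)."""

    pin: int  # value currently driven on the output pin

    def update(self, fsm_out: int, normal_operation: int,
               hitless_io_enable: int) -> int:
        # Assumed gating: follow the state machine once it reports normal
        # operation, or always when hitless mode is disabled (power-on).
        select = normal_operation or not hitless_io_enable
        if select:
            self.pin = fsm_out
        # Otherwise the latch keeps the last value on the pin.
        return self.pin


dcdc_enable = LatchMux(pin=1)  # converter was on before reconfiguration
held = dcdc_enable.update(fsm_out=0, normal_operation=0, hitless_io_enable=1)
released = dcdc_enable.update(fsm_out=1, normal_operation=1, hitless_io_enable=1)
print(held, released)
```

The first call models the initialization window, where the state machine output is ignored; the second models the hand-back once ‘Normal Operation’ is raised.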
Let’s assume that the “Hitless_IO_Enable” signal controlling the hitless process is set to “1”. Before initialization, the state machine resets the “Normal Operation” signal to “0”. The MUX-Latch ignores the outputs from the state machine, and the DC-DC converter “Enable” signals are left unchanged. When the PLD’s logic is ready to resume normal operation, it sets the “Normal Operation” signal to logic “1” (high), which allows the state machine to assert control over the DC-DC converters. The board’s DC-DC converters and resets are now controlled by the updated power and reset control state machine.
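The walk-through above can be played out tick by tick. The following Python sketch is a rough model under stated assumptions: the function name, the three-tick initialization time and the state machine’s behavior (enable low and “Normal Operation” at 0 while initializing, both high afterwards) are hypothetical, as is the select gating, which follows the state machine when “Normal Operation” is 1 or “Hitless_IO_Enable” is 0.

```python
def run_initialization(pin_before, hitless_io_enable, init_ticks=3):
    """Return the DC-DC enable pin value at each tick of state machine start-up.

    Hypothetical model: the new state machine drives its enable output low
    and holds Normal Operation at 0 for `init_ticks` ticks, then drives the
    enable high and raises Normal Operation to 1.
    """
    pin = pin_before
    trace = []
    for tick in range(init_ticks + 2):
        initializing = tick < init_ticks
        fsm_enable = 0 if initializing else 1
        normal_operation = 0 if initializing else 1
        # Assumed select: pass the state machine through once it reports
        # normal operation, or always when hitless mode is disabled.
        if normal_operation or not hitless_io_enable:
            pin = fsm_enable
        trace.append(pin)
    return trace


# Hitless reconfiguration: the converter stays enabled throughout.
print(run_initialization(pin_before=1, hitless_io_enable=1))  # [1, 1, 1, 1, 1]
# Normal power-on: the state machine sequences the enable itself.
print(run_initialization(pin_before=0, hitless_io_enable=0))  # [0, 0, 0, 1, 1]
# Reconfiguration with hitless mode disabled: the enable drops.
print(run_initialization(pin_before=1, hitless_io_enable=0))  # [0, 0, 0, 1, 1]
```

The third trace shows the supply being switched off during initialization, which is the case the “Hitless_IO_Enable” input is there to avoid.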