Dealing with SoC metastability problems due to Reset Domain Crossing
Metastability caused by asynchronous clock domain crossings (CDC) is a well-known problem, and mature industry-standard tools are available to catch the structural and functional issues it causes in a design.
However, CDC is not the only reason a signal can become asynchronous with respect to its destination clock domain. In a sequential design, if the source register is reset by a different reset than the destination register, an asynchronous crossing exists even when the data path lies entirely within one clock domain, and it can cause metastability at the destination register. This is referred to as Reset Domain Crossing (RDC): it occurs when the launch and capture flops have different reset signals.
This article reviews some of the conditions under which RDC occurs and proposes ways to deal with the resulting problems up front, in the design phase.
The problem of reset complexity
As digital designs have grown more complex, reset architectures have become very complex as well. While implementing such architectures, designers tend to make mistakes that can lead to metastability, glitches or other functional failures in the system. For example, if the source flop makes an asynchronous transition to its reset state because its asynchronous reset is asserted, while the destination flop remains in normal operation, the input to the destination flop may change within that flop's setup or hold window, leading to metastability.
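The timing condition in this example can be sketched as a simple check. The Python below is a minimal illustration under assumed conditions: an ideal, free-running capture clock with rising edges at `first_edge + k * period`, a known arrival time `t_change` for the change at the destination flop's D input (in practice this would include the source flop's reset-to-Q delay and any combinational delay), and setup and hold values whose sum is smaller than the clock period. The function name and all numbers are hypothetical.

```python
import math


def in_setup_hold_window(t_change: float, period: float, t_setup: float,
                         t_hold: float, first_edge: float = 0.0) -> bool:
    """Return True if a data change at t_change falls inside the
    [edge - t_setup, edge + t_hold] window of any capture-clock edge.

    Assumes an ideal capture clock with edges at first_edge + k * period
    (k = 0, 1, 2, ...) and t_setup + t_hold < period. Times in ns.
    """
    rel = t_change - first_edge
    # With t_setup + t_hold < period, only the capture edges immediately
    # before and after the change can have a window containing it.
    k_before = math.floor(rel / period)
    for k in (k_before, k_before + 1):
        if k < 0:
            continue  # no capture edges exist before first_edge
        edge = first_edge + k * period
        if edge - t_setup <= t_change <= edge + t_hold:
            return True
    return False


# Hypothetical numbers: 10 ns capture clock, 0.5 ns setup, 0.3 ns hold.
print(in_setup_hold_window(19.8, 10.0, 0.5, 0.3))  # True: inside setup of the 20 ns edge
print(in_setup_hold_window(15.0, 10.0, 0.5, 0.3))  # False: safely between edges
```

Because an asynchronous reset can be asserted at any time relative to the capture clock, a designer generally cannot guarantee that such a check always returns False, which is why the crossing has to be treated as asynchronous.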
In an SoC, the global chip reset function combines the various hardware- and software-generated reset sources of the integrated circuit, including Power On Reset, Low Voltage Detect reset, Watchdog Timeout reset, Debug reset, Software reset, and Loss of Clock reset.
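This combination can be sketched behaviorally. The Python below assumes that every reset source is active low (1 = inactive, 0 = asserted); the signal names `por_b`, `lvd_b`, `wdog_b`, `dbg_b`, `sw_b` and `loc_b` are hypothetical stand-ins for the sources listed above. The global reset is asserted when any source is asserted, which for active-low signals is the AND of all levels. A second helper lists the asserted sources, roughly as a reset status register might record them.

```python
from typing import Dict, List

# Hypothetical active-low reset source names (1 = inactive, 0 = asserted).
SOURCES = ("por_b", "lvd_b", "wdog_b", "dbg_b", "sw_b", "loc_b")


def global_reset_b(sources_b: Dict[str, int]) -> int:
    """Active-low global reset: asserted (0) if any source is asserted.

    For active-low signals, "any source asserted" is the AND of all levels.
    """
    level = 1
    for value in sources_b.values():
        level &= value
    return level


def asserted_sources(sources_b: Dict[str, int]) -> List[str]:
    """Sources currently asserting reset, e.g. to record in a status register."""
    return sorted(name for name, value in sources_b.items() if value == 0)


levels = {name: 1 for name in SOURCES}
levels["wdog_b"] = 0  # watchdog timeout
print(global_reset_b(levels))    # 0
print(asserted_sources(levels))  # ['wdog_b']
```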
However, some parts of the system, such as timekeeping and calendar functions, are never put into reset, even when the global reset is asserted. There are also situations in which particular reset status registers capture a local reset event and then trigger a global system reset. Depending on functional requirements, multiple resets are needed to reset the various subsystems of an SoC. This is where the risk lies: if one subsystem enters reset asynchronously with respect to its neighbors, its transition to the reset state can corrupt logic elements that are not in reset.
In such designs it is often necessary to choose between synchronous and asynchronous resets. A synchronous reset is based on the premise that the reset signal affects the state of the flip-flop only on the active edge of the clock. In some designs the reset must be generated from a set of internal conditions. A synchronous reset is recommended for these designs because it filters out glitches in the reset-generating logic that occur between clock edges.
However, if the clock is gated to save power, it may be disabled at the same moment reset is asserted. Only an asynchronous reset works in this situation, because the reset might be removed before the clock resumes, in which case a synchronous reset would never be sampled.
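Both behaviors can be illustrated with a deliberately coarse, cycle-step model. In this sketch the `Step` record and `simulate` function are hypothetical, and time is divided into discrete steps rather than modeled with real delays. The synchronous flop samples reset only on a clock edge, whereas the asynchronous flop clears Q as soon as reset is asserted. The first scenario places a one-step reset glitch between clock edges; the second gates the clock off while reset is asserted and released.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    clk_edge: bool  # True if an active capture-clock edge occurs in this step
    rst_b: int      # active-low reset level during this step
    d: int          # data input level


def simulate(steps: List[Step], synchronous: bool, q0: int = 0) -> List[int]:
    """Return the flop output Q after each step.

    synchronous=True : reset is only sampled on a clock edge.
    synchronous=False: reset forces Q low as soon as it is asserted.
    """
    q = q0
    trace = []
    for s in steps:
        if synchronous:
            if s.clk_edge:
                q = 0 if s.rst_b == 0 else s.d
        else:
            if s.rst_b == 0:
                q = 0
            elif s.clk_edge:
                q = s.d
        trace.append(q)
    return trace


# Scenario 1: clock edges on even steps, reset glitches low for step 3 only.
glitch = [Step(clk_edge=(t % 2 == 0), rst_b=0 if t == 3 else 1, d=1)
          for t in range(8)]
print(simulate(glitch, synchronous=True))   # [1, 1, 1, 1, 1, 1, 1, 1]
print(simulate(glitch, synchronous=False))  # [1, 1, 1, 0, 1, 1, 1, 1]

# Scenario 2: clock gated off for steps 2..7; reset asserted for steps 3..5.
gated = [Step(clk_edge=(t % 2 == 0 and not 2 <= t <= 7),
              rst_b=0 if 3 <= t <= 5 else 1, d=1)
         for t in range(10)]
print(simulate(gated, synchronous=True))    # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(simulate(gated, synchronous=False))   # [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
```

In the first scenario only the asynchronous flop reacts to the glitch; in the second, only the asynchronous flop is ever reset, because the synchronous flop sees no clock edge while reset is asserted.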
Identifying RDC structures in design
In the circuit of Figure 1, the two flops have different reset sources, namely async_rstA_b and async_rstB_b. Asserting async_rstA_b while async_rstB_b is not asserted can cause metastability on the destination flop, and if the destination flop's output is used further downstream, the result can be a functional failure.
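In its simplest form, a structural RDC check compares the reset sources of the launch and capture flops on every same-clock-domain data path. The sketch below assumes a hypothetical netlist abstraction: a dictionary mapping each flop to the set of reset signals that can reset it, plus a list of direct flop-to-flop data paths. The flop names are hypothetical; a real design would need netlist traversal and much more detailed analysis, so this is only an illustration of the definition.

```python
from typing import Dict, FrozenSet, List, Tuple


def find_rdc_paths(
    resets: Dict[str, FrozenSet[str]],
    data_paths: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Return the (launch, capture) flop pairs whose reset sources differ.

    resets maps each flop name to the set of reset signals that can reset it;
    data_paths lists direct flop-to-flop connections in one clock domain.
    """
    return [
        (launch, capture)
        for launch, capture in data_paths
        if resets[launch] != resets[capture]
    ]


# Figure 1: source and destination flops on different asynchronous resets.
resets = {
    "src_flop": frozenset({"async_rstA_b"}),
    "dst_flop": frozenset({"async_rstB_b"}),
    "dst_next": frozenset({"async_rstB_b"}),  # hypothetical downstream flop
}
paths = [("src_flop", "dst_flop"), ("dst_flop", "dst_next")]
print(find_rdc_paths(resets, paths))  # [('src_flop', 'dst_flop')]
```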
Figure 1(a): Basic RDC Structure
Figure 1(b): Waveform for Basic RDC Structure
Source flop reset - physically superset
Consider a case (Figure 2) in which there are two reset sources, rst1 and rst2, at the top level. Both are ANDed together and connected to the reset pin of the first flop, D1, while the second flop's reset is connected only to rst2.
The reset sources of the source flop are therefore a physical superset of the reset sources of the destination flop. As a result, when rst1 alone is asserted, only the first flop is reset, and the asynchronous change on its output can inject metastability into the second flop.
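The superset condition can be made explicit by computing which reset sources can reset the launch flop without also resetting the capture flop. In this sketch, `unsafe_resets` is a hypothetical helper, `D2` is a hypothetical name for the second flop of Figure 2, and assertion and deassertion ordering is ignored. A non-empty result names the resets that can inject metastability into the capture flop.

```python
from typing import Dict, FrozenSet


def unsafe_resets(resets: Dict[str, FrozenSet[str]],
                  launch: str, capture: str) -> FrozenSet[str]:
    """Reset sources that reset the launch flop but not the capture flop."""
    return resets[launch] - resets[capture]


# Figure 2: D1 is reset by rst1 and rst2 (ANDed), D2 only by rst2.
resets = {
    "D1": frozenset({"rst1", "rst2"}),
    "D2": frozenset({"rst2"}),  # hypothetical name for the second flop
}
print(sorted(unsafe_resets(resets, "D1", "D2")))  # ['rst1']
print(sorted(unsafe_resets(resets, "D2", "D1")))  # []
```

Reporting the offending reset, rather than only flagging the path, points directly at the scenario described above: rst1 asserted on its own.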
Metastability at module configuration
A device can contain functionality that should be reset only by POR and must keep operating through a global or warm reset, so the configuration registers of that functionality must remain intact during a global reset. However, if a warm reset occurs while one of these registers is being written, it can corrupt the register contents and cause a functional failure. Figure 3 depicts this scenario.
Figure 3(b): Waveform for RDC issue while programming configuration register
In this case the destination register, B[7:0], is a configuration register that can be reset only by POR. If a write to this register is in progress when the source flop is suddenly reset, the destination flop may go metastable and settle to an undefined or unwanted value.
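The functional side of this scenario can be sketched with a cycle-level model. The `ConfigPath` class below is hypothetical: `src` stands for the source register, assumed here to be reset by both POR and warm reset, and `b` stands for B[7:0], reset only by POR. A cycle model cannot reproduce metastability itself; it only shows the clean-failure case in which B[7:0] captures the source's reset value instead of the intended data. The metastable case arises when the reset-induced change lands inside B[7:0]'s setup or hold window.

```python
from dataclasses import dataclass


@dataclass
class ConfigPath:
    """Hypothetical cycle-level model of the Figure 3 path."""
    src: int = 0x00  # source register: reset by POR and by warm reset
    b: int = 0x00    # configuration register B[7:0]: reset by POR only

    def por(self) -> None:
        """Power-on reset clears both registers."""
        self.src = 0x00
        self.b = 0x00

    def warm_reset(self) -> None:
        """Warm reset clears only the source register; B[7:0] is retained."""
        self.src = 0x00

    def load(self, value: int) -> None:
        """Software places the new configuration value in the source register."""
        self.src = value & 0xFF

    def capture(self) -> None:
        """Clock edge with write enable: B[7:0] samples the source register."""
        self.b = self.src


# Undisturbed write: B[7:0] ends up with the intended value.
ok = ConfigPath()
ok.por()
ok.load(0xA5)
ok.capture()
print(hex(ok.b))  # 0xa5

# Warm reset between load and capture: B[7:0] captures the reset value.
bad = ConfigPath()
bad.por()
bad.load(0xA5)
bad.warm_reset()
bad.capture()
print(hex(bad.b))  # 0x0
```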