Dealing with SoC metastability problems due to Reset Domain Crossing

Metastability caused by asynchronous clock domain crossing (CDC) is a well-known design problem, and industry-standard tools are available to catch such structural and functional issues.

However, CDC is not the only reason a signal becomes asynchronous with respect to the destination clock domain. In a sequential design, if the reset of the source register differs from the reset of the destination register, then even when the data path lies within a single clock domain, asserting the source reset creates an asynchronous crossing and can cause metastability at the destination register. This is referred to as Reset Domain Crossing (RDC): it occurs when the reset signals of the launch and capture flops are different.

This article will review some of the conditions under which RDC occurs and propose some ways to deal with the problems that occur up front in the design phase.

The problem of reset complexity
With the increased complexity of digital design, reset architecture has also become very complex. While implementing such an architecture, designers tend to make mistakes that can lead to metastability, glitches or other functional failures in the system. For example, if the source flop makes an asynchronous transition to its reset state because its asynchronous reset is asserted while the destination flop is in its normal state, the input to the destination flop may change within the setup or hold window of that flop, which will lead to metastability.
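The setup/hold condition described above can be sketched as a small check. This is a minimal illustration rather than a sign-off method: the function name, the integer picosecond units and the assumption that rising clock edges fall at multiples of the clock period are all hypothetical choices made for the example.

```python
def in_setup_hold_window(t_change, clk_period, t_setup, t_hold):
    """Return True if a data change at t_change falls strictly inside the
    setup/hold window of a rising clock edge.

    Assumptions (not from the article): edges occur at 0, T, 2T, ...,
    all times are integers in picoseconds, and t_setup and t_hold are
    each smaller than half the clock period.
    """
    # Signed distance from the change to the nearest clock edge:
    # negative means before the edge, positive means after it.
    offset = t_change % clk_period
    if offset > clk_period // 2:
        offset -= clk_period
    return -t_setup < offset < t_hold


# Hypothetical numbers: 10 ns clock, 200 ps setup, 100 ps hold.
for t in (19900, 15000, 20050):
    print(t, in_setup_hold_window(t, 10000, 200, 100))
```

An asynchronous reset can move the source flop output at any time, so nothing prevents `t_change` from landing inside this window.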

In an SoC, the global chip reset function is used to combine various reset sources in the integrated circuit generated either by software or hardware, including: Power On Reset, Low Voltage Detect reset, Watchdog Timeout reset, Debug reset, Software reset, and Loss of Clock reset.

However, there is always some part of the system that is not in its reset state even when the global reset is asserted, such as the timekeeping and calendaring features. There are also situations in which particular reset status registers capture a local reset event and cause a global system reset. Depending on functional requirements, multiple resets are needed to reset the various subsystems of an SoC. If one of them asserts asynchronously, the resulting reset state can corrupt other logic elements that are not in reset.

In such designs it is often necessary to decide whether to use synchronous or asynchronous resets. Synchronous resets are based on the premise that the reset signal will only affect or reset the state of the flip-flop on the active edge of a clock. In some designs, the reset must be generated by a set of internal conditions. A synchronous reset is recommended for these types of designs because it will filter out glitches in the reset logic equation that occur between clock edges.

But if we have a gated clock to save power, the clock may be disabled coincident with the assertion of reset. Only an asynchronous reset will work in this situation, as the reset might be removed prior to the resumption of the clock.
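The gated-clock case can be illustrated with a small behavioural model. This is only a sketch: the class `ResettableFlop`, the helper `run` and the event list are hypothetical, and the model ignores all timing and simply asks whether the flop ever sees the reset.

```python
class ResettableFlop:
    """Behavioural D flop with either a synchronous or an asynchronous reset."""

    def __init__(self, async_reset, initial_q=1):
        self.async_reset = async_reset
        self.q = initial_q

    def step(self, clk_edge, rst_asserted, d):
        if self.async_reset and rst_asserted:
            self.q = 0                            # acts without a clock edge
        elif clk_edge:
            self.q = 0 if rst_asserted else d     # sync reset needs an edge
        return self.q


def run(flop, events):
    """Apply (clk_edge, rst_asserted) events; the flop recirculates its own Q."""
    for clk_edge, rst_asserted in events:
        flop.step(clk_edge, rst_asserted, d=flop.q)
    return flop.q


# Clock is gated off while reset pulses; the clock resumes only after
# reset has already been released.
GATED_CLOCK_RESET = [(False, True), (False, True), (False, False), (True, False)]

sync_q = run(ResettableFlop(async_reset=False), GATED_CLOCK_RESET)
async_q = run(ResettableFlop(async_reset=True), GATED_CLOCK_RESET)
print(sync_q, async_q)  # prints "1 0": only the async flop was reset
```

In this model the synchronously reset flop keeps its old value because no active edge arrived while reset was asserted.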

Identifying RDC structures in design
In the circuit in Figure 1, the two flops have different reset sources, async_rstA_b and async_rstB_b. Asserting async_rstA_b while async_rstB_b is not asserted can cause metastability at the destination flop, and if the output of the destination flop is used further downstream, a functional failure can result.


Figure 1(a): Basic RDC Structure

Figure 1(b): Waveform for Basic RDC Structure
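A structural check for this pattern might look like the sketch below. It is an illustration, not how any commercial RDC tool works. The `FlopInfo` record and `find_rdc_paths` function are hypothetical, and the rule applied (flag a same-clock path when the source flop has an asynchronous reset that the destination flop lacks) is one reading of the structure in Figure 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlopInfo:
    name: str
    clock: str
    async_resets: frozenset  # names of async reset sources reaching this flop


def find_rdc_paths(paths):
    """Flag (source, destination) flop pairs that share a clock but where the
    source can be reset asynchronously by a source the destination does not see."""
    flagged = []
    for src, dst in paths:
        same_clock = src.clock == dst.clock
        src_only_resets = src.async_resets - dst.async_resets
        if same_clock and src_only_resets:
            flagged.append((src.name, dst.name, sorted(src_only_resets)))
    return flagged


# Figure 1 style path: same clock, different reset sources.
src = FlopInfo("src", "clk", frozenset({"async_rstA_b"}))
dst = FlopInfo("dst", "clk", frozenset({"async_rstB_b"}))
# A path whose two flops share the same reset is not flagged.
safe = FlopInfo("safe", "clk", frozenset({"async_rstA_b"}))

report = find_rdc_paths([(src, dst), (src, safe)])
print(report)
```

Paths between different clocks are left out on purpose, since those are ordinary CDC paths and are handled by CDC checks.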

Source flop reset – physically superset
Consider a case (Figure 2) with two top-level reset sources, rst1 and rst2. The two are ANDed together and connected to the reset pin of the first flop, D1, while the reset of the second flop is connected only to rst2.

Figure 2: Basic RDC structure: physically superset

The reset sources of the source flop are then a physical superset of those of the destination flop. A situation can therefore arise in which only the first flop is reset, and the asynchronous change at its output injects metastability into the second flop.
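Enumerating the reset combinations makes the hazardous case explicit. The sketch below assumes active-low resets (0 means asserted), which is how the AND gate in Figure 2 is read here. The function name and the name D2 for the second flop are illustrative.

```python
from itertools import product


def hazardous_reset_states():
    """List the (rst1, rst2) values for which D1 is in reset but D2 is not.

    Assumes active-low resets: D1's reset pin sees rst1 AND rst2,
    D2's reset pin sees rst2 only.
    """
    hazards = []
    for rst1, rst2 in product((0, 1), repeat=2):
        d1_rst_b = rst1 & rst2
        d2_rst_b = rst2
        if d1_rst_b == 0 and d2_rst_b == 1:
            hazards.append((rst1, rst2))
    return hazards


print(hazardous_reset_states())  # [(0, 1)]: rst1 asserted, rst2 not asserted
```

Only the assertion of rst1 alone resets D1 while D2 keeps running. Asserting rst2 resets both flops together.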

Metastability at module configuration
A device may include functionality that should be reset only during POR and should remain operational during a global (warm) reset, so its configuration registers must stay intact during a global reset. However, if a warm reset occurs while a write to such a register is in progress, it can corrupt the contents of the register, which can lead to functional failure. Figure 3 depicts the scenario.

Figure 3(a): RDC issue while programming configuration register

Figure 3(b): Waveform for RDC issue while programming configuration register

In this case the destination register (B[7:0]) is a configuration register that can be reset only by POR. If the programmer is writing to this register and the source flop is suddenly reset, the destination flop may go metastable and settle to an undefined or unwanted value.

Glitch in clock due to asynchronous reset

Figure 4: Clock Glitch at Clock Gating cell output

Synchronous memory corruption during asynchronous reset
System synchronous memories are expected to retain their contents across a warm reset. But because of reset domain crossing there is a high probability of memory data corruption. Although the memory itself is not reset during a warm reset, the memory controller logic that controls it can go to its reset state asynchronously. As shown in Figure 5, if the memory is enabled at that time (memory chip select is asserted and a write operation is in progress), its synchronous interface will see a timing violation and the contents will be corrupted.

Figure 5(a): Memory corruption due to RDC

Figure 5(b): Waveform for memory corruption

In the timing diagram in Figure 5(b), the warm reset (RST_B) asserts around an active clock edge, which changes the memory controller outputs chip_select, address, write_data and write_enable asynchronously, causing a timing violation at the memory interface and resulting in data corruption.

Circuits to mitigate RDC issues
Solutions will vary depending on functional requirements, but the following are some basic structural ones. Be warned: they are not valid in all scenarios, so the designer will need to judge which is appropriate to his or her specific situation.

Using the synchronized output of the destination flop. A simple two-flop synchronizer (Figure 6) can sometimes solve the problem by ensuring that metastability occurring at destination flop S1 is blocked by S2 and does not propagate to the rest of the design.

Figure 6: Sync Flop structure
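The blocking behaviour of S2 can be sketched with a three-valued model. This is a simplified illustration resting on one strong assumption that a real design must justify with timing and MTBF analysis: a metastable S1 (shown as "X") settles to 0 or 1 within one clock period, before S2 samples it. The function name and the trace format are hypothetical.

```python
import random


def simulate_two_flop_sync(s1_captures, rng):
    """Model the S1 -> S2 chain of a two-flop synchronizer.

    s1_captures: what S1 captures at each clock edge: 0, 1, or "X" when its
    input changed inside the setup/hold window (for example because the
    source flop was reset asynchronously).
    Returns the (s1, s2) values after each clock edge.
    """
    s1, s2 = 0, 0
    trace = []
    for captured in s1_captures:
        # Assumption: an "X" held in S1 has settled to a random but valid
        # level by the time S2 samples it at this edge.
        settled_s1 = rng.choice((0, 1)) if s1 == "X" else s1
        s2 = settled_s1
        s1 = captured
        trace.append((s1, s2))
    return trace


trace = simulate_two_flop_sync([1, "X", 0, "X", 1, 1], random.Random(0))
for s1, s2 in trace:
    print(s1, s2)
```

In this model S2 only ever shows 0 or 1, but the value it takes after an "X" is arbitrary, so downstream logic must tolerate either level. That is one reason this structure only sometimes solves the problem.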

Using synchronous resets. Using the reset synchronously throughout the design ensures that no metastability in the design is due to RDC. There is no asynchronous path due to reset assertion, since the reset signal is used in the data path, where timing parameters are satisfied.

Matching reset-to-Q arc timing. Figure 7 shows a situation in which rst1_b is the reset source of the source flop and rst2_b that of the destination flop. If rst1_b is asserted and the timing path R_Q → F_rb → F_Q → C (combo) → S1_D is met with respect to the destination clock, there will be no metastability due to reset domain crossing. To meet this timing requirement, the RB → Q arc of flop F needs to be enabled during timing analysis.

Figure 7: Meeting reset to Q timing
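The timing requirement on this path amounts to a setup check through the reset-to-Q arc. The sketch below is back-of-the-envelope arithmetic, not a substitute for static timing analysis. It assumes the reset is launched from a flop R clocked by the destination clock, that net delays are lumped into the listed terms, and that clock skew is zero. The function name and all delay values are hypothetical.

```python
def reset_path_slack(clk_period, r_clk_to_q, f_rb_to_q, combo_delay, s1_setup):
    """Setup slack (same units as the inputs) for the path
    R_Q -> F_rb -> F_Q -> C (combo) -> S1_D, launched and captured on
    consecutive edges of the same clock. Negative slack means S1_D can
    change inside the setup window of S1."""
    arrival = r_clk_to_q + f_rb_to_q + combo_delay
    return clk_period - s1_setup - arrival


# Hypothetical delays in picoseconds for a 5 ns clock.
ok_slack = reset_path_slack(clk_period=5000, r_clk_to_q=300,
                            f_rb_to_q=450, combo_delay=2000, s1_setup=150)
bad_slack = reset_path_slack(clk_period=5000, r_clk_to_q=300,
                             f_rb_to_q=450, combo_delay=4300, s1_setup=150)
print(ok_slack, bad_slack)  # 2100 -200
```

The `f_rb_to_q` term is the arc that has to be enabled in timing analysis. If it is disabled, the analysis never sees this path.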

Summary
In this article we have focused mainly on RDC structural issues and our recommendations on how to deal with them. However, they are not the only ones, nor are they universally true for all designs. We have provided some generic solutions, which may require modification in special cases. Such issues not only cause functional failures but also lengthen the execution cycle by adding extra debug time and effort. Hence it is very important for a designer to take care of them at a very early stage of design.

Arjun Pal Chowdhury is Lead Design Engineer at Freescale Semiconductor. He has been working with Freescale and has 7 years of experience in SoC Design and Architecture, and is involved in designing chips that go into the Automotive as well as the Industrial and Multimedia markets.

Neha Agarwal is Senior Design Engineer at Freescale Semiconductor. She has been working with Freescale for the last 3 years in SoC Design and Architecture and is involved in designing chips that go into the Automotive market. She graduated from Birla Institute of Technology, Mesra, in 2009.

Ankush Sethi is a member of the SoC architecture and front end integration team at Freescale India. He received his Bachelor's degree in Engineering (in the Electronics & Communication division) in 2012 from Netaji Subhas Institute of Technology (NSIT), affiliated with the University of Delhi in India. He joined Freescale in June 2012 directly upon graduation.
