Dealing with structural and reset faults in embedded SoC designs

In system-on-chip designs, as in most electronic systems, it is crucial to include a reset capability. A reset clears any pending errors or events and returns the system to a normal condition or an initial state. It is usually performed in response to an error condition, when it is impossible or undesirable for processing to proceed and all error recovery mechanisms have failed. Without a proper reset capability, a device can be rendered useless after a power loss or malfunction.

With the increased complexity of digital design, the reset architecture used in current nanometer-scale designs has become very complex. When implementing such an architecture, designers tend to make basic mistakes that lead to metastability, which in turn results in functional failures in the system.

Metastability is the tendency for a digital electronic system to persist for an unspecified time in an unstable equilibrium or metastable state, where a circuit may not be able to settle into a stable '0' or '1' logic level within the time required for proper circuit operation. As a result, the circuit can act in unpredictable asynchronous ways, leading to “glitches” that cause system malfunctions.

In modern nanometer-scale SoC designs, the reset architecture is therefore a crucial part of the design. Designers need to consider every aspect to ensure that the system will not receive a false reset trigger or become metastable or corrupted because of an incorrect reset implementation.

This article discusses some basic structural issues in reset design and the metastability problems they cause, including: reset domain crossing, glitches at the reset source due to combinatorial loops, combinatorial logic in the reset path, misapplication of synchronous resets, reset-synchronizer redundancy, and, finally, reset de-assertion due to uncommon clock paths. Solutions are proposed at the end of the discussion of each problem.

Reset domain crossing
In traditional sequential designs that operate synchronously, if the asynchronous reset of the source register differs from the reset of the destination register, then the data input of the destination register changes asynchronously when the reset of the source (start-point) register is asserted. This path then operates asynchronously and unpredictably with respect to the global clock, even though the source and destination registers are in the same clock domain. This condition, known as a 'reset domain crossing', occurs wherever the resets of the launch and capture flops are different. It can cause metastability at the destination register during reset assertion of the source register.

In this scenario, the asynchronous resets of start-point register C and destination register A are different (Figure 1). Suppose that flop A is not in reset when the reset of register C is asserted. If a valid data transaction is in progress at the input of register A, the change caused by the asynchronous reset assertion on register C can cause a timing violation at register A. This will produce metastability.

Figure 1: Reset domain crossing issue
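
The condition described above can be checked structurally. The following is a minimal Python sketch, not a real lint tool: the `Flop` record, the `find_reset_domain_crossings` helper, flop B, and the reset name `rst_a_b` are hypothetical names introduced for illustration; only `clk`, `rst_c_b`, and registers A and C come from the discussion above. It flags every register-to-register path where the launch and capture flops share a clock but use different asynchronous resets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flop:
    name: str
    clock: str
    async_reset: Optional[str]  # None models a non-resettable flop


def find_reset_domain_crossings(flops, paths):
    """Return (launch, capture) name pairs that form a reset domain crossing.

    flops: dict mapping flop name -> Flop
    paths: iterable of (launch_name, capture_name) data paths
    """
    crossings = []
    for launch_name, capture_name in paths:
        launch, capture = flops[launch_name], flops[capture_name]
        same_clock = launch.clock == capture.clock
        # A launch flop without an async reset cannot change asynchronously.
        launch_can_reset = launch.async_reset is not None
        if same_clock and launch_can_reset and launch.async_reset != capture.async_reset:
            crossings.append((launch_name, capture_name))
    return crossings


flops = {
    "C": Flop("C", clock="clk", async_reset="rst_c_b"),
    "A": Flop("A", clock="clk", async_reset="rst_a_b"),  # hypothetical reset name
    "B": Flop("B", clock="clk", async_reset="rst_a_b"),  # hypothetical extra flop
}
paths = [("C", "A"), ("B", "A")]
print(find_reset_domain_crossings(flops, paths))  # [('C', 'A')]
```

Only the C to A path is reported, because B and A share the same reset. A production flow would extract the flops and paths from the netlist rather than from a hand-written table.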

Solution: In the timing diagram in Figure 1, rst_c_b is asserted while a valid data transaction is going through C1, causing C1 to change asynchronously with respect to clk. The resulting metastability at QC1 may cause functional failure. To avoid this problem, in addition to using a synchronous reset, non-resettable flops, or POR for the D1 flop, it is also necessary to determine whether the source of the reset rst_c_b is synchronous. If it is, the timing arc from C_CLR -> Q can be included in the setup and hold checks along the path C_CLR -> C_Q1 -> C1 -> A_D, which avoids metastability in the design. However, C_CLR -> Q timing arcs are usually not enabled by default in the library, so they need to be explicitly enabled during timing analysis.

Use a two-flop synchronizer at the destination (A) to prevent metastability from propagating through the design.
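
To illustrate why the second flop helps, here is a small behavioral sketch in Python. It is a simplification under stated assumptions, not a circuit model: the class name `TwoFlopSynchronizer` is hypothetical, `None` stands in for a metastable value, and the model assumes the first stage always settles to a clean 0 or 1 within one clock period. In real silicon that settling is only very likely, not guaranteed.

```python
import random


class TwoFlopSynchronizer:
    """Behavioral model: two back-to-back flops on the destination clock."""

    def __init__(self, reset_value=0, rng=None):
        self.stage1 = reset_value
        self.stage2 = reset_value
        self.rng = rng or random.Random(0)

    def clock_edge(self, d):
        """Advance one destination clock edge and return the synchronized output.

        d is 0, 1, or None; None models an input that violated setup/hold,
        so the first stage may go metastable.
        """
        # Assumption: a metastable first stage settles to 0 or 1 before the
        # next edge, so the second stage always captures a clean level.
        if self.stage1 is None:
            settled = self.rng.choice((0, 1))
        else:
            settled = self.stage1
        self.stage2 = settled
        self.stage1 = d
        return self.stage2


sync = TwoFlopSynchronizer()
outputs = [sync.clock_edge(d) for d in (1, None, 1, 1)]
print(outputs)  # every entry is 0 or 1, never None
```

The downstream logic only ever sees `stage2`, which costs one extra cycle of latency in exchange for isolating the first stage.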

Glitch at reset source due to combinatorial loop
In an SoC the global system reset is a combination of various reset sources in a device (Figure 2) generated either by software or hardware: LVD reset, watchdog reset, debug reset, software reset, and loss of clock reset. All can be used to assert global system reset.

However, if the assertion of global reset caused by the assertion of any one of the reset sources is completely asynchronous but the reset generation source logic is cleared by global reset, it is likely that there is a combinatorial loop in the design path which can produce a glitch at that reset source. The propagation delay of this combinatorial path will vary with process, voltage, and temperature, causing variations in the glitch width. If combinatorial cells are used for reset assertion and de-assertion, this will also cause race conditions in simulation.

Figure 2: Glitch at reset source (basic problem)

When the reset source SW_Q asserts, it causes the assertion of rst_b, which is the global reset (Figure 2). If the global reset itself is used to clear the SW_Q reset, glitches are produced at the SW_Q output and on the global reset. In simulation this also causes a race condition, because the assertion of the reset source is trying to de-assert itself through this combinatorial logic.
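
A loop of this kind can be found by treating the reset network as a directed graph. The sketch below is a hedged illustration in Python: the function `find_async_loop` and the edge-table format are assumptions made for this example, and only the signal names `SW_Q` and `rst_b` come from the text. An edge means "drives through a purely combinational or asynchronous SET/CLR -> Q path"; a path through the D input of a clocked flop is deliberately left out, because it breaks the loop.

```python
def find_async_loop(edges, start):
    """Return a list of signals forming a loop through `start`, or None.

    edges: dict mapping a signal to the signals it drives through purely
    combinational or asynchronous (SET/CLR -> Q) paths.
    """
    stack = [(start, [start])]
    visited = set()
    while stack:
        node, path = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt == start:
                return path + [start]
            if nxt not in visited:
                visited.add(nxt)
                stack.append((nxt, path + [nxt]))
    return None


# SW_Q asserts rst_b combinationally, and rst_b asynchronously clears SW_Q.
looped = {"SW_Q": ["rst_b"], "rst_b": ["SW_Q"]}
print(find_async_loop(looped, "SW_Q"))  # ['SW_Q', 'rst_b', 'SW_Q']

# If SW_Q only reaches rst_b through a clocked D input, that edge disappears.
broken = {"rst_b": ["SW_Q"]}
print(find_async_loop(broken, "SW_Q"))  # None
```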

However, if the reset source (SW_Q) is used asynchronously in the reset state machine (SET/CLR input of a flop) for global reset assertion, then the reset glitch might be able to reset the whole system by asserting global reset. This is because the global system reset de-assertion is not dependent on reset source de-assertion alone.

As a result, there can still be an issue when this reset source (with its glitch) is used synchronously, at the D input of a flop. Because the glitch may not remain stable for at least one clock cycle, it may not be captured by the destination flop. Also, this reset source cannot be used as the clock of any circuit (for example, a pulse-capture circuit), as it may cause clock pulse-width violations.

Figure 3: Glitch at reset source (problem 2)

Figure 3 shows what happens when there is a glitch at the reset source SW_Q. Although there will not be any glitch at the global reset output (rst_b), if the glitch at reset source SW_Q is captured in some flop as a reset status event (at S) or for some other purpose, it will cause a timing violation/metastability condition or it might not get captured at all.
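
Whether a narrow glitch is captured at a D input depends on where it lands relative to the clock edges. The following Python sketch shows this with an idealized model; the function name `captured_by_flop`, its parameters, and the picosecond values are all illustrative assumptions, and setup/hold windows are ignored, so the marginal cases that cause metastability in practice are not modeled.

```python
import math


def captured_by_flop(pulse_start, pulse_width, clk_period, first_edge=0):
    """Return True if a rising clock edge falls inside the pulse.

    Times are integers in picoseconds. Rising edges occur at
    first_edge, first_edge + clk_period, first_edge + 2 * clk_period, ...
    """
    pulse_end = pulse_start + pulse_width
    # Index of the first clock edge at or after the start of the pulse.
    n = max(math.ceil((pulse_start - first_edge) / clk_period), 0)
    next_edge = first_edge + n * clk_period
    return next_edge < pulse_end


CLK_PERIOD = 10_000  # hypothetical 100 MHz clock

# A 3 ns glitch that falls between two edges is missed entirely.
print(captured_by_flop(12_000, 3_000, CLK_PERIOD))   # False
# The same glitch straddling an edge is captured.
print(captured_by_flop(18_000, 3_000, CLK_PERIOD))   # True
# A pulse at least one period wide always contains an edge.
print(captured_by_flop(12_000, 10_000, CLK_PERIOD))  # True
```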

Solution: Besides avoiding situations such as that shown in Figure 2, if the reset is implemented as shown in Figure 3, the designer should make sure the reset source (SW_Q in this case) is always used at the SET/CLR input of a flop and not at D or CLK.

The best way to resolve this issue is to register the reset source before using it in the reset state machine (Figure 4). This makes global reset assertion dependent on the clock; the trade-off is that the internal reset (SW_Q) will not be asserted when the clock is not present.

Figure 4: Registering the reset source

Also, the designer can stretch the de-assertion of SW_Q before using it in the design, which will make the reset assertion independent of the clock (Figure 5).

Figure 5: Stretch the de-assertion of SW_Q
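
Since the figure is not reproduced here, the following Python sketch shows only one plausible reading of de-assertion stretching. The class `DeassertionStretcher`, its methods, and the stretch length of three cycles are hypothetical. The model assumes assertion is asynchronous (no clock edge needed) and that de-assertion is held off for a fixed number of clock edges, so even a very narrow pulse on the source becomes several cycles wide downstream.

```python
class DeassertionStretcher:
    """Asynchronous assert, clocked de-assert after `stretch_cycles` edges."""

    def __init__(self, stretch_cycles):
        self.stretch_cycles = stretch_cycles
        self.remaining = 0

    def async_assert(self):
        # Models the asynchronous SET path: takes effect without a clock.
        self.remaining = self.stretch_cycles

    def clock_edge(self, source_asserted):
        # While the source is still asserted, keep the counter reloaded;
        # once it is released, count down toward de-assertion.
        if source_asserted:
            self.remaining = self.stretch_cycles
        elif self.remaining > 0:
            self.remaining -= 1

    @property
    def out(self):
        return self.remaining > 0


stretcher = DeassertionStretcher(stretch_cycles=3)
stretcher.async_assert()          # narrow glitch on the source sets the output
asserted_before_clock = stretcher.out

trace = []
for _ in range(4):                # source already released at every edge
    stretcher.clock_edge(source_asserted=False)
    trace.append(stretcher.out)
print(asserted_before_clock, trace)  # True [True, True, False, False]
```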
