Static timing analysis: bridging the gap between simulation and silicon - Embedded.com

Static timing analysis (STA) not only links backend and frontend design activities but, more importantly, helps bridge the gap between simulation and silicon. STA is often misconstrued as a magical solution for meeting timing requirements. While it is undoubtedly the responsibility of STA engineers to close timing, it is equally important for register transfer level (RTL) designers to avoid some conspicuous errors, which we refer to as architectural pitfalls for timing.

In this article we discuss AND-gate and OR-gate clock gating use cases, some obvious and some not so obvious. They can guide designers in avoiding such situations upfront at the RTL stage, and thus preclude reiterating timing closure activities from, say, clock tree synthesis (CTS) back to logic synthesis.

We conclude the article with a case study of an odd-frequency divider circuit. One implementation yields correct results in RTL simulation but is flagged by timing analysis, and we describe the changes to the algorithm needed to ensure that it works well on silicon.

Clock gating
Clock gating is an integral architectural method to save dynamic power. While the backend tools are aware of the power dissipation, it is nevertheless good design practice to insert clock gating cells upfront in the RTL, because whether a clock gating cell is worthwhile depends on the use case and the design intent. For example, if a clock gating cell would feed only a small number of flip-flops, or would feed the clock to a critical IP block that is expected to be operating most of the time, it makes little sense to add it.

Clock gating cells incur additional dynamic power and area overheads, so designers should insert them only where the expected savings exceed the overhead. Apart from dedicated clock gating cells, simple gates such as AND, OR, and NAND can sometimes also be used for clock gating. These have eccentric timing requirements, however, so RTL designers should use them judiciously and only after understanding the timing scenarios.

AND gate-based clock gating
Figure 1 shows an example of AND gate-based clock gating. One input of the AND gate is the enable signal, which is generally the output of a register; the other input is the clock. When the enable signal is 1'b1, the AND gate is transparent to the clock. When the enable signal is 1'b0, the AND gate is ‘off’ and the clock is ‘gated’. While this circuit works well in RTL simulations, once the delays that become visible only during the backend phase of SoC design are present, a glitch can appear at the output of the AND gate, as the waveforms in Figure 1 show.

This places a constraint on the enable signal: it should change only during the ‘low’ period of the clock. If the enable comes from a positive edge-triggered flip-flop, it changes just after the rising clock edge, while the clock is high, which violates this constraint. The problem could be solved if the register generating the enable were a negative edge-triggered flip-flop. However, while replacing the flip-flop in this way would solve the timing issue, it would also alter the intended functionality of the circuit.

Figure 1: Potential issue with AND-based clock gating
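The glitch mechanism can be reproduced with a small discrete-time model. The following Python sketch is illustrative only: the function name `gated_clock_pulses`, the time units, the clock-to-Q delay of 1 unit, and the enable pattern are all assumptions and do not come from the circuit in Figure 1. It ANDs an ideal 50% duty-cycle clock with a delayed enable and records the width of every high pulse on the gated clock.

```python
def gated_clock_pulses(launch_edge, clk_to_q=1, half_period=5):
    """Return the widths of the high pulses on an AND-gated clock.

    launch_edge: "pos" or "neg", the clock edge on which the (hypothetical)
    enable flip-flop updates. All times are arbitrary integer units.
    """
    period = 2 * half_period
    wanted = [1, 1, 1, 0, 0, 0]  # assumed enable value launched in each cycle
    offset = 0 if launch_edge == "pos" else half_period
    widths, run = [], 0
    for t in range(len(wanted) * period):
        clk = 1 if t % period < half_period else 0
        # Index of the latest launch edge whose output has already propagated.
        k = (t - offset - clk_to_q) // period
        enable = wanted[k] if k >= 0 else wanted[0]
        gated = clk & enable  # the AND gate
        if gated:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


print(gated_clock_pulses("pos"))  # enable launched on the rising edge
print(gated_clock_pulses("neg"))  # enable launched on the falling edge
```

Under these assumptions, the rising-edge launch produces three full-width pulses followed by a sliver one time unit wide, which is the glitch. The falling-edge launch produces only full-width pulses, but four of them rather than three, which illustrates how changing the launch edge can alter the circuit's behavior.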

OR gate-based clock gating
Figure 2 shows the use of OR gate-based clock gating. One input of the OR gate is the enable signal, which is generally the output of a register; the other input is the clock. When the enable signal is 1'b0, the OR gate is transparent to the clock.

However, when the enable signal is 1'b1, the OR gate is ‘off’ and the clock is ‘gated’. While this circuit works well in RTL simulations, once the delays that become visible only during the backend phase of SoC design are present, a glitch can appear at the output of the OR gate, as the waveforms in Figure 2 show.

This places a constraint on the enable signal: it should change only during the ‘high’ period of the clock. The problem could be solved if the register generating the enable were a positive edge-triggered flip-flop. As with AND gate-based clock gating, however, while replacing the flip-flop in this way would solve the timing issue, it would also alter the intended functionality of the circuit.

Figure 2: Potential issue with OR-based clock gating
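The constraints for the two gating styles can be summarized as a check on the clock phase in which the enable changes. The sketch below is a simplified model, not how any particular STA tool implements its clock gating checks: the function name `enable_change_is_safe` is hypothetical, the clock is assumed ideal with a 50% duty cycle and a rising edge at time 0, and setup and hold margins, skew, and gate delays are ignored.

```python
def enable_change_is_safe(gate, launch_edge, clk_to_q, period):
    """Check whether the enable changes in the phase the gate tolerates.

    gate: "and" or "or" (the gating element).
    launch_edge: "pos" or "neg" (edge on which the enable flip-flop updates).
    clk_to_q: assumed delay from launch edge to enable change, < period / 2.
    """
    launch_time = 0.0 if launch_edge == "pos" else period / 2
    change_time = (launch_time + clk_to_q) % period
    clock_is_high = change_time < period / 2
    if gate == "and":
        return not clock_is_high  # AND gating: change only while clock is low
    if gate == "or":
        return clock_is_high      # OR gating: change only while clock is high
    raise ValueError("gate must be 'and' or 'or'")


for gate in ("and", "or"):
    for edge in ("pos", "neg"):
        print(gate, edge, enable_change_is_safe(gate, edge, 0.2, 10.0))
```

With these assumed delays, the safe combinations are AND gating with a falling-edge launch and OR gating with a rising-edge launch, matching the constraints described above.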

Case study: odd frequency dividers with a 50% duty cycle
Figure 3 shows the circuit for a divide-by-7 frequency divider. Note that it uses AND gate-based clock gating where the enable is being driven from a positive edge-triggered flip-flop.

Figure 3: Circuit for divide-by-7 with 50% duty cycle using Gray encoding
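For readers unfamiliar with odd dividers, the behavioral sketch below shows one common textbook way to obtain a 50% duty cycle from an odd division ratio. It is not the Gray-encoded, clock-gated circuit of Figure 3: it assumes a modulo-7 counter clocked on the rising edge, a signal `a` that is high for the first three counts, a copy `b` of `a` captured on the falling edge, and an output formed as `a OR b`. The function name `divide_by_odd` is hypothetical, and the model is purely functional, with no gate delays.

```python
def divide_by_odd(n=7, periods=2):
    """Return the divided clock sampled twice per input cycle.

    Each input clock cycle contributes two samples: one for the
    clock-high phase and one for the clock-low phase.
    """
    assert n % 2 == 1 and n >= 3, "n must be an odd number >= 3"
    high_cycles = (n - 1) // 2  # 'a' is high for 3 of the 7 input cycles
    count, b = 0, 0
    samples = []
    for _ in range(n * periods):
        a = 1 if count < high_cycles else 0  # registered on the rising edge
        samples.append(a | b)  # clock-high phase: b still holds the old a
        b = a                  # falling edge: b captures a
        samples.append(a | b)  # clock-low phase
        count = (count + 1) % n
    return samples


samples = divide_by_odd(7, periods=2)
print(samples[:14])                        # one output period, in half-cycles
print(sum(samples) / len(samples))         # fraction of time the output is high
```

In this model the output is high for seven half-cycles and low for seven, so each output period spans seven input cycles with a 50% duty cycle. Whether such a structure is glitch-free on silicon still depends on the real delays, which is the concern the rest of this case study addresses.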

Figure 4 shows the simulation results generated by a frontend verification tool. As depicted, the simulated output waveform contains no glitch. However, the timing results generated by a static timing analysis tool would flag a clock gating hold violation.

Figure 4: Simulation waveform for the circuit from a frontend verification tool

The table in Figure 5 shows the clock gating hold violation reported by the STA tool. To verify this, the same path was simulated using a SPICE simulator, where actual delays come into the picture. The waveform in Figure 5 shows a spike at the output signal, which is the glitch.

Figure 5: Simulation waveform for the circuit using a SPICE Simulator

The design algorithm to divide the clock uses a Gray encoding (Figure 6). As soon as Q1, Q0 becomes {10}, the End-of-Conversion (EOC) signal is set. The modified approach, instead of detecting {10}, detects the preceding state, which would be {100}. This signal is then pipelined with a clock gating cell and then fed to the OR gate, as shown in Figure 7.

Figure 6: The original and the modified algorithm

Figure 7: Circuit according to the modified algorithm

The output waveform in Figure 8 shows that there is no glitch at the output, and hence this circuit would work well on silicon.

Figure 8: Corrected simulation waveform for the circuit using a SPICE simulator

Conclusion
Timing analysis is an important step in the SoC design flow to ensure that the circuit works well on silicon. RTL simulations generally do not take any delays into account. While it is the responsibility of STA engineers to meet the timing, understanding of timing can aid the RTL designer to write the RTL correctly, and hence avoid timing iterations. Fewer iterations translate into shorter time to market, and hence greater gross margins.

Naman Gupta received his B.E. degree in Electronics and Communication from Netaji Subhas Institute of Technology, Delhi University in the year 2011. He is currently working as a Design Engineer with Freescale Semiconductor, Noida, India. His primary responsibilities include timing closure and constraints development. His research interests include high speed, low power and programmable design architectures.

Rohit Goyal received his B.E. degree in Electronics and Communication from National Institute of Technology, Kurukshetra, India in the year 2011. He is currently working with Freescale Semiconductor, Noida as a Design Engineer. His primary responsibilities include FPGA prototyping, emulation and verification. His research interests include high speed and low power design algorithms and architectures.

4 thoughts on “Static timing analysis: bridging the gap between simulation and silicon”

  1. Hi Acasprado,
    I believe by clock enable you mean using a multiplexer with its select line as the clock enable?
    If yes, one could definitely use this. However, this approach does not result in any overall dynamic power savings because the flip-flops at the

  2. Hi Pritkiy,
    One can add extra FF as long as the desired functionality of the circuit remains the same. But having said that, one cannot deny the benefits of using clock gating cells to save dynamic power. In modern complex designs with stringent power budg

  3. Would not it be safer to add extra FF with enable, thus avoiding any possibility of glitches altogether? Gated clocks could be used when carefully controlled, but they are inherent evil in more complex designs.

    It is like producing C code by MISRA books i
