Designing low-power sequential circuits using clock gating - Embedded.com

Designing low-power sequential circuits using clock gating

With shrinking process technologies, rapidly rising clock frequencies, and increasing emphasis on power reduction, low-power design is taking on a vital role. Design teams can no longer afford to worry only about isolating large power domains. Because most SoCs contain many sequential circuits, every little bit counts, which makes it all the more important to make each of those circuits power-efficient. These sequential circuits are predominantly used to build finite state machines (FSMs), clock dividers, and counters in modern designs.

This article describes an efficient way to design low-power sequential circuits with effective clock gating. As an example it uses a multi-stage programmable Johnson counter that can be extended to support a wide range of division factors while consuming less dynamic power than conventional circuits.

The what and why of Johnson counters
Sequential circuits such as counters and registers are omnipresent in modern designs. In a typical design, all computer operations (arithmetic, logical, and memory processes) are executed synchronously with respect to the clock, which increases the number of sequential circuits in the design. With cut-throat competition to save every mW in this era of mobile, battery-operated devices, it is important not only to segregate the circuit into power domains and switch them off as required, but also to save power in each and every sequential circuit.

The Johnson counter synchronously generates special data sequences that are essential in various important applications (e.g. D/A converters, FSMs, and clock dividers). As more intellectual property (IP) blocks are integrated into an SoC, with blocks running at different frequencies ranging from MHz to GHz, many clock dividers supporting multiple ratios are implemented at different levels of the design hierarchy.

In this article we describe a method of saving power by replacing multiple clock dividers with a multi-stage programmable Johnson counter system with effective clock gating. The system can provide any even clock division factor from 8 upward (up to 38 in the implementation presented in this article).

Figure 1 depicts a conventional design of a 4-bit, positive-edge Johnson counter. A Johnson counter is simply a modified shift register in which the inverted output of the last D flip-flop is fed back as the input to the first D flip-flop. Every other flip-flop takes the output of the preceding flip-flop as its input.

Figure 1: Conventional Johnson Counter

As shown in Table 1, every column has 4 consecutive 1’s followed by 4 consecutive 0’s, each in a different phase. A Johnson counter thus creates a specific data pattern synchronously. This pattern makes it extremely useful in modeling, since any of the taps can be used to generate a clock-like pattern, with a different phase at each tap. Also, as can be deduced from the table, the Johnson counter needs only N flops to provide 2N states, half the number of flip-flops that a standard ring counter needs for the same modulus.

Table 1: State table for conventional Johnson counter
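The shift-and-invert behavior described above can be sketched in a few lines of Python. This is only a behavioral illustration, not the authors' RTL: the function name `johnson_states`, the all-zeros reset state, and the bit ordering (first flop printed leftmost) are assumptions, so the printed columns may be ordered differently from Table 1.

```python
def johnson_states(n_flops, n_clocks):
    """Return the register contents at reset and after each clock edge.

    Assumption: the counter resets to all zeros. Index 0 is the first
    flop in the chain, index -1 is the last flop.
    """
    q = [0] * n_flops
    states = [tuple(q)]
    for _ in range(n_clocks):
        # First flop loads the inverted output of the last flop;
        # every other flop loads the output of the flop before it.
        q = [1 - q[-1]] + q[:-1]
        states.append(tuple(q))
    return states


if __name__ == "__main__":
    # A 4-flop counter walks through 2N = 8 states and then repeats.
    for pulse, state in enumerate(johnson_states(4, 8)):
        print(pulse, "".join(str(bit) for bit in state))
```

Reading any single column of the printed sequence gives a square wave with four 1s and four 0s per cycle, which is the clock-like pattern the text refers to.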

The gaps in typical sequential circuits
The biggest disadvantage of the circuit shown in Figure 1 is that its clock division factor cannot be configured. An N-flop design can only produce an output clock whose period is 2N input clock periods. The number of flops must be fixed in the design beforehand, which yields an output clock of one fixed period. This restricts each design to a single division ratio, so multiple such designs are required to support multiple ratios.

The design is also not power-efficient, since it provides no mechanism to save dynamic power through clock gating. For example, Table 1 shows that Q3 changes its output only in Clock Pulse 2 and Clock Pulse 6. On all other clock pulses the flop simply stores the same data again. This leads to unnecessary power dissipation in those clock cycles, which proper clock gating could have avoided.
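To put a rough number on this observation, the following sketch counts how often each flop actually changes value over one full cycle of a conventional Johnson counter. The helper names `toggle_counts` and `redundant_clock_fraction` are hypothetical, and the model assumes an all-zeros reset. It only counts clock edges that reload an unchanged value; it is not a power estimate.

```python
def toggle_counts(n_flops):
    """Count how often each flop changes value over one 2N-clock cycle."""
    q = [0] * n_flops  # assumed reset state
    toggles = [0] * n_flops
    for _ in range(2 * n_flops):
        nxt = [1 - q[-1]] + q[:-1]  # Johnson counter next-state rule
        for i in range(n_flops):
            if nxt[i] != q[i]:
                toggles[i] += 1
        q = nxt
    return toggles


def redundant_clock_fraction(n_flops):
    """Fraction of delivered clock edges at which a flop reloads its old value."""
    total_edges = 2 * n_flops * n_flops  # 2N clocks delivered to each of N flops
    useful_edges = sum(toggle_counts(n_flops))
    return 1 - useful_edges / total_edges


if __name__ == "__main__":
    print(toggle_counts(4))
    print(redundant_clock_fraction(4))
```

For the 4-flop counter, each flop changes on only 2 of the 8 clock edges in a cycle, so three quarters of the clock edges delivered to the flops do no useful work in this model.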

Restructuring for effective clock gating
Any sequential circuit can be enhanced by restructuring it and applying effective clock gating. As an example, the Johnson counter shown in Figure 1 is enhanced in Figure 2 to support flexible division factors and so produce variable output frequencies.

To make it configurable, multiple delay stages of flops are added, along with the combinational logic needed to select among them according to the required division factor.

Figure 2 shows the power-efficient configurable Johnson counter circuit. It includes the cascaded delay stages B1, B2, B3, and B4, an inverter I, a reference clock input CLK, clock gating logic CGL, and control logic (a divider and a subtractor) that selects the combination of flip-flops required.

Figure 2: Low power multi-stage programmable Johnson Counter

The proposed modified Johnson counter circuit employs 19 D flip-flops, which can achieve any even division factor from 8 to 38. The range of division factors can be extended to any larger even value by adding flops and multiplexers. Two paths connect the output of each of the flops a, j, o, and r to the inputs of the corresponding multiplexer. For example, the shunt path connects the output of flop a directly to the first input of the 1st multiplexer, while the delay path passes the output of flop a through an array of flops (b, c, d, e, f, g, h, i) to the second input of the 1st multiplexer. Selecting between these multiplexer inputs gives the circuit the configurability it needs to support multiple division factors.

As shown in Figure 3, in order to save power, the control circuit output is fed to the CGL to enable or disable the clocks of the delay-path flops in accordance with the intended division factor. For a division factor of 2N, N flops are required to achieve the desired clock frequency. To drive the multiplexer select inputs and the clock gating enables, control logic consisting mainly of a subtractor has been incorporated. Based on the division factor provided by the user, the subtractor outputs N-4 (the four flops a, j, o, and r are always in the loop, so only the remaining N-4 flops must be selected). Each bit of the subtractor's binary output, sel[3:0], serves as the select line of the corresponding multiplexer (1st, 2nd, 3rd, 4th) and as an enable for the CGL, which gates the clocks of the unused flops.

This makes the design programmable while reducing the dynamic power consumption of the counter.

Figure 3: Flowchart explaining the circuit operation
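The control flow described above (divide by 2, subtract 4, use the result bits as enables) can be sketched as follows. The function name `control_logic` and the grouping tables are illustrative assumptions. The flop letters come from the article, and the mapping of sel[3], sel[2], sel[1], sel[0] to the 8-, 4-, 2-, and 1-flop delay groups is our reading of the text rather than a copy of the authors' RTL.

```python
# Flops that are always clocked, per the article's description.
ALWAYS_ON = ["a", "j", "o", "r"]

# Delay-path flops enabled by each select bit (assumed mapping).
DELAY_GROUPS = {
    3: list("bcdefghi"),  # 8 flops, gated by sel[3]
    2: list("klmn"),      # 4 flops, gated by sel[2]
    1: list("pq"),        # 2 flops, gated by sel[1]
    0: ["s"],             # 1 flop, gated by sel[0]
}


def control_logic(division_factor):
    """Return (sel, clocked_flops) for an even division factor from 8 to 38."""
    if division_factor % 2 or not 8 <= division_factor <= 38:
        raise ValueError("division factor must be an even value from 8 to 38")
    n = division_factor // 2  # divider: number of flops needed
    sel = n - 4               # subtractor: a, j, o, r are always present
    clocked = list(ALWAYS_ON)
    for bit, flops in DELAY_GROUPS.items():
        if (sel >> bit) & 1:  # a set bit ungates that delay group
            clocked += flops
    return sel, sorted(clocked)


if __name__ == "__main__":
    sel, clocked = control_logic(10)
    print(format(sel, "04b"), clocked)
```

Because the delay groups hold 8, 4, 2, and 1 flops, the binary value of sel directly equals the number of extra flops switched in, so the number of clocked flops is always N in this sketch.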

A Johnson counter at work
Let us consider an example with a division factor of 10, i.e. 2N = 10 (Figure 4). Since a typical Johnson counter requires N flops for a division factor of 2N, a division factor of 10 needs 2N/2 = 10/2 = 5 flip-flops in the circuit. The divider circuit outputs 2N/2 = 5, so the subtractor outputs 5 - 4 = 1, whose binary representation is 0001, to feed the select lines of the multiplexers. This 4-bit signal, sel[3:0] = 0001, is the most crucial signal, since it not only controls the clock gating logic but also selects between the shunt and delay paths.

Figure 4: Circuit operation for a division factor of 10

Only sel[0] will be 1 in this case. This enables the clock of flip-flop s, while sel[3], sel[2], and sel[1] disable the clocks of flip-flops (b, c, d, e, f, g, h, i), (k, l, m, n), and (p, q) respectively. This is depicted by the highlighted portion of Figure 4. Note also that flops a, j, o, and r are always enabled. With the required flops enabled, the circuit produces the desired clock at the output of the 4th multiplexer. Thus a total of 5 flops receive the clock in this example, and the clocks of the other 14 flops are gated off.

The above counter was simulated, and the resulting RTL waveforms are shown in Figure 5. As can be seen, the modified counter divided a 100 MHz clock down to a 10 MHz output with sel[3:0] set to 4'b0001.

Figure 5: Waveform for the division factor of 10
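The worked example can be cross-checked with a small behavioral model. The sketch below is not the authors' RTL: `simulate` and `measure_period` are hypothetical names, the all-zeros reset is an assumption, and the stage ordering (a, 8-flop delay, 1st mux, j, 4-flop delay, 2nd mux, o, 2-flop delay, 3rd mux, r, 1-flop delay, 4th mux, inverter back to a) is our reading of the description of Figure 2.

```python
STAGE_DEPTHS = (8, 4, 2, 1)  # delay flops gated by sel[3], sel[2], sel[1], sel[0]


def simulate(sel, n_clocks):
    """Return the 4th-mux output sampled after each clock edge."""
    always_on = [0, 0, 0, 0]  # flops a, j, o, r (never gated)
    delay = [[0] * depth for depth in STAGE_DEPTHS]
    enabled = [(sel >> bit) & 1 for bit in (3, 2, 1, 0)]

    def mux_out(stage):
        # Delay path when selected, otherwise the shunt path.
        return delay[stage][-1] if enabled[stage] else always_on[stage]

    trace = []
    for _ in range(n_clocks):
        # Compute every next value from the current values (synchronous update).
        feedback = 1 - mux_out(3)  # inverter feeding flop a
        next_on = [feedback] + [mux_out(stage) for stage in range(3)]
        next_delay = []
        for stage in range(4):
            if enabled[stage]:
                next_delay.append([always_on[stage]] + delay[stage][:-1])
            else:
                next_delay.append(delay[stage])  # clock gated: flops hold
        always_on, delay = next_on, next_delay
        trace.append(mux_out(3))
    return trace


def measure_period(trace):
    """Input clocks between the last two rising edges of the output."""
    rising = [i for i in range(1, len(trace)) if trace[i - 1] == 0 and trace[i] == 1]
    return rising[-1] - rising[-2]


if __name__ == "__main__":
    period = measure_period(simulate(0b0001, 40))
    print("output period in input clocks:", period)
    print("output frequency for a 100 MHz input (MHz):", 100 / period)
```

With sel = 0001 only flops a, j, o, r, and s take part in the loop in this model, so the output repeats every 10 input clocks, matching the 100 MHz to 10 MHz division reported for Figure 5.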

The various combinations that can be achieved by the proposed circuit and the inputs selected by the multiplexers are listed in the table below (Table 2).

Table 2: Selection logic for Multiplexers and CGIC based on the division factor
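A selection table of this kind can be derived mechanically from the rule sel = N - 4. The sketch below enumerates every supported division factor and the path each multiplexer would pick. The function name `selection_table` and the row layout are assumptions for illustration, as is the mapping of the 1st to 4th multiplexers to sel[3] through sel[0], so the output is a reconstruction under those assumptions rather than a copy of Table 2.

```python
def selection_table():
    """Return rows of (division_factor, sel_bits, [path of 1st..4th mux])."""
    rows = []
    for division_factor in range(8, 40, 2):
        sel = division_factor // 2 - 4
        # Assumed mapping: 1st..4th mux are driven by sel[3]..sel[0];
        # a 1 selects the delay path and ungates that path's flops.
        paths = ["delay" if (sel >> bit) & 1 else "shunt" for bit in (3, 2, 1, 0)]
        rows.append((division_factor, format(sel, "04b"), paths))
    return rows


if __name__ == "__main__":
    for division_factor, sel_bits, paths in selection_table():
        print(division_factor, sel_bits, " ".join(paths))
```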

The Johnson counter we describe is configurable for division factors from a minimum of 8 up to 38, thus providing an array of output frequencies, as configured by the inputs provided to the combinational logic of the counter.

Even with the extra hardware used in the counter to provide programmability, the power dissipation of the circuit is kept in check by effective clock gating. The gating is driven by the same logic that selects the multiplexer inputs in the selection stages and acts as the enable for the clock gating cell.

Thus, with clock gating added to the design, any sequential logic from a shift register to a counter can be made more efficient. A collection of such circuits in SoCs can save power and increase the battery lifetime of the device.

Conclusion
The stringent power requirements and increasing multiplication factors provided by the architects during the design phase lead to a requirement for multiplexed cascaded clock dividers, which make the circuit dissipate more power and consume more chip area.

The restructured design we propose provides an easy solution for supporting variable output frequencies while consuming less dynamic power than conventional circuits. The solution can easily be extended to other designs to make them more power-efficient.

Bhanu Khera is a design engineer in the Verification domain at Freescale Semiconductor India Pvt Ltd, Noida, where she has worked for more than 2 years. She has been working on core and security architecture verification in multiple SoCs. She can be reached at .

Harsh Garg is working with Freescale Semiconductor India Pvt Ltd, Noida, as a Design Engineer and has 2+ years of experience. He has been working on verification of critical IPs like USB, COP (Common On-Chip Processor) and Low Power Verification of SoC. He can be reached at harsh.garg@freescale.com.

