Using multi-bit flip-flop custom cells to achieve better SoC design efficiency - Embedded.com

System-on-chip (SoC) designs are becoming more and more complex by whatever measure you choose: power domains, gate count, packing density, heat dissipation, and so on. At such high packing densities, it has become more challenging for physical design teams to make practical trade-offs among performance, power consumption, and die area. More robust techniques and circuit elements are needed to strike the best balance among these three. In general, the three are complementary: keeping power and area under control usually results in higher performance.

However, keeping power and area under control in today’s high-performance designs is a challenge. Power in particular has become a major concern with the move to smaller technology nodes and higher packing densities.

New techniques are being developed to bring the best out of a design in terms of all these parameters. One such technique is the design of custom complex cells. In this paper, we will be discussing the architecture of one of the most commonly used complex cells – multi-bit flip-flops – and its merits and drawbacks. Later in this paper we will discuss the results of implementation using multi-bit flops in a particular design and what you need to be concerned about.

Custom cells – a new dimension in SoC design
Traditionally, there have been two approaches to VLSI design: analog/full custom design and digital SoC design. Custom design aims to extract the best from each element and involves manual effort for schematic and layout design, while digital SoC design focuses on automation.

In the digital approach, basic structures such as AND/OR gates are pre-designed using the full custom flow. Their models are characterized and used in automated flows to create monster chips. These SoCs give up some area, power, and timing in exchange for schedule. In other words, the resulting combination of timing, area, and power is only close to optimal.

Some combinations of primitive structures are used repeatedly in many places in a single SoC. In such cases, it may be useful to custom design these structures as a step towards optimizing power, area, and timing to the fullest. One such custom-designed complex cell is the multi-bit flop, which enables optimization of all three.

The total power consumption in an SoC has three components: dynamic power, leakage power, and short-circuit power. Dynamic power is the largest of the three, and clock network power is the dominant contributor to dynamic power because the clock signal switches so often. In a nutshell, even a small reduction in clock network power can reduce the total power significantly.
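As a rough illustration of why the clock network dominates, the sketch below uses the standard first-order CMOS switching-power relation, P = activity × C × Vdd² × f. The function name and all numeric values are assumptions chosen for illustration; they are not taken from the design discussed in this paper.

```python
def switching_power(activity, capacitance_f, vdd_v, freq_hz):
    """First-order CMOS switching power in watts.

    activity is the average number of charging (0 -> 1) transitions
    per clock cycle: 1.0 for a clock net, usually far less for data.
    """
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical values, for illustration only: two nets with the same
# capacitance, supply voltage, and clock frequency.
clock_net = switching_power(activity=1.0, capacitance_f=10e-12,
                            vdd_v=1.2, freq_hz=100e6)
data_net = switching_power(activity=0.1, capacitance_f=10e-12,
                           vdd_v=1.2, freq_hz=100e6)

print(f"clock net: {clock_net * 1e3:.3f} mW")
print(f"data net:  {data_net * 1e3:.3f} mW")
```

With equal capacitance, the net that toggles every cycle burns proportionally more power, so removing clock capacitance pays off more than removing the same capacitance from a typical data net.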

Multi-bit flip-flop structure
In a basic circuit structure, the clock signal is fed to all the flip-flops of an SoC, so improving the clock network (say, by reducing the total number of clock buffers used) improves the overall quality of results (QoR) of the design. Multi-bit flip-flops were designed to improve the clock network that feeds the flip-flops.

A multi-bit flip-flop is either a 2-bit or a 4-bit cell (Figure 1), with as many data outputs as data inputs. It consists of more than one flip-flop, custom designed as a single cell to optimize area and power.

Figure 1. A 4-bit flop

The major structural difference from single-bit cells is that the single-bit flip-flops inside a multi-bit flip-flop share one clock network. As a result, all the single-bit elements are physically placed close together, which resolves many physical design implementation challenges.

Figure 1 shows the block diagram of a 4-bit flop, and Figure 2 shows its internal structure. The four built-in flops share a common clock and scan enable. They also form an internal four-flop scan chain, so the cell can be plugged as-is into a larger scan chain. All these connections are made by hand and are near optimal in their use of resources.

Figure 2: Internal structure of multi-bit flip-flop
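The structure described above can be sketched as a small behavioral model. This is only an illustration under stated assumptions: the class name, the method names, and the bit ordering of the internal scan chain are hypothetical, and the model ignores timing, reset, and all electrical detail.

```python
class MultiBitFlop:
    """Behavioral sketch of an n-bit flop with shared clock and scan enable."""

    def __init__(self, width=4):
        # q[0] is assumed to be the first bit of the internal scan chain.
        self.q = [0] * width

    @property
    def scan_out(self):
        # The last bit of the internal chain drives the cell's scan output.
        return self.q[-1]

    def clock_edge(self, d, scan_enable=0, scan_in=0):
        """Apply one rising edge of the shared clock to every bit at once."""
        if len(d) != len(self.q):
            raise ValueError("d must supply one value per bit")
        if scan_enable:
            # Shift mode: each bit captures its predecessor's previous value.
            self.q = [scan_in] + self.q[:-1]
        else:
            # Functional mode: each bit captures its own D input.
            self.q = list(d)


def shift_chain(cells, scan_in):
    """Shift one bit into cells wired scan_out -> scan_in on a common edge."""
    # Sample every cell's scan input before any cell updates.
    inputs = [scan_in] + [c.scan_out for c in cells[:-1]]
    for cell, si in zip(cells, inputs):
        cell.clock_edge(d=[0] * len(cell.q), scan_enable=1, scan_in=si)
    return cells[-1].scan_out


# Two 4-bit cells plugged together behave as one 8-bit scan chain.
cells = [MultiBitFlop(), MultiBitFlop()]
for bit in [1, 0, 0, 0, 0, 0, 0, 0]:
    shift_chain(cells, bit)
print(cells[0].q, cells[1].q)
```

Because one clock edge and one scan enable update all bits together, a tool outside the cell sees a single clock pin and a single scan-in/scan-out pair, whatever the width.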

Advantages of using multi-bit flop
As stated above, multi-bit flip-flops are a step closer to optimal use of resources and offer many advantages over single-bit flip-flops:

  • An SoC implemented with multi-bit cells presents fewer clock sinks to the clock-tree synthesis tool. Their use should therefore reduce the power consumed in clocking the flip-flops, because the overall capacitance driven by a clock net is reduced.
  • It should also reduce clock skew between sequential elements, because the clock paths are balanced internally within each multi-bit cell.
  • An SoC implemented with multi-bit flip-flops should have a smaller area, because the total number of clock buffers should drop, which also means less congestion.
  • Multi-bit usage should improve the timing numbers, thanks to shared logic (in clock gating or set-reset logic) and an optimized multi-bit circuit and layout from the library team.
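The clock-sink argument in the first point can be sketched with simple counting. The function name and the cell mix below are hypothetical and do not come from the block used in our experiments; the only assumption carried over from the text is that each cell presents a single clock pin regardless of how many bits it stores.

```python
def clock_sinks(single_bit, two_bit=0, four_bit=0):
    """Return (register bits, clock sinks) for a mix of flop cells.

    Each cell counts as one clock sink for clock-tree synthesis,
    whether it stores one, two, or four bits.
    """
    bits = single_bit + 2 * two_bit + 4 * four_bit
    sinks = single_bit + two_bit + four_bit
    return bits, sinks


# Hypothetical block with 1000 register bits, before and after merging
# most single-bit flops into 2-bit and 4-bit cells.
before = clock_sinks(single_bit=1000)
after = clock_sinks(single_bit=200, two_bit=100, four_bit=150)

print("before:", before)  # (1000, 1000)
print("after: ", after)   # (1000, 450)
```

The number of stored bits is unchanged, but the clock tree has fewer endpoints to reach, which is where the expected savings in clock buffers and clock-net capacitance come from.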

Comparison of results
We experimented with a small block and ran two experiments, one allowing multi-bit flops and the other without multi-bit flops, at different frequencies in order to compare the pros and cons of using these cells in our design.

Table 1 shows the flop count in the two runs. As is evident from the table, the number of flops is almost halved if we allow the usage of multi-bit flops. This approach is not size limited and scales well. When larger numbers of multi-bit flops are used, the resulting optimization is even more pronounced.

Table 1: Number of flops/clock sinks with and without multi-bit flops usage

Table 2 shows the timing and power statistics after clock tree synthesis for three different frequencies: 80 MHz, 120 MHz, and 160 MHz. The first row in every category shows the statistics when multi-bit flops were allowed. The lower row shows the statistics of the run without multi-bit flip-flops. We analyzed the design in terms of both timing and power.

The timing was analyzed in terms of Worst Negative Slack (WNS), Total Negative Slack (TNS), and the total number of violating paths. Similarly, power was analyzed in terms of internal, switching, leakage, and clock power. The table shows the total number of flip-flops; it also shows the total number of multi-bit flip-flops in each case.
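For readers less familiar with these timing metrics, the sketch below shows how they are conventionally derived from a list of path slacks. The function name and the slack values are hypothetical and are not taken from Table 2; reporting WNS as zero when no path violates is one common convention, assumed here, and tools differ on this.

```python
def timing_summary(slacks_ns):
    """Return (WNS, TNS, violating path count) from path slacks in ns."""
    violations = [s for s in slacks_ns if s < 0]
    # WNS: the most negative slack; assumed 0.0 when nothing violates.
    wns = min(violations) if violations else 0.0
    # TNS: the sum of all negative slacks.
    tns = sum(violations)
    return wns, tns, len(violations)


# Hypothetical endpoint slacks, for illustration only.
wns, tns, count = timing_summary([0.12, -0.05, -0.20, 0.30, -0.01])
print(f"WNS = {wns:.2f} ns, TNS = {tns:.2f} ns, violating paths = {count}")
```

WNS shows how far the worst path is from closing, while TNS and the path count indicate how widespread the violations are, which is why the three are reported together.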

Table 2: Timing and power statistics after clock tree synthesis

Conclusion
The above results show that while multi-bit flip-flops have advantages in terms of power, performance, and clock tree implementation, their behavior varies with frequency.

The benefits of multi-bit flops are design and technology dependent: they vary with the frequency at which the design operates and with the channel length. As we see from the results, when the design frequency is low (80 MHz and 120 MHz), multi-bit flops show advantages over conventional flops. But when the frequency is increased (160 MHz), their advantage is lessened. The experiments were done on C55 technology; on other technology nodes the results may be different.

Gaurav Gupta is a Lead Design Engineer at Freescale, with over nine years of industry experience. He is currently working on the physical design team. He has experience in logical and physical synthesis, STA, static low-power verification, and formal verification, and has also worked in standard cell library characterization and validation.

Nalin Gupta is a Senior Design Engineer at Freescale Semiconductors, Noida, India. He has over three years of experience in logical synthesis and place and route (PnR).

Gourav Kapoor is a Senior Design Engineer at Freescale and is currently working with the physical design team, with static timing analysis as his area of specialization.
