In large and complex system-on-chip ASIC design, the most challenging tasks are those involving design closure: timing, routing and power.
Converging on timing and routing is tedious because the calculations are memory-intensive and an EDA tool can handle only a limited design size.
In such cases, it is advisable to go for a hierarchical approach instead of a flat top. Generally, the blocks are demarcated on the basis of functionality, backward compatibility, third-party IP and so on.
This article details the difference in terms of runtimes, routing congestion, timing summary and utilization for a design that is done as hierarchical vs. the same design using the flat approach.
There are various reasons for a design team to consider moving to a hierarchy-based closure:
• Memory requirement: A huge design involves a great deal of computation and memory. Dividing the design into blocks lets each block be closed within a manageable memory footprint.
• Runtimes: These can be reduced if different blocks are closed in parallel.
• Multi-site project: If different blocks are closed at different sites around the world, a hierarchical flow is the way to go.
• Schedule requirement: If certain blocks are not mature and will be delivered only a month before tape-out, it is advisable to close the design hierarchically and plug the block in later.
• Already frozen/third-party IP: If a design contains already frozen or third-party IP, a hierarchical flow might be required.
• Design requirement: Sometimes strategic design requirements dictate a hierarchical approach.
A case study
We are going to discuss an example from one of the latest designs, in two parts. First we discuss closure in the hierarchical flow, and then the flat-based approach. Finally, we compare the two approaches and draw conclusions.
Hierarchy-based approach. The design under discussion had two blocks, so effectively we had two blocks and a chip_top to pay attention to. Figure 1 below shows the floorplan in this case. As you can see, there are two huge partitions in the design, with a huge channel between them.
Figure 1: Partitioning your SoC into two parts.
There are also certain design requirements, for example that signals from either partition must not cross over the other partition. Table 1 below shows the timing summary when the design was done using a hierarchical flow.
Table 1: Timing summary when the design was done using a hierarchical flow.
We then tried the same design with a semi-flat approach, i.e. we dissolved one of the partitions into chip_top. We noticed a marked difference in the timing and congestion numbers. Also, the runtimes were reduced by about 2x. Figure 2 below is a snapshot of the semi-flat floorplan and the timing numbers.
Figure 2: Snapshot of the semi-flat floorplan and the timing numbers.
Comparing both approaches, the semi-flat approach shows a clear gain in the quality of results (QoR) of the design.
In Figure 3 below, the “deep red” portion of the design is partition I. Notice that, because the semi-flat approach is being followed, partition I does not have a hard, predefined boundary, and its logic cells can be placed along with those of chip_top.
Figure 3: The “deep red” portion of the design is partition I.
The placement of the cells is decided on the basis of timing and routing tracks available. Since there is much wider scope for placement of partition I cells, we see better QoR as far as timing and other parameters are concerned.
Hierarchical designs face more challenges. Let’s go through some of them:
Timing constraints budgeting. This is probably the most widely documented and discussed topic in static timing analysis. Design teams across the globe use different methodologies to implement it.
Most teams work on an 80-20 strategy, in which 20 percent of the clock period is budgeted to the hierarchical block while 80 percent is kept for the other blocks and chip_top, as shown in Figure 4 below.
Figure 4: The 80-20 budgeting strategy, in which 20 percent of the clock period is budgeted to the hierarchical block and 80 percent is kept for the other blocks and chip_top.
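The 80-20 split can be sketched as simple arithmetic. The snippet below is a minimal illustration, not a tool flow. The function names, the clock name `sys_clk` and the port name `blk_in` are hypothetical, and it assumes the share kept outside the block is modelled as an external input delay on the block's pin.

```python
def split_budget(clock_period_ns, block_fraction=0.20):
    """Split one clock period between a block and everything outside it."""
    if clock_period_ns <= 0:
        raise ValueError("clock period must be positive")
    if not 0.0 < block_fraction < 1.0:
        raise ValueError("block_fraction must be strictly between 0 and 1")
    block_ns = clock_period_ns * block_fraction   # share given to the block
    outside_ns = clock_period_ns - block_ns       # share kept for other blocks and chip_top
    return block_ns, outside_ns


def input_delay_constraint(clock_period_ns, clock_name, port_name,
                           block_fraction=0.20):
    """Format an SDC-style input delay for a block-level input port.

    Assumption: the delay consumed outside the block is applied as the
    external input delay seen at the block boundary.
    """
    _, outside_ns = split_budget(clock_period_ns, block_fraction)
    return (f"set_input_delay {outside_ns:.3f} "
            f"-clock {clock_name} [get_ports {port_name}]")


if __name__ == "__main__":
    block_ns, outside_ns = split_budget(10.0)
    print(f"block: {block_ns:.3f} ns, outside: {outside_ns:.3f} ns")
    # Hypothetical clock and port names, for illustration only.
    print(input_delay_constraint(10.0, "sys_clk", "blk_in"))
```

For a 10 ns clock, this gives the block 2 ns and leaves 8 ns outside it. A real flow would refine these numbers per interface rather than apply one fraction everywhere.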
There are several other ways to derive timing budgets:
• EDA-tool-based methodology;
• Budgets on the basis of the number of logic levels in the timing path, etc.
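As a rough illustration of the logic-level method, the sketch below divides the clock period among the segments of a path in proportion to the number of logic levels in each. It assumes every logic level costs about the same delay, which is a simplification. The function name and the segment names and counts in the example are hypothetical.

```python
def budget_by_logic_levels(clock_period_ns, levels_per_segment):
    """Divide a clock period among path segments by logic-level count.

    levels_per_segment maps a segment name (a block or chip_top) to the
    number of logic levels the timing path has inside that segment.
    """
    if clock_period_ns <= 0:
        raise ValueError("clock period must be positive")
    if any(levels < 0 for levels in levels_per_segment.values()):
        raise ValueError("logic-level counts must not be negative")
    total_levels = sum(levels_per_segment.values())
    if total_levels == 0:
        raise ValueError("path must contain at least one logic level")
    # Each segment receives a share proportional to its logic depth.
    return {
        name: clock_period_ns * levels / total_levels
        for name, levels in levels_per_segment.items()
    }


if __name__ == "__main__":
    # Hypothetical path crossing both partitions through chip_top.
    path = {"partition_I": 3, "chip_top": 5, "partition_II": 2}
    for name, budget_ns in budget_by_logic_levels(10.0, path).items():
        print(f"{name}: {budget_ns:.2f} ns")
```

Compared with a fixed 80-20 split, this gives deeper segments more of the period, at the cost of needing per-path level counts.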
Pin placement. As shown in Figure 5 below, in a hierarchical flow the physical constraints can severely restrict where pins can be placed.
Figure 5: Both partitions were forced to have the I/O pins either in between the formed channel (facing each other) or at the top (so that they can interact with the chip_top).
Here, both partitions were forced to have the I/O pins either in between the formed channel (facing each other) or at the top (so that they can interact with the chip_top).
The problem with such an approach is also shown in Figure 5. Because of the huge I/O pin density, the associated chip_top logic is also placed close to the pins, which results in a placement density of more than 90 percent and hence placement and routing congestion. Such scenarios are better handled in the semi-flat or flat approach, in which the pin placement is not hard-bound.
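To make the density figure concrete, the sketch below computes placement density as cell area divided by placeable area and flags regions above 90 percent. The 90 percent threshold comes from the case above. The function names and the area numbers are hypothetical and are not taken from the design discussed here.

```python
def placement_density(cell_area_um2, region_area_um2, blocked_area_um2=0.0):
    """Return the fraction of placeable area occupied by standard cells."""
    placeable_um2 = region_area_um2 - blocked_area_um2
    if placeable_um2 <= 0:
        raise ValueError("region has no placeable area")
    return cell_area_um2 / placeable_um2


def is_congestion_risk(density, threshold=0.90):
    """Flag a region whose placement density exceeds the threshold."""
    return density > threshold


if __name__ == "__main__":
    # Hypothetical channel region next to the partition pins.
    density = placement_density(cell_area_um2=4600.0,
                                region_area_um2=6000.0,
                                blocked_area_um2=1000.0)
    print(f"density: {density:.0%}, congestion risk: {is_congestion_risk(density)}")
```

In this made-up example, 4,600 square microns of cells in 5,000 square microns of placeable area gives 92 percent density, which is above the threshold.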
Clock tree. Clock tree synthesis is also compromised, because the clock tree of partition I has little in common with that of partition II. This affects on-chip timing slack.
Block shape. As shown in Figure 5, the block shape of partition II is strangely rectilinear, the result of numerous experiments at the floor-planning stage. In a semi-flat flow, such floor-planning experiments are unnecessary, which saves valuable man-hours.
On a case-by-case basis, it can be advantageous to go with the flat-based approach. It offers the following advantages:
• Better QoR (timing budgets, clock tree);
• Design resources are saved (in the present case, a dedicated engineer was required to handle partition I);
• STA constraints take less effort, since a hierarchical flow requires far more budgeting;
• Constraints related to verification, a huge effort in a hierarchical flow, become simpler;
• Floor-planning becomes much easier in terms of pin placement and block shape.
The disadvantages of the flat or semi-flat based approach are the following:
• Longer runtimes (but this is offset by the fact that design convergence is much quicker in the flat-based approach);
• More memory requirement;
• EDA tools can handle only up to a certain gate count, beyond which the flow becomes computationally very intensive.
So the physical design team should weigh these trade-offs carefully and only then decide whether a hierarchy-based flow is actually needed. Otherwise, a flat or semi-flat flow has an edge over the hierarchy-based flow.
Sunit Bansal is a Senior Design Engineer at Freescale Semiconductor Inc.