Designing embedded SoCs using older resistive technologies - Embedded.com

When designing an SoC with a generic 32-bit MCU based on 0.18um (180 nm) processes with flash and a rich suite of analog and digital IPs, the authors found that the pre-route engines from current EDA tool vendors are tuned for smaller transistor node sizes and are not very good at the larger 180 nm geometries. Here are the steps they took to overcome such problems.

With the emergence of newer and faster technologies, we have seen a rapid increase in the number of complex designs that push CMOS transistor geometries to 90 nm and smaller dimensions. But designs based on larger dimensions are not disappearing. In fact, process technology nodes with 180nm and 250nm geometries are still considered “hot”.

If you consider yourself well equipped with the latest 90 nm EDA tool in your kitty, assuming it will be just as efficient at relatively conservative technology nodes such as 180nm, you might be in for a surprise. More so if the so-called “small design” requires seamless backward package-pin compatibility, has high frequency requirements, and targets fierce gross-margin numbers.

While the latest EDA tools offer features useful at any process node – such as signal integrity, design for manufacturability (DFM), and lithographic enhancements – they also include capabilities that take advantage of technology enhancements available only at the smaller node geometries, such as high-K oxide, copper metals, shrinking metal and poly pitch, and a higher number of metal layers, etc.

At smaller geometries, copper is used for its lower resistivity so that lower-voltage signals are delivered reliably. EDA tools whose features are tuned for copper interconnect may therefore struggle at a more conservative 180 nm node, where aluminum is still used for the metal routes: they offer little help with the higher resistance of the wires, and especially of the vias and contacts, in the routed topology.

With EDA tools a designer has to depend on the accuracy and consistency of timing results from stage to stage (from placement to clock tree synthesis and then to routing). The consistency of results as it relates to net delays produced by the tools is heavily dependent on the accuracy of parasitic extractions performed.

Before designs are finally “detail routed”, all tools make approximations to predict net lengths, vias, and contacts, and hence the parasitic (RC) extractions. So the consistency of your timing results will vary with the parasitic extraction numbers. However, in 180 nm nodes, we have seen tools that have great difficulty in predicting the pre-route timing, leading to surprising results in the post-route stages.

The congestion in such designs introduces further inaccuracies in timing because during the trial routing stage commercial EDA tools are not very good at estimating the exact number of metal detours and hence the metal segments and vias/contacts a particular path might take once the design is finally routed. The mismatch in the number of vias/contacts from pre-route to post-route stage is in fact the cause of timing miscorrelation.
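To see why a via-count mismatch alone can upset timing, consider a minimal sketch that models a net's resistance as wire resistance (sheet resistance times the number of squares) plus a fixed resistance per via. The function name and every numeric value below are illustrative assumptions, not data from the design or from any process design kit.

```python
def net_resistance(length_um, width_um, sheet_ohm_per_sq, via_count, via_ohm):
    """First-order net resistance: wire squares plus a lumped term per via."""
    squares = length_um / width_um
    return squares * sheet_ohm_per_sq + via_count * via_ohm

# Placeholder values chosen only to illustrate the trend (assumptions).
SHEET_OHM_PER_SQ = 0.08   # hypothetical aluminum sheet resistance
VIA_OHM = 5.0             # hypothetical resistance of one via
WIDTH_UM = 0.28           # hypothetical wire width

# Pre-route estimate: near-straight connection with few layer changes.
pre_route = net_resistance(2000.0, WIDTH_UM, SHEET_OHM_PER_SQ, 4, VIA_OHM)

# Post-route reality: modestly longer wire, but many more layer changes.
post_route = net_resistance(2500.0, WIDTH_UM, SHEET_OHM_PER_SQ, 400, VIA_OHM)

via_share = (400 * VIA_OHM) / post_route
print(f"pre-route  R = {pre_route:.1f} ohm")
print(f"post-route R = {post_route:.1f} ohm (vias contribute {via_share:.0%})")
```

In this toy model the wire length grows by only 25%, yet the total resistance is dominated by the via term once the router starts hopping between layers, which is the kind of effect a straight-line pre-route estimate misses.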

In this article we will outline some of the things we have done to either work around the limitations of existing EDA tools or use alternative design methodologies that do not depend on them.

Starting point: a standard 32-bit MCU. The design we worked on was a generic 32-bit MCU based on a 0.18um (180 nm) process with flash and a rich suite of analog and digital IPs. Flash is often a bottleneck for uniform standard cell placement because of its huge size, which forces many nets to hop around it and hence causes congestion at its corners and notches.

The MCU constituted a major portion of the total chip area. When added to the area required for analog components and memories, this left only about 55% of the chip area available for normal placement and routing. The CMOS technology used was based on a five-metal process with all aluminum layers.

Resolving the Routing Resource Crunch
As is the normal practice in the SoC design flow, the top layers in our design were used for power routing. But since the 180 nm technology node is more resistive to current flow and results in greater heat dissipation, the EDA tool’s power planner had to be programmed to make the grid more dense, increasing the number of metal layers for a given area to compensate. But with more of the available chip area taken up with a denser power grid, it left a smaller area for signal routing.

This left only the M3 layer (third metal layer) as a horizontal routing resource for signals. Due to the aforementioned gross margin target, the resulting design had a very high utilization goal; that is, it would require us to come up with ways to use the area that remained more efficiently. There were limitations on the placement of flash and other hard blocks due to backward pin compatibility requirements; that is, access to particular pin-outs limited where we could place and route our resources because of conflicts with those blocks.

As would be the case with any highly resistive technology, many of the custom routes we came up with had to use more than three layers for routing to compensate for stringent resistance requirements.

In addition to the many routing challenges, the combined effect of all these unfavorable factors led to more and more timing miscorrelation between the pre-route and post-route stages.

Pre and Post-Route Timing surprises
We first started noticing the congestion issue at the Post-CTS Trial Route stage, after setup fixing, where we observed the following numbers:

Overflow: 27078 = 16483 (2.11% H) + 10595 (1.58% V)

However, the timing scenario was quite manageable, with a total negative slack (TNS) of ~5ns and a worst negative slack (WNS) of ~450ps.
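For readers less familiar with the two metrics, here is a minimal sketch of how WNS and TNS are commonly derived from a list of path slacks. The slack values are made up for illustration, and reporting conventions differ slightly between tools (for example, whether WNS is reported when no path fails).

```python
def wns_tns(slacks_ns):
    """Return (worst negative slack, total negative slack) in ns.

    Only failing paths (slack < 0) contribute. If nothing fails,
    both values are reported as 0.0 in this sketch.
    """
    negative = [s for s in slacks_ns if s < 0]
    wns = min(negative) if negative else 0.0
    tns = sum(negative)
    return wns, tns

# Hypothetical endpoint slacks in nanoseconds.
slacks = [0.12, -0.45, -0.10, 0.30, -0.25]
wns, tns = wns_tns(slacks)
print(f"WNS = {wns:.2f} ns, TNS = {tns:.2f} ns")
```

A small WNS with a large TNS means many paths fail by a little, while a large WNS points to a few badly broken paths. The post-route numbers reported later in the article are bad on both counts.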

We initially felt we could safely ignore the above congestion number because of our experience with earlier designs done in 90nm and smaller. In smaller, less resistive copper-based geometries, timing problems can be avoided because the router can be programmed to hop the metal intelligently by going through vias to either upper or lower layers. As a result, routing and timing deteriorate only marginally. We knew we would have to avoid such situations at 180nm, since even the slightest of hops would result in a lot of vias. Because the vias are laid down in less conductive aluminum, they are highly resistive and can prove catastrophic to timing.

But when we finally moved to detail-route the design, the timing scenario changed drastically. In spite of having met timing in the post-CTS stage, the post-route timing showed a TNS of 4000ns, while the WNS jumped to ~5ns. That is not what we usually see with 90nm and smaller geometry designs.

What went wrong
Clearly something went drastically wrong in the routing. Usually such bad timing is caused by long detours in the routed nets, so our next step was to run routing with the avoid_detour option for all nets in the chip layout. But that helped only marginally. On further analysis of our design we realized that the trialRoute engine in our EDA tool suite is not very sophisticated in assigning the available routing resources to the nets. Rather than come up with a more intelligent way of dealing with potential routing congestion, it had simply estimated a straight connection. This left it to the tool’s detail-router engine to deal with the congestion. To arrive at a workable solution, the detail router jogged and hopped to complete the routing, introducing many more vias, and with them higher resistance, in the process. This mismatch led to miscorrelation in the timing. Figure 1 shows a trial-routed net:



Figure 1: A trial-routed net

When detail-routed, the same net looked like this (Figure 2):



Figure 2: The same net detail-routed

As is evident from the picture, the slew values on the net degraded drastically, resulting in higher cell delays. Recording the RC values from the trial-route and detail-route engines gave the following results:

Trial Route Estimation
Number of capacitance : 32
Net capacitance : 0.666722 pF
Number of resistance : 35
Total resistance : 1128.740422 Ohm

Actual Routing Results
Number of capacitance : 929
Net capacitance : 0.817383 pF
Number of resistance : 928
Total resistance : 8167.449317 Ohm

Note the capacitance numbers above, which deteriorated by only about 23%, in line with a net increase in routed metal length of ~25%. While that was within reasonable and expected bounds, what surprised us was that the resistance increased by a whopping factor of more than 7 (from about 1129 Ohm to about 8167 Ohm). This is a totally different outcome from what we would have expected under the design rules of the smaller 90 nm node, where the copper interconnect is much less resistive and no amount of hopping by the detail-routing engine would lead to results as bad as this.
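The ratios can be recomputed directly from the two reports above. The sketch below also compares the lumped R×C product of the net before and after detail routing. That product is only a crude first-order indicator (real interconnect delay is distributed and depends on the driver and loads), and it is our illustrative addition rather than a figure from the original analysis.

```python
# Values copied from the trial-route and detail-route reports above.
trial = {"cap_pf": 0.666722, "res_ohm": 1128.740422}
actual = {"cap_pf": 0.817383, "res_ohm": 8167.449317}

# Relative growth of capacitance and resistance.
cap_increase_pct = (actual["cap_pf"] / trial["cap_pf"] - 1.0) * 100.0
res_ratio = actual["res_ohm"] / trial["res_ohm"]

# Lumped RC product: ohm * pF = picoseconds.
rc_trial_ps = trial["res_ohm"] * trial["cap_pf"]
rc_actual_ps = actual["res_ohm"] * actual["cap_pf"]
rc_ratio = rc_actual_ps / rc_trial_ps

print(f"capacitance increase: {cap_increase_pct:.1f}%")
print(f"resistance ratio:     {res_ratio:.2f}x")
print(f"lumped RC: {rc_trial_ps:.0f} ps -> {rc_actual_ps:.0f} ps ({rc_ratio:.2f}x)")
```

Because the capacitance barely moves, almost all of the growth in the RC product comes from the resistance term, which is consistent with the degraded slews seen on the detail-routed net.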



Coming up with a solution
To find a better method for routing, we performed a series of experiments where we tried to improve upon the timing. The first thing we thought of, the usual Post-Route Optimization and buffer addition techniques, didn’t help much since the router by itself was not able to improve the net topology much. As a result the RC parasitics for most of the critical nets remained the same and maintained the timing status quo. Other things we tried, like buffer addition in the critical path, only worsened the timing scenario since they added to the parasitics of already overloaded nets.

Finally we zeroed in on a multi-step approach, presented below, which helped us improve the timing and enabled timing closure:

  1. Taking post-routed RC parasitic back to postCTS optimization: As noted earlier, because the consistency of net delays from the tools is heavily dependent on the accuracy of parasitic extractions performed during clock tree synthesis, we first went back to the post CTS stage (with all routing removed) and made sure that it uses the actual measured RC values from the post-routed stage. Optimizing the CTS results in this way allowed us to improve the timing results by about 20%.
  2. Next we found certain places in the clock logic where we could make improvements in the slew values that also had an impact on a large number of paths. But we had already used the best drive strength cells out of the allowed library cell list for clock tree synthesis. So we had to pick clock cells from the dont_use cell list. These high-drive strength cells are normally masked out and isolated during the fabrication process because they have EM (electromigration) issues that rule out their use in the active portions of the chip. This is a problem inherent to the aluminum interconnect used at the 180 nm node. But we carefully selected a few of these rejected higher drive cells that we thought could help our timing and performed extensive SPICE simulations on them over a load/slew range, which made us confident that they did not violate EM limits. We only had to do this at a handful of locations on the chip, and it helped reduce TNS/WNS by a good margin.
  3. Timing & Congestion aware multi-step routing:

Routing Critical Nets First: We developed an in-house script to identify nets with bad slews and high net lengths amongst the top violating paths after routing. The script also calculated the number of hops/segments in each net to help judge which nets were hit worst. The output of the script is shown in Table 1:


Table 1

These nets were then routed before the clock signals and with a higher weight. Also, to avoid hopping, signal integrity (SI) aware routing was avoided. The nets thus routed were set as fixed nets so that subsequent routing runs would not disturb their topology.

Routing Clock Nets: Clock nets with weights less than the above critical nets were then routed with no detouring. However, the clock nets were not set as fixed and so remained modifiable as conditions dictated.

Routing other nets: The rest of the nets were routed normally without setting special weight and detouring parameters.
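The three routing passes above can be summarized as a small planning sketch. The in-house script itself is not shown in the article, so everything here is a hypothetical illustration: the `Net` fields, the slew and length thresholds, the ranking key, and the option names are assumptions, and none of them correspond to a real router's commands.

```python
from dataclasses import dataclass

@dataclass
class Net:
    name: str
    slew_ns: float        # worst slew observed on the net after routing
    length_um: float      # routed net length
    hops: int             # number of hops/segments in the routed net
    is_clock: bool = False

def plan_routing(nets, slew_limit_ns=1.0, length_limit_um=1000.0):
    """Split nets into three ordered routing passes (thresholds are assumed)."""
    # Pass 1: signal nets with bad slew AND high length, worst-hit first.
    critical = sorted(
        (n for n in nets
         if not n.is_clock
         and n.slew_ns > slew_limit_ns
         and n.length_um > length_limit_um),
        key=lambda n: (n.hops, n.slew_ns),
        reverse=True,
    )
    critical_names = {n.name for n in critical}
    # Pass 2: clock nets, lower weight than critical nets, no detours.
    clocks = [n for n in nets if n.is_clock]
    # Pass 3: everything else, routed with default settings.
    others = [n for n in nets
              if not n.is_clock and n.name not in critical_names]
    return [
        ("critical", [n.name for n in critical],
         {"weight": "highest", "si_aware": False, "fixed": True}),
        ("clock", [n.name for n in clocks],
         {"weight": "high", "avoid_detour": True, "fixed": False}),
        ("other", [n.name for n in others], {}),
    ]

# Hypothetical nets taken from the top violating paths.
nets = [
    Net("n_data_a", slew_ns=1.8, length_um=2400.0, hops=31),
    Net("clk_core", slew_ns=0.6, length_um=5200.0, hops=12, is_clock=True),
    Net("n_ctrl_b", slew_ns=0.4, length_um=300.0, hops=3),
    Net("n_data_c", slew_ns=2.1, length_um=1900.0, hops=44),
]
plan = plan_routing(nets)
for stage, names, options in plan:
    print(stage, names, options)
```

Ranking by hop count first reflects the observation that vias, rather than wire length, dominated the resistance. A real flow would translate each pass into the router's own net-weight and fixed-net settings.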

The resulting routing topology after fixing is shown in Figure 3:



Figure 3: Corrected routing topology

Conclusion:
Because the pre-route engines used in current EDA tools are tuned for the smaller geometry technology nodes (90nm and below), they are not very good at assessing the metal trace hopping or detours that are necessary at the larger 180 nm node. So it should not come as a surprise that designs done in more conservative, resistive 180 nm technologies might throw up roadblocks when it comes to closing timing at the post-route stage. When implementing circuits at the 180 nm node, designers need to be extra careful when the EDA tool’s trial route engine reports something they might have safely ignored at the smaller 90 nm technology nodes. In such cases it is advisable to follow the proactive approaches described in this article to avoid last-minute surprises.

Abbreviations used in the article
EDA: Electronic Design Automation
DFM: Design for Manufacturability
MCU: Micro Controller Unit
CTS: Clock Tree Synthesis
SoC: System on Chip
TNS: Total Negative Slack
WNS: Worst Negative Slack

Mayank Verma is a Sr. Design Engineer at Freescale Semiconductor, Noida, India. He has 5 years of rich industry experience in the fields of SoC Placement, CTS, Routing, Noise, Timing, DRC and DFM Closure. He has worked on complex low-power automotive and industrial SoCs at multiple technology nodes.

Vijay Bhargava is Lead Engineer at Freescale Semiconductor, Noida. In his career spanning 10 years, he has worked extensively on SoC and Block-level Verification, Owned Digital IPs, SoC Integration, Power Estimation/Modeling and DSP Architectures in Front-End. As for backend, he has handled Synthesis and APR activities. Currently he is leading timing closure activities for SoCs.
