In the dark over networks

Once again a large piece of North America (this time the US Southwest and part of northern Mexico) has suffered a power outage due to network instability. According to the Associated Press, an operator near Yuma, Arizona took a capacitor offline, following correct procedures. We may infer from the absence of reports of vaporized technicians, flying fragments of switches, or columns of flame that the procedures were in fact followed; we are talking about power systems here. Indeed, nothing at all appeared to happen for several minutes. Then a section of regional high-capacity transmission line failed. The resulting transient, in turn, rippled through the entire Southwest’s power grid, knocking essentially everything offline and leaving several major cities and an estimated six million people without power.

Now would be the point at which to begin a tirade on our shameful underinvestment in energy infrastructure. And in truth, the fact that the grid must work very close to capacity on hot afternoons probably contributed to its vulnerability last Thursday. But I’m interested in a different point today.

The grid was supposed to be stable under these sorts of transients. After massive outages in 2003 and 2005, new standards mandated layers of redundancy and isolation to prevent a recurrence. But it appears that no one mandated, or actually constructed, an accurate dynamic model of the grid to verify that the safeguards guarded anything. Without a good model, the circuit breakers, bypasses, and loads would be impotent in the face of a large transient. And so it proved.

That brings us around, by only a painful stretch of the topic, to SoC design. With tens of thousands or millions of instances, a mélange of different circuit types, and often an unfortunately rich variety of power-management techniques, an SoC approaches the complexity of at least a metro power grid, and maybe a regional one. We can reasonably expect the same need for accurate dynamic modeling of the data, clock, and power networks on an SoC as on a utility company’s distribution network.

The analogy to the SoC’s power grid is perhaps most obvious. From passive wiring networks a few years ago, SoC power distribution has evolved into meshes of high-current, mixed-signal active circuits full of switches. As we add point-of-use regulators and similar tricks, some of the loads on that network may have a significant reactive component. That would be a prescription for dynamic instability. Clock networks, with their myriad gates and growing wiring inductance, may pose a similar challenge.

There is a weaker analogy to the logic interconnect of the SoC as well. As we get more processing sites, more caches and local RAMs, and less deterministic delays in software, the data networks on our SoCs are becoming just as complex, and every bit as much in need of analysis, as power grids. But in this case we would be looking at locations of data and at latencies, rather than at voltages and currents.

So, setting aside messages for the electric power industry, there may be a few words from last week’s blackout for SoC designers. We may be as deficient in our ability to model the stability of our networks as the power industry is in modeling theirs, and we may be approaching a problem of similar complexity. Network instability problems on an SoC might not be as public or as dramatic as the lights going out across the Southwest, but the impact on the reliability of a chip would be no less real.
