The growing interest in networks on chips (NoCs) can be explained by
looking at the evolution of integrated circuit technology and at the
ever-increasing requirements of electronic systems. The integrated
microprocessor has been a landmark in the evolution of computing
technology. Although it took enormous effort to complete, it now appears
to us as a simple object. Indeed, the microprocessor involved the
connection of a computational engine to a layered memory system, and
this was achieved using busses. In the last decade, the frontiers of
integrated circuit design opened widely. On one side, complex
application-specific integrated circuits (ASICs) were designed to
address specific applications, for example mobile telephony. These
systems combine many functional units and therefore require efficient
on-chip communication. On another side, multiprocessing platforms were
developed to address high-performance computation, such as image
rendering. Examples are Sony's Emotion Engine [25] and IBM's Cell chip
[26], where on-chip communication efficiency is key to the overall
system performance.
At the same time, the shrinking of
processing technology in the deep submicron (DSM) domain exacerbated
the imbalance between gate delays and wire delays on chip. Accurate
physical design became the bottleneck for design closure, a jargon term
for the ability to conclude a tape-out successfully.
Thus, the on-chip interconnect is now the dominant factor in
determining performance. Architecting the interconnect at a
higher abstraction level is a key factor for system design.
We should understand the introduction of NoCs in system-on-chip (SoC)
design as a gradual process, namely as an evolution of bus interconnect
technology. For example, there is no strict distinction between
multi-layer busses and crossbar NoCs. We must also credit C. Seitz and
W. Dally [9] for stressing the need for network interconnect in
high-performance multiprocessing, and for realizing the first prototypes
of networked integrated multiprocessors.
Overall, however, NoCs became a broad topic of research and development
in the new millennium, when designers were confronted with technological
limitations, rising hardware design costs and ever-higher system
complexity.
Figure 1.1. Traffic pattern in a large-scale system. Limited parallelism is
often a cause of congestion.
Why on-chip networking?
Systems on silicon have a complexity comparable to skyscrapers or
aircraft carriers, when measured in terms of number of basic elements.
Unlike other complex systems, they can be cloned in a
straightforward way, but they have to be designed correctly, as
repairs are nearly impossible. SoCs require design methodologies that
have commonalities with other types of large-scale system design (Figure 1.1 above). In particular,
when looking at on-chip interconnect design methods, it is useful to
compare the on-chip interconnect to the worldwide interconnect provided
by the Internet.
The latter is capable of taming the
system complexity and of providing reliable service in the presence of
local malfunctions. Thus, networking technology has been able to
provide us with quality of service (QoS), despite the heterogeneity and
variability of the Internet nodes and links. It follows that
networking technology can be instrumental in improving
very-large-scale integration (VLSI) circuit/system design technology.
On the other hand, the challenge in
marrying network and VLSI technologies lies in leveraging the essential
features of networking that are crucial to obtaining fast and reliable
on-chip communication. Some novices think that on-chip networking
equates to porting the Transmission Control Protocol/Internet Protocol
(TCP/IP) to silicon or achieving an on-chip Internet.
This is not feasible, due to the high
latency related to the complexity of TCP/IP. On-chip communication must
be fast, and thus networking techniques must be simple and effective.
Bandwidth, latency and energy consumption for communication must be
traded off in the search for the best solution.
On the bright side, VLSI chips have wide
availability of wires on many layers, which can be used to carry data
and control information. Wide data busses realize the parallel
transport of information. Moreover, data and control do not need to be
transported by the same means, as they are in networked computers (Figure 1.2, below). The local proximity
of computational and storage units on chip makes transport extremely
fast. Overall, the wire-oriented nature of VLSI chips makes on-chip
networking both an opportunity and a challenge.
Figure 1.2. Distributed systems communicate via a limited number of cables
(a). Conversely, VLSI chips use up to 10 levels of wires for
communicating (b).
In summary, the main motivation for
using on-chip networking is to achieve performance by taking a system
perspective on communication. This motivation is corroborated by the fact
that simple on-chip communication solutions do not scale up when the
number of processing and storage arrays on chip increases. For example,
on-chip busses can serve a limited number of units, and beyond that,
performance degrades due to the bus parasitic capacitance and the
complexity of arbitration.
Technology trends
In the current projections [37] of future silicon technologies, the
operating frequency and transistor density will continue to grow,
making energy dissipation and heat extraction a major concern.
At the same time, on-chip supply voltages will continue to decrease, with
an adverse impact on signal integrity. The voltage reduction, even though
beneficial, will not suffice to mitigate the energy consumption problem,
to which leakage is a major contributor. Thus, SoCs will
incorporate dynamic power management (DPM) techniques in various forms
to satisfy energy consumption bounds [4].
Global wires, connecting different
functional units, are likely to have propagation delays largely
exceeding the clock period [18]. Whereas signal pipelining on
interconnections will become common practice, correct design will
require knowing the signal delay with reasonable accuracy. Indeed, a
negative side effect of technology downsizing will be the spreading of
physical parameters (e.g., variance of wire delay per unit length) and
its relative importance as compared to the timing reference signals
(e.g., clock period).
The spreading of physical parameters
will make it harder to achieve high-performing chips that safely meet
all timing constraints. Worst-case timing methodologies, which require
a clock period larger than the worst-case propagation delay, may
underuse the potential of the technology, especially when
worst-case propagation delays are rare events.
Moreover, it is likely
that varying on-chip temperature profiles (due to varying loads and
DPM) will increase the spread of wiring delays [2]. Thus, it will be
mandatory to go beyond worst-case design methodology, and use
fault-tolerant schemes that can recover from timing errors [11, 30,
35].
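The cost of worst-case clocking can be illustrated with a small calculation. The sketch below assumes a deliberately simple model that is not taken from the cited works: every transfer takes one aggressive clock period, and each (rare) timing error adds a fixed recovery penalty. All names and numbers are hypothetical.

```python
def average_transfer_time(aggressive_period, error_probability, recovery_penalty):
    """Expected time per transfer when each timing error costs a fixed penalty."""
    return aggressive_period + error_probability * recovery_penalty


def break_even_probability(worst_case_period, aggressive_period, recovery_penalty):
    """Error probability at which aggressive clocking stops paying off."""
    return (worst_case_period - aggressive_period) / recovery_penalty


# Hypothetical figures, in arbitrary time units.
worst_case_period = 1.0   # clock period that is always safe
aggressive_period = 0.7   # shorter period that occasionally fails
error_probability = 0.01  # fraction of transfers that miss the aggressive deadline
recovery_penalty = 2.0    # extra time spent recovering from one timing error

average = average_transfer_time(aggressive_period, error_probability, recovery_penalty)
speedup = worst_case_period / average
limit = break_even_probability(worst_case_period, aggressive_period, recovery_penalty)
```

Under these assumed numbers the aggressive clock wins as long as errors stay rarer than the break-even probability, which is the situation the text describes when worst-case delays are rare events.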
Most large SoCs are designed using
different voltage islands [23], which are regions with
specific voltages and operating frequencies, which in turn may depend on
the workload and on dynamic voltage and frequency scaling. Synchronization
among these islands may become extremely hard to achieve, due to timing
skews and spreads. Global wires will span multiple clock domains, and
synchronization failures in communicating between different clock
domains will be rare but unavoidable events [12].
Signal Integrity
With forthcoming technologies, it will be harder to guarantee
error-free information transfer (at the electrical level) on wires
for several reasons:
- Signal swings will be reduced, with a corresponding reduction
of voltage noise margins.
- Crosstalk is bound to increase, and the complexity of
identifying all potential on-chip noise sources
will make it unlikely that crosstalk can be avoided fully.
- Electromagnetic interference (EMI) from external sources
will become more of a threat because of the smaller voltage swings and
smaller dynamic storage capacitances.
- The probability of occasional synchronization failures
and/or metastability will rise. These erroneous conditions are possible
during system operation because of transmission speed changes, local
clock frequency changes, timing noise (jitter), etc.
- Soft errors will arise from collisions of thermal neutrons
(produced by the decay of cosmic ray showers) and/or alpha particles
(emitted by impurities in the package). Soft errors can create spurious
pulses, which can affect signals on chip and/or discharge dynamic
storage capacitances.
Moreover, SoCs may be deliberately operated
in error-prone conditions because of the need to extend
battery lifetime by lowering energy consumption via supply voltage
over-reduction. Thus, specific run-time policies may trade off signal
integrity for reduced energy consumption, exacerbating the
problems due to the fabrication technology.
Reliability
System-level reliability is the probability, expressed as a function of
time t, that the system operates correctly up to time t. The mean time to
failure (MTTF) is the expected time to failure, which equals the integral
of the reliability function over all time.
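The relation between the reliability function and the MTTF can be checked numerically. The sketch below assumes a constant failure rate, so that R(t) = exp(-rate * t) and the MTTF is 1/rate; this exponential model and the chosen rate are illustrative assumptions, not values from the text.

```python
import math


def reliability(t, failure_rate):
    """R(t) under an assumed constant failure rate: exp(-failure_rate * t)."""
    return math.exp(-failure_rate * t)


def mttf_numeric(failure_rate, horizon, steps=100_000):
    """Approximate the MTTF as the integral of R(t) over [0, horizon].

    Uses the trapezoidal rule; the horizon must be long enough for R(t)
    to have decayed to a negligible value.
    """
    dt = horizon / steps
    total = 0.5 * (reliability(0.0, failure_rate) + reliability(horizon, failure_rate))
    for i in range(1, steps):
        total += reliability(i * dt, failure_rate)
    return total * dt


failure_rate = 1e-5  # hypothetical: failures per hour
mttf = mttf_numeric(failure_rate, horizon=50.0 / failure_rate)
expected = 1.0 / failure_rate  # closed form for the exponential model
```

For other failure models only `reliability` would change; the integration step stays the same.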
Increasing MTTF well beyond the
expected useful life of a product is an important design criterion.
Highly reliable systems have been an object of study for many years.
Beyond traditional applications, such as aircraft control, defense
applications and reliable computing, there are many new fields requiring
highly reliable SoCs, ranging from medical applications to automotive
control and more generally to embedded systems that are critical for
human operation and life.
Figure 1.3. Failure on a wire due to electromigration.
The increased demand for highly reliable
SoCs is challenged by the increased failure rates of devices and
interconnects. Due to technology downscaling, failures in the
interconnect due to electromigration are more likely to happen (Figure 1.3, above). Similarly,
device failure due to dielectric breakdown is more likely because of
higher electric fields and carrier speed (Figure 1.4, below). Temperature
cycles on chip induce mechanical stress, which has detrimental
effects [28].
For these reasons, SoCs need to be
designed with specific resilience toward hard (i.e., permanent) and
soft (i.e., transient) malfunctions. System-level solutions for hard
errors involve redundancy, and thus require the on-line connection of a
stand-by unit and disconnection of the faulty unit. Solutions for soft
errors include design techniques for error containment, error detection
and correction via encoding.
Moreover, when soft errors induce timing
errors, systems based on double-latch clocking can be used for detection
and correction. NoCs can provide resilient solutions toward hard errors
(by supporting seamless connection/disconnection of units) and soft
errors (by layered error correction).
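As an illustration of error detection and correction via encoding, the sketch below implements the classic Hamming(7,4) code, which corrects any single flipped bit in a 7-bit codeword. The text does not say which code an NoC link would use, so this choice is only an example; the function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword layout (positions 1..7): p1 p2 d1 p4 d2 d3 d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(word):
    """Correct at most one flipped bit; return (data_bits, error_position).

    error_position is 0 when no error was detected, otherwise 1..7.
    """
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s4 = w[3] ^ w[4] ^ w[5] ^ w[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # binary position of the flipped bit
    if syndrome:
        w[syndrome - 1] ^= 1  # flip the faulty bit back
    return [w[2], w[4], w[5], w[6]], syndrome


sent = hamming74_encode([1, 0, 1, 1])
received = list(sent)
received[4] ^= 1  # model a soft error on the bit at position 5
decoded, position = hamming74_decode(received)
```

A code like this corrects single-bit upsets in place; when a code can only detect errors, the corrupted data has to be retransmitted instead.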
Figure 1.4. Failure on a transistor due to oxide breakdown.
Non-determinism in SoC Modeling and Design
As SoC complexity scales, it will be more difficult, if not impossible,
to capture SoC functionality with fully deterministic models of
operation. In other words, system models may have multiple
implementations. Property abstraction, which is key to managing
complexity in modeling and design, will hide implementation details and
designers will have to relinquish control of such details.
Whereas abstract modeling and automated
synthesis enable complex system design, such an approach increases the
variability of the physical and electrical parameters. In summary, to
ensure correct and safe realizations, the system architecture and
design style have to be resilient against errors generated by various
sources, including:
- process technology (parameter spreading, defect
density, failure rates);
- environment (temperature variation, EMI,
radiation);
- operation mode (very-low-voltage operation);
- design style (abstraction and synthesis from
non-deterministic models).
Variability, Design Methodologies and NoCs
Dealing with variability is an important matter affecting many aspects
of SoC design. We consider here a few aspects related to on-chip
communication design.
The first important issue deals with
malfunction containment. Traditionally, malfunctions have been avoided
by putting stringent rules on physical design and by applying stringent
tests on signal integrity before tape out. Rules are such that
variations of process parameters can be tolerated, and integrity
analysis can detect potential problems such as crosstalk. This approach
is conservative in nature, and leads to perfecting the physical layout
of circuits.
On the other hand, the downscaling of
technologies has unveiled many potential problems and as a result the
physical design tools have grown in complexity, cost and time to
achieve design closure. At some point, correct-by-construction design
at the physical level will no longer be possible. Similarly, the
increasing number of connections on chip will make signal
integrity analysis unlikely to detect all potential crosstalk errors.
Future trends will soften requirements
at the physical and electrical level, and require higher-level
mechanisms for error correction. Thus, electrical errors will be
considered inevitable. Nevertheless, their effect can be contained by
techniques that correct them at the logic and functional levels. In
other words, the error detection/correction paradigm applied to
networking will become a standard tool in on-chip communication design.
Timing errors are an important side
effect of variability. Timing errors can originate from a wide
variety of causes, including but not limited to: incorrect wiring delay
estimates, overaggressive clocking, crosstalk and soft
(radiation-induced) errors. Timing errors can be detected by double
latches, gated by different clocking signals, and by comparing the
latched data. When the data differ, the signal most likely
settled after the first latch was gated, that is, a timing
error was on the verge of being propagated. (Unfortunately, errors can
also happen in the latches themselves.)
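The double-latch check can be sketched as a small behavioral model. It assumes ideal latches (no metastability and no errors in the latches themselves) and a wire that switches from an old value to a new one at a single settle time; the function names and time values are hypothetical.

```python
def sample(old_value, new_value, settle_time, sample_time):
    """Value an ideal latch captures: the new value only once the wire has settled."""
    return new_value if sample_time >= settle_time else old_value


def double_latch(old_value, new_value, settle_time, t_first, t_second):
    """Sample the same wire with two latches gated at different times.

    Returns (value, timing_error). The second latch is assumed to be gated
    late enough to be safe, so its value is the one forwarded.
    """
    first = sample(old_value, new_value, settle_time, t_first)
    second = sample(old_value, new_value, settle_time, t_second)
    timing_error = first != second  # the latched data differ
    return second, timing_error


# Hypothetical times in arbitrary units: first latch at 1.0, second at 1.5.
on_time = double_latch(0, 1, settle_time=0.8, t_first=1.0, t_second=1.5)
late = double_latch(0, 1, settle_time=1.2, t_first=1.0, t_second=1.5)
```

In this model a late-settling signal is flagged only when the old and new values differ, which matches the comparison-based detection described above.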
Figure 1.5. The voltage swing on communication busses is reduced, even though
signal integrity is partially compromised [35]. Encoding techniques are
used to detect corrupted data which is retransmitted. The
retransmission rate is an input to a closed-loop dynamic voltage
scaling (DVS) control scheme, which sets the voltage swing at a
trade-off point between energy saving and latency penalty (due to data
retransmission).
Asynchronous design methodologies can
make the circuit resilient to delay variations. For example,
speed-independent and delay-insensitive circuit families can operate
correctly in the presence of delay variations in gates and interconnects.
Unfortunately, design complexity often makes the application of a
fully asynchronous design methodology impractical. A viable
compromise is the use of globally asynchronous locally synchronous
(GALS) circuits that use asynchronous handshaking protocols to
link various synchronous domains, possibly clocked at different
frequencies.
Figure 1.6. Razor [11] is another realization of self-calibrating circuits,
where a processor's supply is lowered until errors occur. The correct
operation of the processor is preserved by an error detection and
pipeline adjustment technique. As a result, the processor settles
on-line to an operating voltage which minimizes the energy consumption
even in the presence of variation of technological parameters.
NoCs are well poised to deal with
variability because networking technology is layered and error
detection, containment and correction can be done at various layers,
according to the nature of the possible malfunction. There are several
paradigms that deal with variability for NoCs. Self-calibrating
circuits are circuits that adapt on-line to the operating conditions.
There are several embodiments of self-calibrating circuits, as shown in
Figure 1.5 and Figure 1.6 above and Figure 1.7 below.
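The closed-loop idea behind the scheme of Figure 1.5 can be sketched as a simple controller that lowers the voltage swing while retransmissions stay rare and raises it when they become frequent. The step-up/step-down control law, the toy link model and all numeric values below are assumptions made for illustration; they are not the control scheme of [35].

```python
def adjust_swing(swing_mv, retransmission_rate, target_rate, step_mv, min_mv, max_mv):
    """One control step: raise the swing if retransmissions exceed the target,
    otherwise lower it to save energy. The result is clamped to [min_mv, max_mv]."""
    if retransmission_rate > target_rate:
        swing_mv += step_mv
    else:
        swing_mv -= step_mv
    return min(max_mv, max(min_mv, swing_mv))


def toy_retransmission_rate(swing_mv):
    """Hypothetical link model: errors appear only below a 400 mV swing."""
    return 0.0 if swing_mv >= 400 else (400 - swing_mv) / 400


swing = 1000  # start from the full swing, in millivolts
history = []
for _ in range(40):
    rate = toy_retransmission_rate(swing)
    swing = adjust_swing(swing, rate, target_rate=0.01,
                         step_mv=25, min_mv=200, max_mv=1000)
    history.append(swing)
```

With this toy model the swing walks down from the full value and then hovers around the point where retransmissions start, which is the trade-off point between energy saving and latency penalty that the caption describes.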
Figure 1.7. T-error is a timing methodology for NoCs where data is pipelined
through double latches, where the former uses an aggressive period and
the latter a safe one. For most patterns, T-error will forward data
from the first latch. When the slowest patterns, which miss the deadline
at the first latch, are transmitted, correct but slower operation is
performed by the second latch [30].
Next in Part 2: System on chip objectives and network on chip needs
Used with the permission of the publisher, Newnes/Elsevier, this series of six
articles is based on material from "Networks On Chips: Technology and
Tools," by Luca Benini and Giovanni De Micheli.
Luca Benini is a professor at the Department of Electrical Engineering and
Computer Science at the University of Bologna, Italy. Giovanni De Micheli is
a professor and director of the Integrated Systems Center at EPF in
Lausanne, Switzerland.
References
[1] A. Adriahantenaina, H. Charlery, A. Greiner, L. Mortiez and C. Zeferino, "SPIN: A Scalable, Packet Switched, On-Chip Micro-network," DATE - Design, Automation and Test in Europe Conference and Exhibition, 2003, pp. 70-73.
[2] A.H. Ajami, K. Banerjee and M. Pedram, "Modeling and Analysis of Nonuniform Substrate Temperature Effects on Global ULSI Interconnects," IEEE Transactions on CAD, Vol. 24, No. 6, June 2005, pp. 849-861.
[3] H. Bakoglu, Circuits, Interconnections, and Packaging for VLSI, Addison-Wesley, Upper Saddle River, NJ, 1990.
[4] L. Benini, A. Bogliolo and G. De Micheli, "A Survey of Design Techniques for System-Level Dynamic Power Management," IEEE Transactions on Very Large-Scale Integration Systems, Vol. 8, No. 3, June 2000, pp. 299-316.
[5] W.O. Cesario, D. Lyonnard, G. Nicolescu, Y. Paviot, S. Yoo, L. Gauthier, M. Diaz-Nava and A.A. Jerraya, "Multiprocessor SoC Platforms: A Component-Based Design Approach," IEEE Design and Test of Computers, Vol. 19, No. 6, November-December 2002, pp. 52-63.
[6] W. Dally and B. Towles, Principles and Practices of Interconnection Networks, Morgan Kaufmann, San Francisco, CA, 2004.
[7] W. Dally and B. Towles, "Route Packets, Not Wires: On-Chip Interconnection Networks," Proceedings of the 38th Design Automation Conference, 2001.
[8] W.J. Dally and H. Aoki, "Deadlock-Free Adaptive Routing in Multicomputer Networks Using Virtual Channels," IEEE Transactions on Parallel and Distributed Systems, Vol. 4, No. 4, April 1993, pp. 466-475.
[9] W. Dally and C. Seitz, "The Torus Routing Chip," Distributed Processing, Vol. 1, 1996, pp. 187-196.
[10] M. Dall'Osso, G. Biccari, L. Giovannini, D. Bertozzi and L. Benini, "Xpipes: A Latency Insensitive Parameterized Network-on-Chip Architecture for Multiprocessor SoCs," International Conference on Computer Design, 2003, pp. 536-539.
[11] D. Ernst, S. Das, S. Lee, D. Blaauw, T. Austin, T. Mudge, N. S. Kim and K. Flautner, "Razor: Circuit-Level Correction of Timing Errors for Low-Power Operation," IEEE Micro, Vol. 24, No. 6, November-December 2004, pp. 10-20.
[12] W. Dally and J. Poulton, Digital Systems Engineering, Cambridge University Press, Cambridge, MA, 1998.
[13] J. Duato, S. Yalamanchili and L. Ni, Interconnection Networks: An Engineering Approach, Morgan Kaufmann, San Francisco, CA, 2003.
[14] T. Dumitra, S. Kerner and R. Marculescu, "Towards On-Chip Fault-Tolerant Communication," ASPDAC - Proceedings of the Asian-South Pacific Design Automation Conference, 2003, pp. 225-232.
[15] S. Goel, K. Chiu, E. Marinissen, T. Nguyen and S. Oostdijk, "Test Infrastructure Design for the Nexperia Home Platform PNX8550 System Chip," DATE - Proceedings of the Design Automation and Test Europe Conference, 2004.
[16] K. Goossens, J. van Meerbergen, A. Peeters and P. Wielage, "Networks on Silicon: Combining Best Efforts and Guaranteed Services," Design Automation and Test in Europe Conference, 2002, pp. 423-427.
[17] R. Hegde and N. Shanbhag, "Toward Achieving Energy Efficiency in Presence of Deep Submicron Noise," IEEE Transactions on VLSI Systems, Vol. 8, No. 4, August 2000, pp. 379-391.
[18] R. Ho, K. Mai and M. Horowitz, "The Future of Wires," Proceedings of the IEEE, January 2001.
[19] J. Hu and R. Marculescu, "Energy-Aware Mapping for Tile-Based NOC Architectures Under Performance Constraints," Asian-Pacific Design Automation Conference, 2003.
[20] F. Karim, A. Nguyen and S. Dey, "On-Chip Communication Architecture for OC-768 Network Processors," Proceedings of the 38th Design Automation Conference, 2001.
[21] B. Khailany, et al., "Imagine: Media Processing with Streams," IEEE Micro, Vol. 21, No. 2, 2001, pp. 35-46.
[22] S. Kumar, et al., "A Network on Chip Architecture and Design Methodology," IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2002.
[23] D. Lackey, P. Zuchowski, T. Bednar, D. Stout, S. Gould and J. Cohn, "Managing Power and Performance for Systems on Chip Design Using Voltage Islands," ICCAD - International Conference on Computer Aided Design, 2002, pp. 195-202.
[24] P. Lieverse, P. van der Wolf, K. Vissers and E. Deprettere, "A Methodology for Architecture Exploration of Heterogeneous Signal Processing Systems," Journal of VLSI Signal Processing for Signal, Image and Video Technology, Vol. 29, No. 3, 2001, pp. 197-207.
[25] M. Oka and M. Suzuoki, "Designing and Programming the Emotion Engine," IEEE Micro, Vol. 19, No. 6, November-December 1999, pp. 20-28.
[26] D. Pham, et al., "Overview of the Architecture, Circuit Design, and Physical Implementation of a First-Generation Cell Processor," IEEE Journal of Solid-State Circuits, Vol. 41, No. 1, January 2006, pp. 179-196.
[27] A. Pinto, L. Carloni and A. Sangiovanni-Vincentelli, "Constraint-Driven Communication Synthesis," Design Automation Conference, 2002, pp. 195-202.
[28] K. Skadron, et al., "Temperature-Aware Computer Systems: Opportunities and Challenges," IEEE Micro, Vol. 23, No. 6, November-December 2003, pp. 52-61.
[29] D. Sylvester and K. Keutzer, "A Global Wiring Paradigm for Deep Submicron Design," IEEE Transactions on CAD/ICAS, Vol. 19, No. 2, February 2000, pp. 242-252.
[30] R. Tamhankar, S. Murali and G. De Micheli, "Performance Driven Reliable Link for Networks on Chip," ASPDAC - Proceedings of the Asian Pacific Conference on Design Automation, Shanghai, 2005, pp. 749-754.
[31] T. Theis, "The Future of Interconnection Technology," IBM Journal of Research and Development, Vol. 44, No. 3, May 2000, pp. 379-390.
[32] E. Waingold, et al., "Baring It All to Software: Raw Machines," IEEE Computer, Vol. 30, No. 9, September 1997, pp. 86-93.
[33] J. Walrand and P. Varaiya, High-Performance Communication Networks, Morgan Kaufmann, San Francisco, CA, 2000.
[34] M. Wolfe, High Performance Compilers for Parallel Computing, Addison-Wesley, Upper Saddle River, NJ, 1995.
[35] F. Worm, P. Ienne, P. Thiran and G. De Micheli, "An Adaptive Low-Power Transmission Scheme for On-Chip Networks," ISSS, Proceedings of the International Symposium on System Synthesis, Kyoto, October 2002, pp. 92-100.
[36] H. Zhang, V. George and J. Rabaey, "Low-Swing On-Chip Signaling Techniques: Effectiveness and Robustness," IEEE Transactions on VLSI Systems, Vol. 8, No. 3, June 2000, pp. 264-272.
[37] http://public.itrs.net/