The challenges of nextgen multicore networks-on-chip systems: Part 1
Why on-chip networking?



Embedded.com
The reason for the growing interest in networks on chips (NoCs) can be explained by looking at the evolution of integrated circuit technology and at the ever-increasing requirements of electronic systems. The integrated microprocessor has been a landmark in the evolution of computing technology.

Although it took enormous effort to complete, it now appears to us as a simple object. Indeed, the microprocessor involved the connection of a computational engine to a layered memory system, and this was achieved using busses. In the last decade, the frontiers of integrated circuit design opened widely. On one side, complex application-specific integrated circuits (ASICs) were designed to address specific applications, for example mobile telephony.

These systems integrate multiple functional units and thus require efficient on-chip communication. On another side, multiprocessing platforms were developed to address high-performance computation, such as image rendering. Examples are Sony's Emotion Engine [25] and IBM's Cell chip [26], where on-chip communication efficiency is key to the overall system performance.

At the same time, the shrinking of processing technology in the deep submicron (DSM) domain exacerbated the imbalance between gate delays and wire delays on chip. Accurate physical design became the bottleneck for design closure, a jargon term for the ability to successfully conclude a tape out. Thus, the on-chip interconnection is now the dominant factor in determining performance. Architecting the interconnect at a higher abstraction level is a key factor for system design.

We have to understand the introduction of NoCs in systems-on-chip (SoCs) design as a gradual process, namely as an evolution of bus interconnect technology. For example, there is no strict distinction between multi-layer busses and crossbar NoCs. We must also credit C. Seitz and W. Dally [9] for stressing the need for network interconnect in high-performance multiprocessing, and for realizing the first prototypes of networked integrated multiprocessors.

But overall, NoC has become a broad topic of research and development in the new millennium, when designers were confronted with technological limitations, rising hardware design costs and increasingly higher system complexity.

Figure 1.1. Traffic pattern in a large-scale system. Limited parallelism is often a cause of congestion.

Why on-chip networking?
Systems on silicon have a complexity comparable to skyscrapers or aircraft carriers, when measured in terms of number of basic elements. Unlike other complex systems, they can be cloned in a straightforward way, but they have to be designed correctly, as repairs are nearly impossible. SoCs require design methodologies that have commonalities with other types of large-scale system design (Figure 1.1 above). In particular, when looking at on-chip interconnect design methods, it is useful to compare the on-chip interconnect to the worldwide interconnect provided by the Internet.

The latter is capable of taming the system complexity and of providing reliable service in the presence of local malfunctions. Thus, networking technology has been able to provide us with quality of service (QoS), despite the heterogeneity and variability of the Internet nodes and links. It follows that networking technology can be instrumental in improving very-large-scale integration (VLSI) circuit and system design technology.

On the other hand, the challenges in marrying network and VLSI technologies are in leveraging the essential features of networking that are crucial to obtaining fast and reliable on-chip communication. Some novices think that on-chip networking equates to porting the Transmission Control Protocol/Internet Protocol (TCP/IP) to silicon or achieving an on-chip Internet.

This is not feasible, due to the high latency related to the complexity of TCP/IP. On-chip communication must be fast, and thus networking techniques must be simple and effective. Bandwidth, latency and energy consumption for communication must be traded off in the search for the best solution.

On the bright side, VLSI chips have wide availability of wires on many layers, which can be used to carry data and control information. Wide data busses realize the parallel transport of information. Moreover, data and control do not need to be transported by the same means, as in networked computers (Figure 1.2, below). Local proximity of computational and storage units on chip makes transport extremely fast. Overall, the wire-oriented nature of VLSI chips makes on-chip networking both an opportunity and a challenge.

Figure 1.2. Distributed systems communicate via a limited number of cables (a). Conversely, VLSI chips use up to 10 levels of wires for communicating (b).

In summary, the main motivation for using on-chip networking is to achieve performance using a system perspective of communication. This reason is corroborated by the fact that simple on-chip communication solutions do not scale up when the number of processing and storage arrays on chip increases. For example, on-chip busses can serve a limited number of units, and beyond that, performance degrades due to the bus parasitic capacitance and the complexity of arbitration.

Technology trends
In the current projections [37] of future silicon technologies, the operating frequency and transistor density will continue to grow, making energy dissipation and heat extraction a major concern.

At the same time, on-chip supply voltages will continue to decrease, with adverse impact on signal integrity. The voltage reduction, even though beneficial, will not suffice to mitigate the energy consumption problem, where a major contribution is due to leakage. Thus, SoCs will incorporate dynamic power management (DPM) techniques in various forms to satisfy energy consumption bounds [4].

Global wires, connecting different functional units, are likely to have propagation delays largely exceeding the clock period [18]. Whereas signal pipelining on interconnections will become common practice, correct design will require knowing the signal delay with reasonable accuracy. Indeed, a negative side effect of technology downsizing will be the spreading of physical parameters (e.g., variance of wire delay per unit length) and its relative importance as compared to the timing reference signals (e.g., clock period).

The spreading of physical parameters will make it harder to achieve high-performing chips that safely meet all timing constraints. Worst-case timing methodologies, which require a clock period larger than the worst-case propagation delay, may underuse the potential of the technology, especially when the worst-case propagation delays are rare events.

Moreover, it is likely that varying on-chip temperature profiles (due to varying loads and DPM) will increase the spread of wiring delays [2]. Thus, it will be mandatory to go beyond worst-case design methodology, and use fault-tolerant schemes that can recover from timing errors [11, 30, 35].
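The trade-off between worst-case clocking and a scheme that recovers from timing errors can be illustrated with a back-of-the-envelope model. The sketch below is only illustrative: the function names, the error probability and the recovery penalty are assumptions chosen for the example and are not taken from the cited work.

```python
def worst_case_time(n_words, t_worst):
    """Total transfer time when every cycle is sized for the worst-case delay."""
    return n_words * t_worst


def aggressive_time(n_words, t_fast, p_error, penalty_cycles):
    """Expected transfer time with a shorter clock period plus error recovery.

    Each word costs one fast cycle; with probability p_error it also costs
    penalty_cycles extra fast cycles to recover (an assumed recovery model).
    """
    expected_cycles_per_word = 1.0 + p_error * penalty_cycles
    return n_words * expected_cycles_per_word * t_fast


if __name__ == "__main__":
    n = 1_000_000
    # Hypothetical numbers: 1.0 ns worst-case period, 0.7 ns aggressive period,
    # 1% of transfers miss the fast deadline, 2 extra cycles to recover.
    safe = worst_case_time(n, 1.0e-9)
    fast = aggressive_time(n, 0.7e-9, 0.01, 2)
    print(f"worst-case clocking: {safe * 1e3:.3f} ms")
    print(f"aggressive clocking: {fast * 1e3:.3f} ms")
```

In this simple model the aggressive scheme wins whenever (1 + p_error * penalty_cycles) * t_fast is smaller than t_worst, which is the case when worst-case delays are rare.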

Most large SoCs are designed using different voltage islands [23], regions with specific voltages and operating frequencies, which in turn may depend on the workload and on dynamic voltage and frequency scaling. Synchronization among these islands may become extremely hard to achieve, due to timing skews and spreads. Global wires will span multiple clock domains, and synchronization failures in communicating between different clock domains will be rare but unavoidable events [12].
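Why synchronization failures are "rare but unavoidable" can be seen from the widely used first-order model of synchronizer metastability, in which the mean time between failures grows exponentially with the time allowed for resolution but never becomes infinite. The sketch below assumes that model; the parameter names and the numeric values are hypothetical and are not taken from the article.

```python
import math


def synchronizer_mtbf(resolve_time, tau, window, f_clk, f_data):
    """Mean time between synchronization failures, in seconds.

    First-order metastability model:
        MTBF = exp(resolve_time / tau) / (window * f_clk * f_data)
    resolve_time: time allowed for the synchronizer to settle (s)
    tau:          regeneration time constant of the latch (s)
    window:       metastability (aperture) window (s)
    f_clk:        sampling clock frequency (Hz)
    f_data:       rate of asynchronous data transitions (Hz)
    """
    return math.exp(resolve_time / tau) / (window * f_clk * f_data)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    one_cycle = synchronizer_mtbf(1.0e-9, 50e-12, 20e-12, 1e9, 100e6)
    two_cycles = synchronizer_mtbf(2.0e-9, 50e-12, 20e-12, 1e9, 100e6)
    print(f"MTBF with one cycle of settling:  {one_cycle:.3e} s")
    print(f"MTBF with two cycles of settling: {two_cycles:.3e} s")
```

Allowing more settling time increases the MTBF exponentially, at the cost of added latency on every crossing between clock domains.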

Signal Integrity
With forthcoming technologies, it will be harder to guarantee error-free information transfer (at the electrical level) on wires, for several reasons:

  • Reduced signal swings with a corresponding reduction of voltage noise margins.
  • Crosstalk is bound to increase, and the complexity of identifying all potential on-chip noise sources makes it unlikely that crosstalk can be avoided completely.
  • Electromagnetic interference (EMI) by external sources will become more of a threat because of the smaller voltage swings and smaller dynamic storage capacitances.
  • The probability of occasional synchronization failures and/or metastability will rise. These erroneous conditions are possible during system operation because of transmission speed changes, local clock frequency changes, timing noise (jitter), etc.
  • Soft errors due to collision of thermal neutrons (produced by the decay of cosmic ray showers) and/or alpha particles (emitted by impurities in the package). Soft errors can create spurious pulses, which can affect signals on chip and/or discharge dynamic storage capacitances.

Moreover, SoCs may be willfully operated in error-prone operating conditions because of the need to extend battery lifetime by lowering energy consumption via supply voltage over-reduction. Thus, specific run-time policies may trade off signal integrity for energy consumption reduction, exacerbating the problems due to the fabrication technology.

Reliability
System-level reliability is the probability that the system operates correctly up to time t, expressed as a function of time. Integrating the reliability function over time gives the mean time to failure (MTTF), that is, the expected time to failure. Increasing MTTF well beyond the expected useful life of a product is an important design criterion. Highly reliable systems have been an object of study for many years. Beyond traditional applications, such as aircraft control, defense applications and reliable computing, there are many new fields requiring highly reliable SoCs, ranging from medical applications to automotive control and more generally to embedded systems that are critical for human operation and life.
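The relation between the reliability function and MTTF can be made concrete with a small numerical sketch. It assumes a constant failure rate (the exponential model), which is an assumption made here for illustration and not a claim of the article; under that assumption R(t) = exp(-rate * t) and the MTTF, obtained by integrating R(t) over time, equals 1 / rate.

```python
import math


def reliability(t, failure_rate):
    """R(t) for a constant failure rate (assumed exponential model)."""
    return math.exp(-failure_rate * t)


def mttf_numeric(failure_rate, horizon, steps=100_000):
    """Approximate MTTF as the integral of R(t) from 0 to horizon.

    Uses the trapezoidal rule; horizon should be many multiples of
    1 / failure_rate so that the truncated tail is negligible.
    """
    dt = horizon / steps
    total = 0.5 * (reliability(0.0, failure_rate) + reliability(horizon, failure_rate))
    for i in range(1, steps):
        total += reliability(i * dt, failure_rate)
    return total * dt


if __name__ == "__main__":
    rate = 1e-5  # hypothetical failure rate, failures per hour
    print("analytic MTTF (hours):", 1.0 / rate)
    print("numeric MTTF (hours): ", mttf_numeric(rate, 20.0 / rate))
```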

Figure 1.3. Failure on a wire due to electromigration.

The increased demand for highly reliable SoCs is counterbalanced by the increased failure rates of devices and interconnects. Due to technology downscaling, failures in the interconnect due to electromigration are more likely to happen (Figure 1.3, above). Similarly, device failure due to dielectric breakdown is more likely because of higher electric fields and carrier speed (Figure 1.4, below). Temperature cycles on chip induce mechanical stress, which further reduces reliability [28].

For these reasons, SoCs need to be designed with specific resilience toward hard (i.e., permanent) and soft (i.e., transient) malfunctions. System-level solutions for hard errors involve redundancy, and thus require the on-line connection of a stand-by unit and disconnection of the faulty unit. Solutions for soft errors include design techniques for error containment, error detection and correction via encoding.
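As an illustration of error detection and correction via encoding, the sketch below implements the classic Hamming(7,4) code, which corrects any single flipped bit in a 7-bit codeword. The code is chosen here only as a familiar example; the article does not prescribe a particular code, and the function names are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword layout (positions 1..7): p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); error_position is 0 if no error."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome, read as a binary number, is the 1-based error position.
    error_position = s1 + 2 * s2 + 4 * s3
    if error_position:
        c[error_position - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], error_position


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1  # inject a single soft error at position 5
    print(hamming74_decode(word))
```

A code of this kind corrects one error per codeword; two simultaneous errors would be miscorrected, which is one reason stronger codes or retransmission are combined with it in practice.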

Moreover, when soft errors induce timing errors, systems based on double-latch clocking can be used for detection and correction. NoCs can provide resilient solutions toward hard errors (by supporting seamless connection/disconnection of units) and soft errors (by layered error correction).

Figure 1.4. Failure on a transistor due to oxide breakdown.

Non-determinism in SoC Modeling and Design
As SoC complexity scales, it will be more difficult, if not impossible, to capture their functionality with fully deterministic models of operation. In other words, system models may have multiple implementations. Property abstraction, which is key to managing complexity in modeling and design, will hide implementation details and designers will have to relinquish control of such details.

Whereas abstract modeling and automated synthesis enable complex system design, such an approach increases the variability of the physical and electrical parameters. In summary, to ensure correct and safe realizations, the system architecture and design style have to be resilient against errors generated by various sources, including:

  • process technology (parameter spreading, defect density, failure rates);
  • environment (temperature variation, EMI, radiation);
  • operation mode (very-low-voltage operation);
  • design style (abstraction and synthesis from non-deterministic models).

Variability, Design Methodologies and NoCs
Dealing with variability is an important matter affecting many aspects of SoC design. We consider here a few aspects related to on-chip communication design.

The first important issue deals with malfunction containment. Traditionally, malfunctions have been avoided by putting stringent rules on physical design and by applying stringent tests on signal integrity before tape out. Rules are such that variations of process parameters can be tolerated, and integrity analysis can detect potential problems such as crosstalk. This approach is conservative in nature, and leads to perfecting the physical layout of circuits.

On the other hand, the downscaling of technologies has unveiled many potential problems and, as a result, the physical design tools have grown in complexity, cost and time to achieve design closure. At some point, correct-by-construction design at the physical level will no longer be possible. Similarly, the increasingly large number of connections on chip will make signal integrity analysis unlikely to detect all potential crosstalk errors.

Future trends will soften requirements at the physical and electrical level, and require higher-level mechanisms for error correction. Thus, electrical errors will be considered inevitable. Nevertheless, their effect can be contained by techniques that correct them at the logic and functional levels. In other words, the error detection/correction paradigm applied to networking will become a standard tool in on-chip communication design.

Timing errors are an important side effect of variability. They can originate from a wide variety of causes, including but not limited to: incorrect wiring delay estimates, overaggressive clocking, crosstalk and soft (radiation-induced) errors. Timing errors can be detected by capturing the same signal in two latches gated by different clocking signals and comparing the latched data. When the data differ, most likely the signal settled after the first latch was gated, that is, a timing error was on the verge of being propagated. (Unfortunately, errors can also happen in the latches themselves.)
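The double-latch detection idea can be sketched behaviorally. The model below is a deliberately simplified abstraction and not a circuit description: a signal changes from an old to a new value at some settle time, the first latch captures at an aggressive instant, and the second latch captures at a later, safe instant. All names and timing values are assumptions made for illustration.

```python
def sample(settle_time, old_value, new_value, capture_time):
    """Value a latch captures: the new value only if the signal has settled."""
    return new_value if settle_time <= capture_time else old_value


def double_latch_capture(settle_time, old_value, new_value, t_main, t_shadow):
    """Capture with a main latch at t_main and a delayed latch at t_shadow.

    Returns (value, timing_error). The delayed latch is assumed to always
    see settled data, i.e. settle_time <= t_shadow.
    """
    main = sample(settle_time, old_value, new_value, t_main)
    shadow = sample(settle_time, old_value, new_value, t_shadow)
    timing_error = main != shadow
    # On a mismatch, the later (safe) sample is taken as the correct value.
    return (shadow if timing_error else main), timing_error


if __name__ == "__main__":
    # Hypothetical times in ns: main latch at 0.7, delayed latch at 1.0.
    print(double_latch_capture(0.6, 0, 1, 0.7, 1.0))  # settles in time
    print(double_latch_capture(0.9, 0, 1, 0.7, 1.0))  # settles too late
```

Note that a late transition is flagged only when it actually changes the captured value, which matches the comparison-based detection described above.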

Figure 1.5. The voltage swing on communication busses is reduced, even though signal integrity is partially compromised [35]. Encoding techniques are used to detect corrupted data which is retransmitted. The retransmission rate is an input to a closed-loop dynamic voltage scaling (DVS) control scheme, which sets the voltage swing at a trade-off point between energy saving and latency penalty (due to data retransmission).

Asynchronous design methodologies can make the circuit resilient to delay variations. For example, speed-independent and delay-insensitive circuit families can operate correctly in the presence of delay variations in gates and interconnects. Unfortunately, design complexity often makes the application of a fully asynchronous design methodology impractical. A viable compromise is the use of globally asynchronous locally synchronous (GALS) circuits, which use asynchronous handshaking protocols to link various synchronous domains, possibly clocked at different frequencies.

Figure 1.6. Razor [11] is another realization of self-calibrating circuits, where a processor's supply is lowered till errors occur. The correct operation of the processor is preserved by an error detection and pipeline adjustment technique. As a result, the processor settles on-line to an operating voltage which minimizes the energy consumption even in the presence of variation of technological parameters.

NoCs are well poised to deal with variability because networking technology is layered and error detection, containment and correction can be done at various layers, according to the nature of the possible malfunction. There are several paradigms that deal with variability for NoCs. Self-calibrating circuits are circuits that adapt on-line to the operating conditions. There are several embodiments of self-calibrating circuits, as shown in Figure 1.5 and Figure 1.6 above and Figure 1.7 below.
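The closed-loop idea behind the scheme of Figure 1.5, where the observed retransmission rate drives the voltage swing, can be sketched as a simple feedback loop. The controller below is a minimal bang-bang sketch and not the control law of the cited work; the link model, the target rate, the step size and the voltage limits are all hypothetical.

```python
def adjust_swing(swing, retransmission_rate, target_rate, step, v_min, v_max):
    """One control step: raise the swing if too many retransmissions occur,
    otherwise lower it to save energy. The result is clamped to [v_min, v_max]."""
    if retransmission_rate > target_rate:
        swing += step
    else:
        swing -= step
    return min(v_max, max(v_min, swing))


def hypothetical_retransmission_rate(swing):
    """Toy link model (an assumption): no errors at or above 0.8 V,
    error rate rising linearly as the swing drops below 0.8 V."""
    return max(0.0, 0.8 - swing)


def run_controller(swing, steps, target_rate=0.05, step=0.02, v_min=0.4, v_max=1.2):
    """Iterate the control loop and return the swing after each step."""
    history = []
    for _ in range(steps):
        rate = hypothetical_retransmission_rate(swing)
        swing = adjust_swing(swing, rate, target_rate, step, v_min, v_max)
        history.append(swing)
    return history


if __name__ == "__main__":
    trace = run_controller(1.2, 40)
    print([round(v, 2) for v in trace[-6:]])
```

Starting from a safe swing, the loop lowers the voltage until retransmissions exceed the target and then hovers around that operating point, which is the trade-off between energy saving and latency penalty described in the caption.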

Figure 1.7. T-error is a timing methodology for NoCs where data is pipelined through double latches, the former using an aggressive clock period and the latter a safe one. For most patterns, T-error will forward data from the first latch. When the slowest patterns are transmitted and miss the deadline at the first latch, correct but slower operation is performed by the second latch [30].

Next in Part 2: System on chip objectives and network on chip needs

Used with the permission of the publisher, Newnes/Elsevier, this series of six articles is based on material from "Networks On Chips: Technology and Tools," by Luca Benini and Giovanni De Micheli.

Luca Benini is professor at the Department of Electrical Engineering and Computer Science at the University of Bologna, Italy. Giovanni De Micheli is professor and director of the Integrated Systems Center at EPF in Lausanne, Switzerland.

References
[1] A. Adriahantenaina, H. Charlery, A. Greiner, L. Mortiez and C. Zeferino, "SPIN: A Scalable, Packet Switched, On-Chip Micro-network," DATE - Design, Automation and Test in Europe Conference and Exhibition, 2003, pp. 70-73.
[2] A.H. Ajami, K. Banerjee and M. Pedram, "Modeling and Analysis of Nonuniform Substrate Temperature Effects on Global ULSI Interconnects," IEEE Transactions on CAD, Vol. 24, No. 6, June 2005, pp. 849-861.
[3] H. Bakoglu, Circuits, Interconnections, and Packaging for VLSI, Addison-Wesley, Upper Saddle River, NJ, 1990.
[4] L. Benini, A. Bogliolo and G. De Micheli, "A Survey of Design Techniques for System-Level Dynamic Power Management," IEEE Transactions on Very Large-Scale Integration Systems, Vol. 8, No. 3, June 2000, pp. 299-316.
[5] W.O. Cesario, D. Lyonnard, G. Nicolescu, Y. Paviot, S. Yoo, L. Gauthier, M. Diaz-Nava and A.A. Jerraya, "Multiprocessor SoC Platforms: A Component-Based Design Approach," IEEE Design and Test of Computers, Vol. 19, No. 6, November-December 2002, pp. 52-63.
[6] W. Dally and B. Towles, Principles and Practices of Interconnection Networks, Morgan Kaufmann, San Francisco, CA, 2004.
[7] W. Dally and B. Towles, "Route Packets, Not Wires: On-Chip Interconnection Networks," Proceedings of the 38th Design Automation Conference, 2001.
[8] W.J. Dally and H. Aoki, "Deadlock-Free Adaptive Routing in Multicomputer Networks Using Virtual Channels," IEEE Transactions on Parallel and Distributed Systems, Vol. 4, No. 4, April 1993, pp. 466-475.
[9] W. Dally and C. Seitz, "The Torus Routing Chip," Distributed Processing, Vol. 1, 1996, pp. 187-196.
[10] M. Dall'Osso, G. Biccari, L. Giovannini, D. Bertozzi and L. Benini, "Xpipes: A Latency Insensitive Parameterized Network-on-Chip Architecture for Multiprocessor SoCs," International Conference on Computer Design, 2003, pp. 536-539.
[11] D. Ernst, S. Das, S. Lee, D. Blaauw, T. Austin, T. Mudge, N.S. Kim and K. Flautner, "Razor: Circuit-Level Correction of Timing Errors for Low-Power Operation," IEEE Micro, Vol. 24, No. 6, November-December 2004, pp. 10-20.
[12] W. Dally and J. Poulton, Digital Systems Engineering, Cambridge University Press, Cambridge, MA, 1998.
[13] J. Duato, S. Yalamanchili and L. Ni, Interconnection Networks: An Engineering Approach, Morgan Kaufmann, San Francisco, CA, 2003.
[14] T. Dumitra, S. Kerner and R. Marculescu, "Towards On-Chip Fault-Tolerant Communication," ASPDAC - Proceedings of the Asian-South Pacific Design Automation Conference, 2003, pp. 225-232.
[15] S. Goel, K. Chiu, E. Marinissen, T. Nguyen and S. Oostdijk, "Test Infrastructure Design for the Nexperia Home Platform PNX8550 System Chip," DATE - Proceedings of the Design Automation and Test Europe Conference, 2004.
[16] K. Goossens, J. van Meerbergen, A. Peeters and P. Wielage, "Networks on Silicon: Combining Best Efforts and Guaranteed Services," Design Automation and Test in Europe Conference, 2002, pp. 423-427.
[17] R. Hegde and N. Shanbhag, "Toward Achieving Energy Efficiency in Presence of Deep Submicron Noise," IEEE Transactions on VLSI Systems, Vol. 8, No. 4, August 2000, pp. 379-391.
[18] R. Ho, K. Mai and M. Horowitz, "The Future of Wires," Proceedings of the IEEE, January 2001.
[19] J. Hu and R. Marculescu, "Energy-Aware Mapping for Tile-Based NOC Architectures Under Performance Constraints," Asian-Pacific Design Automation Conference, 2003.
[20] F. Karim, A. Nguyen and S. Dey, "On-Chip Communication Architecture for OC-768 Network Processors," Proceedings of the 38th Design Automation Conference, 2001.
[21] B. Khailany, et al., "Imagine: Media Processing with Streams," IEEE Micro, Vol. 21, No. 2, 2001, pp. 35-46.
[22] S. Kumar, et al., "A Network on Chip Architecture and Design Methodology," IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2002.
[23] D. Lackey, P. Zuchowski, T. Bednar, D. Stout, S. Gould and J. Cohn, "Managing Power and Performance for Systems on Chip Design Using Voltage Islands," ICCAD - International Conference on Computer Aided Design, 2002, pp. 195-202.
[24] P. Lieverse, P. van der Wolf, K. Vissers and E. Deprettere, "A Methodology for Architecture Exploration of Heterogeneous Signal Processing Systems," Journal of VLSI Signal Processing for Signal, Image and Video Technology, Vol. 29, No. 3, 2001, pp. 197-207.
[25] M. Oka and M. Suzuoki, "Designing and Programming the Emotion Engine," IEEE Micro, Vol. 19, No. 6, November-December 1999, pp. 20-28.
[26] D. Pham, et al., "Overview of the Architecture, Circuit Design, and Physical Implementation of a First-Generation Cell Processor," IEEE Journal of Solid-State Circuits, Vol. 41, No. 1, January 2006, pp. 179-196.
[27] A. Pinto, L. Carloni and A. Sangiovanni-Vincentelli, "Constraint-Driven Communication Synthesis," Design Automation Conference, 2002, pp. 195-202.
[28] K. Skadron, et al., "Temperature-Aware Computer Systems: Opportunities and Challenges," IEEE Micro, Vol. 23, No. 6, November-December 2003, pp. 52-61.
[29] D. Sylvester and K. Keutzer, "A Global Wiring Paradigm for Deep Submicron Design," IEEE Transactions on CAD/ICAS, Vol. 19, No. 2, February 2000, pp. 242-252.
[30] R. Tamhankar, S. Murali and G. De Micheli, "Performance Driven Reliable Link for Networks on Chip," ASPDAC - Proceedings of the Asian Pacific Conference on Design Automation, Shanghai, 2005, pp. 749-754.
[31] T. Theis, "The Future of Interconnection Technology," IBM Journal of Research and Development, Vol. 44, No. 3, May 2000, pp. 379-390.
[32] E. Waingold, et al., "Baring It All to Software: Raw Machines," IEEE Computer, Vol. 30, No. 9, September 1997, pp. 86-93.
[33] J. Walrand and P. Varaiya, High-Performance Communication Networks, Morgan Kaufmann, San Francisco, CA, 2000.
[34] M. Wolfe, High Performance Compilers for Parallel Computing, Addison-Wesley, Upper Saddle River, NJ, 1995.
[35] F. Worm, P. Ienne, P. Thiran and G. De Micheli, "An Adaptive Low-Power Transmission Scheme for On-Chip Networks," ISSS, Proceedings of the International Symposium on System Synthesis, Kyoto, October 2002, pp. 92-100.
[36] H. Zhang, V. George and J. Rabaey, "Low-Swing On-Chip Signaling Techniques: Effectiveness and Robustness," IEEE Transactions on VLSI Systems, Vol. 8, No. 3, June 2000, pp. 264-272.
[37] http://public.itrs.net/
