The challenges of nextgen multicore networks-on-chip systems: Part 1

The reason for the growing interest in networks on chips (NoCs) can be explained by looking at the evolution of integrated circuit technology and at the ever-increasing requirements of electronic systems. The integrated microprocessor has been a landmark in the evolution of computing technology.

Whereas it took monstrous efforts to complete at the time, it now appears to us as a simple object. Indeed, the microprocessor involved the connection of a computational engine to a layered memory system, and this was achieved using busses. In the last decade, the frontiers of integrated circuit design opened widely. On one side, complex application-specific integrated circuits (ASICs) were designed to address specific applications, for example mobile telephony.

These systems integrate many functional units and thus require efficient on-chip communication. On the other side, multiprocessing platforms were developed to address high-performance computation, such as image rendering. Examples are Sony's Emotion Engine [25] and IBM's Cell chip [26], where on-chip communication efficiency is key to the overall system performance.

At the same time, the shrinking of processing technology into the deep submicron (DSM) domain exacerbated the imbalance between gate delays and wire delays on chip. Accurate physical design became the bottleneck for design closure, a jargon term for the ability to successfully conclude a tape-out. Thus, the on-chip interconnect is now the dominant factor in determining performance, and architecting it at a higher abstraction level is a key factor for system design.

We have to understand the introduction of NoCs in systems-on-chip (SoCs) design as a gradual process, namely as an evolution of bus interconnect technology. For example, there is not a strict distinction between multi-layer busses and crossbar NoCs. We also have to credit C. Seitz and W. Dally [9] for stressing the need for network interconnect for high-performance multiprocessing, and for realizing the first prototypes of networked integrated multiprocessors.

But overall, NoC has become a broad topic of research and development in the new millennium, when designers were confronted with technological limitations, rising hardware design costs and increasingly higher system complexity.

Figure 1.1. Traffic pattern in a large-scale system. Limited parallelism is often a cause of congestion.

Why on-chip networking?
Systems on silicon have a complexity comparable to skyscrapers or aircraft carriers, when measured in terms of the number of basic elements. Unlike other complex systems, they can be cloned in a straightforward way, but they have to be designed correctly from the start, as repairs are nearly impossible. SoCs require design methodologies that have commonalities with other types of large-scale system design (Figure 1.1, above). In particular, when looking at on-chip interconnect design methods, it is useful to compare the on-chip interconnect to the worldwide interconnect provided by the Internet.

The latter is capable of taming system complexity and of providing reliable service in the presence of local malfunctions. Thus, networking technology has been able to provide us with quality of service (QoS), despite the heterogeneity and variability of the Internet nodes and links. It is then obvious that networking technology can be instrumental in improving very-large-scale integration (VLSI) circuit/system design technology.

On the other hand, the challenge in marrying network and VLSI technologies lies in leveraging the essential features of networking that are crucial to obtaining fast and reliable on-chip communication. Some novices think that on-chip networking equates to porting the Transmission Control Protocol/Internet Protocol (TCP/IP) to silicon, or to achieving an on-chip Internet.

This is not feasible, due to the high latency related to the complexity of TCP/IP. On-chip communication must be fast, and thus networking techniques must be simple and effective. Bandwidth, latency and energy consumption for communication must be traded off in the search for the best solution.

On the bright side, VLSI chips have a wide availability of wires on many layers, which can be used to carry data and control information. Wide data busses realize the parallel transport of information. Moreover, data and control do not need to be transported by the same means, as in networked computers (Figure 1.2, below). The local proximity of computational and storage units on chip makes transport extremely fast. Overall, the wire-oriented nature of VLSI chips makes on-chip networking both an opportunity and a challenge.

Figure 1.2. Distributed systems communicate via a limited number of cables (a). Conversely, VLSI chips use up to 10 levels of wires for communicating (b).

In summary, the main motivation for using on-chip networking is to achieve performance by taking a system perspective of communication. This reason is corroborated by the fact that simple on-chip communication solutions do not scale up when the number of processing and storage arrays on chip increases. For example, on-chip busses can serve a limited number of units; beyond that, performance degrades due to the bus parasitic capacitance and the complexity of arbitration.

Technology trends
In the current projections [37] of future silicon technologies, the operating frequency and transistor density will continue to grow, making energy dissipation and heat extraction a major concern.

At the same time, on-chip supply voltages will continue to decrease, with adverse impact on signal integrity. The voltage reduction, even though beneficial, will not suffice to mitigate the energy consumption problem, where a major contribution is due to leakage. Thus, SoCs will incorporate dynamic power management (DPM) techniques in various forms to satisfy energy consumption bounds [4].

Global wires, connecting different functional units, are likely to have propagation delays largely exceeding the clock period [18]. Whereas signal pipelining on interconnections will become common practice, correct design will require knowing the signal delay with reasonable accuracy. Indeed, a negative side effect of technology downsizing will be the spreading of physical parameters (e.g., variance of wire delay per unit length) and its growing importance relative to the timing reference signals (e.g., clock period).

The spreading of physical parameters will make it harder to achieve high-performing chips that safely meet all timing constraints. Worst-case timing methodologies, which require a clock period larger than the worst-case propagation delay, may underuse the potential of the technology, especially when the worst-case propagation delays are rare events.

Moreover, it is likely that varying on-chip temperature profiles (due to varying loads and DPM) will increase the spread of wiring delays [2]. Thus, it will be mandatory to go beyond worst-case design methodology, and use fault-tolerant schemes that can recover from timing errors [11, 30, 35].

Most large SoCs are designed using different voltage islands [23], which are regions with specific voltages and operating frequencies, which in turn may depend on the workload and on dynamic voltage and frequency scaling. Synchronization among these islands may become extremely hard to achieve, due to timing skews and spreads. Global wires will span multiple clock domains, and synchronization failures in communicating between different clock domains will be rare but unavoidable events [12].

Signal Integrity
With forthcoming technologies, it will be harder to guarantee error-free information transfer (at the electrical level) on wires for several reasons:

  • Reduced signal swings, with a corresponding reduction of voltage noise margins.
  • Crosstalk is bound to increase, and the complexity of avoiding crosstalk by identifying all potential on-chip noise sources makes it unlikely that this can be done fully.
  • Electromagnetic interference (EMI) by external sources will become more of a threat because of the smaller voltage swings and smaller dynamic storage capacitances.
  • The probability of occasional synchronization failures and/or metastability will rise. These erroneous conditions are possible during system operation because of transmission speed changes, local clock frequency changes, timing noise (jitter), etc.
  • Soft errors, due to the collision of thermal neutrons (produced by the decay of cosmic-ray showers) and/or alpha particles (emitted by impurities in the package), will become more frequent. Soft errors can create spurious pulses, which can affect signals on chip and/or discharge dynamic storage capacitances.

Moreover, SoCs may be willfully operated in error-prone conditions because of the need to extend battery lifetime by lowering energy consumption via supply voltage over-reduction. Thus, specific run-time policies may trade off signal integrity for a reduction in energy consumption, thus exacerbating the problems due to the fabrication technology.

Reliability
System-level reliability is the probability that the system operates correctly at time t, expressed as a function of time. The mean time to failure (MTTF) is the expected value derived from the reliability function. Increasing the MTTF well beyond the expected useful life of a product is an important design criterion. Highly reliable systems have been an object of study for many years. Beyond traditional applications, such as aircraft control, defense applications and reliable computing, there are many new fields requiring highly reliable SoCs, ranging from medical applications to automotive control and, more generally, to embedded systems that are critical for human operation and life.
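To make the definition concrete, the reliability function and the MTTF can be written as follows (a standard textbook formulation consistent with the definition above; it is not spelled out in the original article):

    % R(t): probability of correct operation throughout the interval [0, t]
    R(t) = \Pr\{\text{no failure in } [0, t]\}
    % MTTF: expected time to failure, i.e., the area under the reliability curve
    \mathrm{MTTF} = E[T_{\mathrm{fail}}] = \int_{0}^{\infty} R(t)\,\mathrm{d}t

For instance, under a constant failure rate λ (the exponential lifetime model), R(t) = e^(-λt) and MTTF = 1/λ, so halving the failure rate doubles the expected lifetime.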

Figure 1.3. Failure on a wire due to electromigration.

The increased demand for highly reliable SoCs is counterbalanced by the increased failure rates of devices and interconnects. Due to technology downscaling, failures in the interconnect due to electromigration are more likely to happen (Figure 1.3, above). Similarly, device failure due to dielectric breakdown is more likely because of higher electric fields and carrier speeds (Figure 1.4, below). Temperature cycles on chip induce mechanical stress, which has detrimental effects on reliability [28].

For these reasons, SoCs need to be designed with specific resilience to hard (i.e., permanent) and soft (i.e., transient) malfunctions. System-level solutions for hard errors involve redundancy, and thus require the on-line connection of a stand-by unit and disconnection of the faulty unit. Solutions for soft errors include design techniques for error containment, and error detection and correction via encoding.

Moreover, when soft errors induce timing errors, systems based on double-latch clocking can be used for detection and correction. NoCs can provide resilient solutions toward hard errors (by supporting seamless connection/disconnection of units) and soft errors (by layered error correction).

Figure 1.4. Failure on a transistor due to oxide breakdown.

Non-determinism in SoC Modeling and Design
As SoC complexity scales, it will be more difficult, if not impossible, to capture SoC functionality with fully deterministic models of operation. In other words, system models may have multiple implementations. Property abstraction, which is key to managing complexity in modeling and design, will hide implementation details, and designers will have to relinquish control of such details.

Whereas abstract modeling and automated synthesis enable complex system design, such an approach increases the variability of the physical and electrical parameters. In summary, to ensure correct and safe realizations, the system architecture and design style have to be resilient against errors generated by various sources, including:

  • process technology (parameter spreading, defect density, failure rates);
  • environment (temperature variation, EMI, radiation);
  • operation mode (very-low-voltage operation);
  • design style (abstraction and synthesis from non-deterministic models).

Variability, Design Methodologies and NoCs
Dealing with variability is an important matter affecting many aspects of SoC design. We consider here a few aspects related to on-chip communication design.

The first important issue deals with malfunction containment. Traditionally, malfunctions have been avoided by putting stringent rules on physical design and by applying stringent tests on signal integrity before tape-out. Rules are such that variations of process parameters can be tolerated, and integrity analysis can detect potential problems such as crosstalk. This approach is conservative in nature, and leads to perfecting the physical layout of circuits.

On the other hand, the downscaling of technologies has unveiled many potential problems, and as a result the physical design tools have grown in complexity, cost and the time they take to achieve design closure. At some point, correct-by-construction design at the physical level will no longer be possible. Similarly, the increasingly large number of connections on chip will make it unlikely that signal integrity analysis can detect all potential crosstalk errors.

Future trends will soften requirements at the physical and electrical level, and require higher-level mechanisms for error correction. Thus, electrical errors will be considered inevitable. Nevertheless, their effect can be contained by techniques that correct them at the logic and functional levels. In other words, the error detection/correction paradigm applied to networking will become a standard tool in on-chip communication design.
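As a simple illustration of this detect-and-retransmit paradigm, the behavioral sketch below models a link-layer transfer protected by a single parity bit. It is a software model with assumed names, flit width and retry policy, not code from the book or from any particular NoC:

    import random

    FLIT_BITS = 32  # assumed flit width for this sketch

    def parity(word: int) -> int:
        """Even parity computed over FLIT_BITS data bits."""
        return bin(word & ((1 << FLIT_BITS) - 1)).count("1") & 1

    def noisy_link(word: int, bit_error_rate: float) -> int:
        """Model a wire-level transfer: each data bit may flip independently."""
        for bit in range(FLIT_BITS):
            if random.random() < bit_error_rate:
                word ^= 1 << bit
        return word

    def reliable_transfer(word: int, bit_error_rate: float, max_retries: int = 8) -> int:
        """Detect corrupted flits via parity and retransmit until accepted."""
        sent_parity = parity(word)  # assume the parity bit itself travels uncorrupted
        for _ in range(max_retries):
            received = noisy_link(word, bit_error_rate)
            if parity(received) == sent_parity:
                return received      # receiver accepts the flit
            # mismatch: receiver NACKs and the sender retransmits the flit
        raise RuntimeError("link gave up after max_retries")

    if __name__ == "__main__":
        random.seed(0)
        print(hex(reliable_transfer(0xCAFEBABE, bit_error_rate=0.02)))

A single parity bit only detects an odd number of flipped bits; practical NoC link layers use stronger codes (e.g., CRC or Hamming codes), but the layered detect/correct/retransmit structure is the same.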

Timing errors are an important side effect of variability. Timing errors can originate from a wide variety of causes, including but not limited to: incorrect wire-delay estimates, overaggressive clocking, crosstalk and soft (radiation-induced) errors. Timing errors can be detected by double latches, gated by different clocking signals, and by comparing the latched data. When the data differs, it means that most likely the signal settled after the first latch was gated, that is, that a timing error was on the verge of being propagated. (Unfortunately, errors can also happen in the latches themselves.)
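The sketch below is a behavioral model of that double-latch scheme (names, timing values and the recovery policy are illustrative assumptions, not the circuits of [11, 30]): a main latch samples on the aggressive clock edge, a shadow latch samples the same signal after an extra margin, and a mismatch flags a timing error:

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class DoubleLatch:
        """Behavioral model of a main/shadow latch pair for timing-error detection."""
        delay_margin: float  # extra time (ns) the shadow latch waits before sampling

        def sample(self, signal: Callable[[float], int], clock_edge: float):
            main_value = signal(clock_edge)                         # aggressive edge
            shadow_value = signal(clock_edge + self.delay_margin)   # delayed, safe edge
            timing_error = (main_value != shadow_value)             # signal settled late
            # On an error, take the shadow value as the correct one (the Razor/T-error idea)
            return (shadow_value if timing_error else main_value), timing_error

    def slow_path(t: float) -> int:
        """Example combinational path whose output settles to 1 only after 1.2 ns."""
        return 1 if t >= 1.2 else 0

    if __name__ == "__main__":
        latch = DoubleLatch(delay_margin=0.5)                 # assumed safety margin
        value, err = latch.sample(slow_path, clock_edge=1.0)  # aggressive 1.0 ns edge
        print(value, err)  # -> 1 True: main latch sampled 0 too early, error caught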

Figure 1.5. The voltage swing on communication busses is reduced, even though signal integrity is partially compromised [35]. Encoding techniques are used to detect corrupted data, which is then retransmitted. The retransmission rate is an input to a closed-loop dynamic voltage scaling (DVS) control scheme, which sets the voltage swing at a trade-off point between energy saving and latency penalty (due to data retransmission).

Asynchronous design methodologies can make a circuit resilient to delay variations. For example, speed-independent and delay-insensitive circuit families can operate correctly in the presence of delay variations in gates and interconnects. Unfortunately, design complexity often makes the application of an integral asynchronous design methodology impractical. A viable compromise is the use of globally asynchronous locally synchronous (GALS) circuits, which use asynchronous handshaking protocols to link various synchronous domains, possibly clocked at different frequencies.
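The thread-based sketch below illustrates the handshaking idea behind a GALS interface (a coarse software analogy with assumed names and timings, not an actual circuit): a sender in one independently paced domain raises a request, the receiver in another domain acknowledges it, and data crosses the boundary only when both sides have synchronized:

    import threading, time

    class HandshakeChannel:
        """Four-phase request/acknowledge handshake between two independent domains."""
        def __init__(self):
            self.req = threading.Event()
            self.ack = threading.Event()
            self.data = None

        def send(self, value):
            self.data = value
            self.req.set()                 # phase 1: raise request with valid data
            self.ack.wait()                # phase 2: wait for acknowledge
            self.req.clear()               # phase 3: drop request
            while self.ack.is_set():       # phase 4: wait for acknowledge to drop
                time.sleep(0)

        def receive(self):
            self.req.wait()                # wait for a request
            value = self.data              # data is stable while req is high
            self.ack.set()                 # acknowledge the transfer
            while self.req.is_set():
                time.sleep(0)
            self.ack.clear()
            return value

    def producer(ch, items):
        for item in items:                 # runs at its own pace ("clock domain" A)
            ch.send(item)
            time.sleep(0.001)

    def consumer(ch, n, out):
        for _ in range(n):                 # runs at an unrelated pace ("clock domain" B)
            out.append(ch.receive())
            time.sleep(0.003)

    if __name__ == "__main__":
        ch, out = HandshakeChannel(), []
        items = list(range(5))
        t1 = threading.Thread(target=producer, args=(ch, items))
        t2 = threading.Thread(target=consumer, args=(ch, len(items), out))
        t1.start(); t2.start(); t1.join(); t2.join()
        print(out)  # [0, 1, 2, 3, 4] despite mismatched producer/consumer rates

The request/acknowledge events play the role of the synchronizer logic at a real GALS boundary; in hardware the handshake also has to guard against metastability, which this software model does not capture.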

Figure 1.6. Razor [11] is another realization of self-calibrating circuits, where a processor's supply voltage is lowered until errors occur. The correct operation of the processor is preserved by an error detection and pipeline adjustment technique. As a result, the processor settles on-line to an operating voltage that minimizes energy consumption even in the presence of variations of technological parameters.

NoCs are well poised to deal with variability because networking technology is layered, and error detection, containment and correction can be done at various layers, according to the nature of the possible malfunction. There are several paradigms that deal with variability for NoCs. Self-calibrating circuits are circuits that adapt on-line to the operating conditions. There are several embodiments of self-calibrating circuits, as shown in Figure 1.5 and Figure 1.6 above and Figure 1.7 below.

Figure 1.7. T-error is a timing methodology for NoCs where data is pipelined through double latches, of which the former uses an aggressive clock period and the latter a safe one. For most patterns, T-error will forward data from the first latch. When the slowest patterns are transmitted and fail the deadline at the first latch, correct but slower operation is performed by the second latch [30].

Next in Part 2: System-on-chip objectives and network-on-chip needs

Used with the permission of the publisher, Newnes/Elsevier, this series of six articles is based on material from “Networks On Chips: Technology and Tools,” by Luca Benini and Giovanni De Micheli.

Luca Benini is professor at the Department of Electrical Engineering and Computer Science at the University of Bologna, Italy. Giovanni De Micheli is professor and director of the Integrated Systems Center at EPFL in Lausanne, Switzerland.

References
[1] A. Adriahantenaina, H. Charlery, A. Greiner, L. Mortiez and C. Zeferino, "SPIN: A Scalable, Packet Switched, On-Chip Micro-network," DATE - Design, Automation and Test in Europe Conference and Exhibition, 2003, pp. 70-73.
[2] A.H. Ajami, K. Banerjee and M. Pedram, "Modeling and Analysis of Nonuniform Substrate Temperature Effects on Global ULSI Interconnects," IEEE Transactions on CAD, Vol. 24, No. 6, June 2005, pp. 849-861.
[3] H. Bakoglu, Circuits, Interconnections, and Packaging for VLSI, Addison-Wesley, Upper Saddle River, NJ, 1990.
[4] L. Benini, A. Bogliolo and G. De Micheli, "A Survey of Design Techniques for System-Level Dynamic Power Management," IEEE Transactions on Very Large-Scale Integration Systems, Vol. 8, No. 3, June 2000, pp. 299-316.
[5] W.O. Cesario, D. Lyonnard, G. Nicolescu, Y. Paviot, S. Yoo, L. Gauthier, M. Diaz-Nava and A.A. Jerraya, "Multiprocessor SoC Platforms: A Component-Based Design Approach," IEEE Design and Test of Computers, Vol. 19, No. 6, November-December 2002, pp. 52-63.
[6] W. Dally and B. Towles, Principles and Practices of Interconnection Networks, Morgan Kaufmann, San Francisco, CA, 2004.
[7] W. Dally and B. Towles, "Route Packets, Not Wires: On-Chip Interconnection Networks," Proceedings of the 38th Design Automation Conference, 2001.
[8] W.J. Dally and H. Aoki, "Deadlock-Free Adaptive Routing in Multicomputer Networks Using Virtual Channels," IEEE Transactions on Parallel and Distributed Systems, Vol. 4, No. 4, April 1993, pp. 466-475.
[9] W. Dally and C. Seitz, "The Torus Routing Chip," Distributed Computing, Vol. 1, 1986, pp. 187-196.
[10] M. Dall'Osso, G. Biccari, L. Giovannini, D. Bertozzi and L. Benini, "Xpipes: A Latency Insensitive Parameterized Network-on-Chip Architecture for Multiprocessor SoCs," International Conference on Computer Design, 2003, pp. 536-539.
[11] D. Ernst, S. Das, S. Lee, D. Blaauw, T. Austin, T. Mudge, N.S. Kim and K. Flautner, "Razor: Circuit-Level Correction of Timing Errors for Low-Power Operation," IEEE Micro, Vol. 24, No. 6, November-December 2004, pp. 10-20.
[12] W. Dally and J. Poulton, Digital Systems Engineering, Cambridge University Press, Cambridge, UK, 1998.
[13] J. Duato, S. Yalamanchili and L. Ni, Interconnection Networks: An Engineering Approach, Morgan Kaufmann, San Francisco, CA, 2003.
[14] T. Dumitras, S. Kerner and R. Marculescu, "Towards On-Chip Fault-Tolerant Communication," ASPDAC - Proceedings of the Asia and South Pacific Design Automation Conference, 2003, pp. 225-232.
[15] S. Goel, K. Chiu, E. Marinissen, T. Nguyen and S. Oostdijk, "Test Infrastructure Design for the Nexperia Home Platform PNX8550 System Chip," DATE - Proceedings of the Design, Automation and Test in Europe Conference, 2004.
[16] K. Goossens, J. van Meerbergen, A. Peeters and P. Wielage, "Networks on Silicon: Combining Best Efforts and Guaranteed Services," Design, Automation and Test in Europe Conference, 2002, pp. 423-427.
[17] R. Hegde and N. Shanbhag, "Toward Achieving Energy Efficiency in Presence of Deep Submicron Noise," IEEE Transactions on VLSI Systems, Vol. 8, No. 4, August 2000, pp. 379-391.
[18] R. Ho, K. Mai and M. Horowitz, "The Future of Wires," Proceedings of the IEEE, January 2001.
[19] J. Hu and R. Marculescu, "Energy-Aware Mapping for Tile-Based NoC Architectures Under Performance Constraints," Asia-Pacific Design Automation Conference, 2003.
[20] F. Karim, A. Nguyen and S. Dey, "On-Chip Communication Architecture for OC-768 Network Processors," Proceedings of the 38th Design Automation Conference, 2001.
[21] B. Khailany, et al., "Imagine: Media Processing with Streams," IEEE Micro, Vol. 21, No. 2, 2001, pp. 35-46.
[22] S. Kumar, et al., "A Network on Chip Architecture and Design Methodology," IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2002.
[23] D. Lackey, P. Zuchowski, T. Bednar, D. Stout, S. Gould and J. Cohn, "Managing Power and Performance for Systems on Chip Design Using Voltage Islands," ICCAD - International Conference on Computer Aided Design, 2002, pp. 195-202.
[24] P. Lieverse, P. van der Wolf, K. Vissers and E. Deprettere, "A Methodology for Architecture Exploration of Heterogeneous Signal Processing Systems," Journal of VLSI Signal Processing for Signal, Image and Video Technology, Vol. 29, No. 3, 2001, pp. 197-207.
[25] M. Oka and M. Suzuoki, "Designing and Programming the Emotion Engine," IEEE Micro, Vol. 19, No. 6, November-December 1999, pp. 20-28.
[26] D. Pham, et al., "Overview of the Architecture, Circuit Design, and Physical Implementation of a First-Generation Cell Processor," IEEE Journal of Solid-State Circuits, Vol. 41, No. 1, January 2006, pp. 179-196.
[27] A. Pinto, L. Carloni and A. Sangiovanni-Vincentelli, "Constraint-Driven Communication Synthesis," Design Automation Conference, 2002, pp. 195-202.
[28] K. Skadron, et al., "Temperature-Aware Computer Systems: Opportunities and Challenges," IEEE Micro, Vol. 23, No. 6, November-December 2003.
