Reliable and power-aware architectures: Measurement and modeling

Editor's Note: Embedded designers must contend with a host of challenges in creating systems for harsh environments. Harsh environments present unique characteristics not only in terms of temperature extremes but also in areas including availability, security, very limited power budgets, and more. In Rugged Embedded Systems, the authors present a series of papers by experts in each of the areas that pose unusually demanding requirements. In Chapter 2 of the book, the authors address fundamental concerns in reliability and system resiliency. This series excerpts that chapter in the following installments:
Reliable and power-aware architectures: Sustaining system resiliency
Reliable and power-aware architectures: Measuring resiliency
Reliable and power-aware architectures: Soft-error vulnerabilities 
Reliable and power-aware architectures: Microbenchmark generation
Reliable and power-aware architectures: Measurement and modeling (this article)

Adapted from Rugged Embedded Systems: Computing in Harsh Environments, by Augusto Vega, Pradip Bose, and Alper Buyuktosunoglu.

CHAPTER 2. Reliable and power-aware architectures: Fundamentals and modeling (Continued)

8 POWER AND PERFORMANCE MEASUREMENT AND MODELING

8.1 IN-BAND VERSUS OUT-OF-BAND DATA COLLECTION

From a power and performance standpoint, a system's status can be described by a variety of sensors and/or performance counters that provide power consumption, temperature, and performance information for the different system components (CPUs, memory and storage modules, network, etc.). There are two approaches to accessing such information online: in-band and out-of-band. A system is sensed in-band when the collected information is generated by the same component that the information refers to—for example, CPU power consumption or thermal information generated internally by the CPU itself. This is the case of the on-chip controller (OCC) in POWER-based systems [27]—a co-processor embedded directly on the main processor die—as well as Intel's Power Control Unit (PCU) and its associated on-board power metering capabilities [28]. Conversely, if the collected information is obtained in a way that does not involve the component under measurement, the system is sensed out-of-band. This is the case, for example, when CPU power consumption is estimated by an external process (running on the same system or even on a different one) using "proxy" CPU information such as performance counters. IBM's Automated Measurement of Systems for Energy and Temperature Reporting (AMESTER) is an example of out-of-band system measurement [29]. AMESTER provides an interface to remotely monitor POWER-based systems through the service processor available in those systems. The service processor (which is distinct from the main system processor) accesses a variety of power, temperature, and other sensors that monitor the status of the entire system, and delivers the readings to the remote AMESTER client.
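To make the in-band case concrete, the following sketch samples a CPU package energy counter through Linux's RAPL powercap interface and derives average power over an interval. The sysfs path assumes an Intel system that exposes this interface; POWER machines surface OCC sensor data through different channels.

```python
# Minimal sketch of in-band power sampling via Linux's RAPL powercap
# interface. Assumes an Intel CPU exposing /sys/class/powercap/intel-rapl:0;
# other platforms (e.g., POWER's OCC) expose sensors differently.
import time

RAPL_ENERGY = "/sys/class/powercap/intel-rapl:0/energy_uj"  # package energy, microjoules

def read_energy_uj() -> int:
    with open(RAPL_ENERGY) as f:
        return int(f.read())

def average_power_watts(interval_s: float = 1.0) -> float:
    """Average package power over one sampling interval."""
    e0 = read_energy_uj()
    time.sleep(interval_s)
    e1 = read_energy_uj()
    # The counter wraps at max_energy_range_uj; a robust reader would
    # handle that case. Omitted here for brevity.
    return (e1 - e0) / 1e6 / interval_s

if __name__ == "__main__":
    print(f"package power: {average_power_watts():.2f} W")
```

Note that this qualifies as in-band measurement: the energy count is produced by the processor's own power-management machinery, the very component being measured.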

In practice, the choice between in-band and out-of-band measurement depends on (1) the data collection capabilities available on the system in question, (2) the performance perturbation that can be tolerated, and (3) the expected information granularity and its correlation with the actual execution. For example, on-chip power metering capabilities are relatively novel and may not be widely available, in which case chip power consumption can be estimated through proxy information—for example, performance counters that gauge processor activity and from which corresponding power figures are derived. Even though in-band measurement may look more appealing than out-of-band approaches, it may also perturb system performance or interfere with the running workloads, since the component being measured has to "produce" the requested information in some way. Finally, the correlation (or lack thereof) between the collected data and the actual measured events should be taken into account. Out-of-band techniques usually fare worse at precisely correlating measurements with actual events, particularly when the measurement is conducted remotely.

Regardless of whether a system is measured in-band or out-of-band, processor performance counters constitute a key source of information when it comes to characterizing the run-time behavior of a CPU from the performance, power, and thermal perspectives. Hence, they are widely used to model power consumption and temperature when direct measurement from sensors is not possible. The critical part of this process, however, is selecting the right set of counters—one that correlates with the real values with enough accuracy—a problem extensively studied by several researchers and groups, for example [30–33].
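As an illustration of the modeling side, the sketch below fits a simple linear power model to performance-counter samples with ordinary least squares, in the spirit of the counter-based estimation work cited above. The counter names, sample values, and measured powers are hypothetical placeholders, not a validated model.

```python
# Sketch of an empirical counter-based power model: fit per-counter weights
# so that predicted power = intercept + sum(w_i * counter_i).
# Counter names and sample values are hypothetical, for illustration only.
import numpy as np

counters = ["instructions", "l2_misses", "mem_accesses"]

# Each row: one sample of counter rates (events/s); y: measured power (W).
X = np.array([
    [1.2e9, 3.0e6, 8.0e6],
    [0.4e9, 9.0e6, 2.2e7],
    [2.1e9, 1.0e6, 4.0e6],
    [0.9e9, 5.5e6, 1.5e7],
    [1.6e9, 2.0e6, 6.0e6],
])
y = np.array([41.0, 35.5, 52.0, 39.0, 46.0])

# Least-squares fit with an intercept term (idle/static power).
A = np.hstack([np.ones((X.shape[0], 1)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_power(sample):
    return coef[0] + np.dot(coef[1:], sample)

print("estimated idle power:", round(coef[0], 1), "W")
print("predicted:", round(predict_power([1.0e9, 2.0e6, 6.0e6]), 1), "W")
```

Counter selection then amounts to choosing the feature set (the columns of X) that maximizes accuracy against calibrated measurements.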

8.2 PROCESSOR PERFORMANCE COUNTERS

Processor performance counters are hardware counters built into a processor to "count" certain events that take place at the CPU level: the number of cycles and instructions that a program executes, its associated cache misses, its accesses to off-chip memory, and several other things. The events that can be tracked with performance counters are usually simple ones, but combined in the right way they can provide extremely useful insights into a program's behavior and, therefore, constitute a valuable tool for debugging purposes. As Elkin and Indukuru explain in [34], the first step in optimizing an application is characterizing how well the application runs on the target system. The fundamental intensive metric used to characterize the performance of any given workload is cycles per instruction (CPI)—the average number of clock cycles (or fractions of a cycle) needed to complete an instruction. CPI is a measure of processor performance: the lower it is, the more effectively the processor hardware is being kept busy. Elkin and Indukuru also provide a detailed description of the performance monitoring unit (PMU) built into IBM's POWER7 systems, which comprises six thread-level performance monitor counters (PMCi). PMC1 to PMC4 are programmable, PMC5 counts nonidle completed instructions, and PMC6 counts nonidle cycles. Other processor manufacturers offer similar counting capabilities, such as Intel's Performance Counter Monitor [35] and ARM's Performance Monitoring Unit [36].
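As a minimal worked example, the following sketch derives CPI for a workload by reading the cycle and instruction counters through the Linux perf tool; it assumes perf is installed and that counter access is permitted on the host.

```python
# Sketch: derive CPI for a workload from hardware performance counters
# using Linux's perf tool. CPI = cycles / instructions.
import subprocess

def measure_cpi(cmd):
    # "-x ," selects CSV output; perf stat writes counts to stderr.
    out = subprocess.run(
        ["perf", "stat", "-x", ",", "-e", "cycles,instructions"] + cmd,
        capture_output=True, text=True,
    ).stderr
    counts = {}
    for line in out.splitlines():
        fields = line.split(",")
        # CSV fields: value, unit, event name, ...
        if len(fields) >= 3 and fields[0].isdigit():
            counts[fields[2]] = int(fields[0])
    return counts["cycles"] / counts["instructions"]

print("CPI:", round(measure_cpi(["sleep", "1"]), 2))
```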

Processor performance counters can also be effectively used to drive run-time power and thermal management decisions in the CPU. As an example, the Power-Aware Management of Processor Actuators (PAMPA) algorithm proposed by Vega et al. [37] aims to provide robust chip-level power management by coordinating the operation of dynamic voltage and frequency scaling (DVFS), core folding, and per-core power gating (PCPG) using CPU utilization information derived from a few performance counters. More specifically, PAMPA collects online hardware event information from the PM_RUN_CYC performance counter available in POWER7+ systems [38]. PM_RUN_CYC counts the nonidle processor cycles at the physical thread level. In other words, it filters out those processor cycles during which a particular physical thread is idle and, therefore, constitutes a suitable proxy for estimating physical thread-level utilization.
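The overall control structure of such a scheme can be sketched as follows. This is a schematic illustration of utilization-driven actuation in the spirit of PAMPA—not IBM's actual algorithm—and the thresholds, counter reader, and actuator hooks are all assumptions.

```python
# Schematic sketch of utilization-driven power actuation: derive per-thread
# utilization from nonidle-cycle counts (PM_RUN_CYC-style) and coordinate
# DVFS with core folding. Thresholds and actuator hooks are hypothetical.
FOLD_BELOW = 0.3    # fold a core when average utilization drops below this
UNFOLD_ABOVE = 0.8  # unfold when the remaining cores become saturated

def utilization(nonidle_cycles, interval_cycles):
    """Fraction of the interval's cycles during which a thread was nonidle."""
    return nonidle_cycles / interval_cycles

def control_step(per_thread_nonidle, interval_cycles, active_cores,
                 set_frequency, fold_core, unfold_core):
    utils = [utilization(c, interval_cycles) for c in per_thread_nonidle]
    avg = sum(utils) / len(utils)
    if avg < FOLD_BELOW and active_cores > 1:
        fold_core()    # consolidate work; a folded core can be power-gated
    elif avg > UNFOLD_ABOVE:
        unfold_core()  # spread work before pushing frequency further
    # With folding settled, scale frequency to track residual utilization.
    set_frequency(min(1.0, avg / UNFOLD_ABOVE))

# Example: one control step over four hardware threads.
control_step([8.0e8, 7.5e8, 9.0e8, 8.5e8], 1.0e9, active_cores=4,
             set_frequency=lambda f: print("freq ->", round(f, 2)),
             fold_core=lambda: print("fold a core"),
             unfold_core=lambda: print("unfold a core"))
```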

8.3 POWER MODELING

The measurement of real, calibrated power consumption in hardware is difficult—the availability of on-chip power metering is a fairly new feature in today's systems. Power modeling constitutes an alternative to real power measurement, at the expense of accuracy and performance. If the loss of accuracy and the rate at which power is estimated can both be kept at acceptable levels, power modeling becomes attractive when real power measurement is not feasible. As an example, IBM's POWER7 does not have the necessary internal power measurement circuits; therefore, the amount of power that each core consumes is estimated [39]. More specifically, POWER7 implements a hardware mechanism in the form of a "power proxy": a specially architected, programmably weighted, counter-based unit that monitors activities and forms an aggregate value. The activities are carefully selected with an understanding of the processor's microarchitecture, such that they correlate maximally with active power consumption.
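Structurally, such a power proxy is a programmably weighted sum of activity counts on top of an idle floor. The sketch below mirrors that structure in software; the event names, weights, and idle figure are illustrative assumptions, not POWER7's actual events or calibration values.

```python
# Sketch of a counter-based power proxy: a programmably weighted sum of
# microarchitectural activity counts plus a static/idle floor, mirroring
# the structure (not the actual events or weights) of a hardware proxy.
WEIGHTS = {            # hypothetical energy-per-event weights (nJ/event)
    "fxu_issue": 0.15,
    "fpu_issue": 0.40,
    "l2_access": 0.90,
    "icache_fetch": 0.25,
}
IDLE_POWER_W = 5.0     # assumed static + clock-grid floor for the core

def proxy_power_watts(activity_counts, interval_s):
    """Estimate core power from one interval's activity counts."""
    dynamic_nj = sum(WEIGHTS[e] * n for e, n in activity_counts.items())
    return IDLE_POWER_W + dynamic_nj * 1e-9 / interval_s

# One 100 ms interval of (hypothetical) activity counts:
sample = {"fxu_issue": 4.0e8, "fpu_issue": 1.2e8,
          "l2_access": 3.0e7, "icache_fetch": 5.0e8}
print(f"estimated core power: {proxy_power_watts(sample, 0.1):.1f} W")
```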

Power modeling approaches like the one just described for POWER7 are intended to provide power consumption estimates once the system is in production. Prior to that, during the concept phase, accurate early-stage power estimation is key to assessing and predicting the feasibility of a system (or a system component) from the power, thermal, and efficiency standpoints. In particular, architectural power models are widely used to estimate the power consumption of a microprocessor: high-level architectural and microarchitectural parameters (e.g., cache sizes, page size, pipeline depth/width) and activity factors (e.g., cache accesses, total instructions) are specified to the power modeling tool, which abstracts away the underlying implementation details. These high-level abstractions, which represent a trade-off between detail and flexibility/ease of use, enable an architect to quickly evaluate design decisions and explore various design spaces. Wattch [40] and McPAT [41] are two well-known examples of such models. Both tools are analytical, meaning that they use analytical equations of capacitance to model dynamic power. In contrast, empirical models, like PowerTimer [42] and ALPS [43], use precharacterized power data and equations from existing designs. For structures like control logic that are difficult to model analytically, empirical models and/or fudge factors are often used instead. The differences between analytical and empirical models have been described in past work [44, 45].
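The analytical/empirical distinction can be made concrete with the canonical dynamic-power equation, P_dyn = α·C·V²·f, which tools in the Wattch/McPAT family evaluate per structure from capacitance estimates, versus interpolation over precharacterized measurements. All constants in the sketch below are illustrative assumptions, not real design data.

```python
# Analytical vs. empirical power modeling in miniature. The analytical
# path uses the classic dynamic-power equation P = alpha * C * V^2 * f;
# the empirical path interpolates precharacterized measurements.

def analytical_dynamic_power(alpha, cap_farads, vdd, freq_hz):
    """Wattch/McPAT-style: activity factor x capacitance x V^2 x f."""
    return alpha * cap_farads * vdd**2 * freq_hz

# Empirical: power measured on an existing design at known utilizations.
CHARACTERIZED = [(0.0, 3.0), (0.5, 18.0), (1.0, 30.0)]  # (utilization, watts)

def empirical_power(utilization):
    """Piecewise-linear interpolation over precharacterized data points."""
    for (u0, p0), (u1, p1) in zip(CHARACTERIZED, CHARACTERIZED[1:]):
        if u0 <= utilization <= u1:
            return p0 + (p1 - p0) * (utilization - u0) / (u1 - u0)
    raise ValueError("utilization out of characterized range")

# A 2 nF switched-capacitance structure at 1.0 V and 3 GHz, 20% activity:
print(analytical_dynamic_power(0.2, 2e-9, 1.0, 3e9), "W")  # -> 1.2 W
print(empirical_power(0.35), "W")
```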

9 SUMMARY

This chapter presents an introduction to the fundamentals of reliable and power-aware systems from a general standpoint. The first part of the chapter discusses the concepts of error, fault, and failure; the resolution phases of resilient systems; and the definitions and associated metrics of hard and soft errors. The second part presents two effective approaches to stress a system from the resilience and power-awareness standpoints—namely, fault injection and microbenchmarking. Finally, the last part briefly introduces the basic ideas of power and performance measurement and modeling.

The goal of this chapter is to give the reader a basic understanding and overview before moving on to the next chapters, where these notions are considered within the specific domain of rugged systems. In that domain, reliability issues become even more critical due to the inherent unfriendliness of harsh environments (extreme temperature and radiation levels, very low power and energy budgets, and strict fault tolerance and security constraints, among others). For example, NASA's Curiosity rover (exploring the "Red Planet" as of the time of writing) uses two RAD750 radiation-hardened microprocessors in its on-board computer [46]. The engineers who designed this microprocessor had to consider the high radiation levels and wide temperature fluctuations on Mars to ensure mission success. Similarly, the design of on-board computers for unmanned aerial vehicles (i.e., drones) poses hurdles due to the tight power budget available for mission-critical tasks, which must be accomplished successfully under a wide range of environmental conditions. Interplanetary rovers and drones are just two examples of harsh-environment-capable embedded systems—many of the day-to-day devices that we use and rely on are subject to similar constraints, in some cases with critical consequences when those constraints are not met. For example, cars, trucks, and even motorcycles are becoming smarter and, in some cases, autonomous (driverless). Highly reliable, low-power embedded chips for harsh environments are the key enablers of autonomous and semiautonomous vehicles, and it is not difficult to imagine the safety consequences if the on-board control systems fail or operate improperly. These topics are treated and discussed in the following chapters.

REFERENCES

[1] A. Avizienis, J.C. Laprie, B. Randell, C. Landwehr, Basic concepts and taxonomy of dependable and secure computing, IEEE Trans. Dependable Secure Comput. 1 (1) (2004) 11–33.

[2] J.M. Emmert, C.E. Stroud, M. Abramovici, Online fault tolerance for FPGA logic blocks, IEEE Trans. Very Large Scale Integr. VLSI Syst. 15 (2) (2007) 216–226.

[3] R. Hyman Jr., K. Bhattacharya, N. Ranganathan, Redundancy mining for soft error detection in multicore processors, IEEE Trans. Comput. 60 (8) (2011) 1114–1125.

[4] K. Paulsson, M. Hubner, J. Becker, Strategies to on-line failure recovery in self-adaptive systems based on dynamic and partial reconfiguration, in: Proceedings of the First NASA/ESA Conference on Adaptive Hardware and Systems (AHS 2006), 2006, pp. 288–291.

[5] W. Rao, C. Yang, R. Karri, A. Orailoglu, Toward future systems with nanoscale devices: overcoming the reliability challenge, Computer 44 (2) (2011) 46–53.

[6] Dependable embedded systems, http://spp1500.itec.kit.edu.

[7] M. Berg, C. Poivey, D. Petrick, D. Espinosa, A. Lesea, K.A. LaBel, M. Friendlich, H. Kim, A. Phan, Effectiveness of internal versus external SEU scrubbing mitigation strategies in a Xilinx FPGA: design, test, and analysis, IEEE Trans. Nucl. Sci. 55 (4) (2008) 2259–2266.

[8] B. Hargreaves, H. Hult, S. Reda, Within-die process variations: how accurately can they be statistically modeled? in: Proceedings of the Asia and South Pacific Design Automation Conference (ASPDAC 2008), 2008, pp. 524–530.

[9] M. Malek, A comparison connection assignment for diagnosis of multiprocessor systems, in: Proceedings of the Seventh Annual Symposium on Computer Architecture (ISCA 1980), 1980, pp. 31–36.

[10] F.P. Preparata, G. Metze, R.T. Chien, On the connection assignment problem of diagnosable systems, IEEE Trans. Electron. Comput. EC-16 (6) (1967) 848–854.

[11] R.F. DeMara, K. Zhang, C.A. Sharma, Autonomic fault-handling and refurbishment using throughput-driven assessment, Appl. Soft Comput. 11 (2) (2011) 1588–1599.

[12] S. Mitra, N.R. Saxena, E.J. McCluskey, Common-mode failures in redundant VLSI systems: a survey, IEEE Trans. Reliab. 49 (3) (2000) 285–295.

[13] C. Carmichael, Triple module redundancy design techniques for Virtex FPGAs, Xilinx Application Note: Virtex Series XAPP197 (v1.0.1), 2006. http://www.xilinx.com/support/documentation/application_notes/xapp197.pdf.

[14] F. Lima Kastensmidt, L. Sterpone, L. Carro, M. Sonza Reorda, On the optimal design of triple modular redundancy logic for SRAM-based FPGAs, in: Proceedings of the Conference on Design, Automation and Test in Europe—Volume 2 (DATE 2005), 2005, pp. 1290–1295.

[15] G.W. Greenwood, On the practicality of using intrinsic reconfiguration for fault recovery, IEEE Trans. Evol. Comput. 9 (4) (2005) 398–405.

[16] J.C. Laprie, Dependable computing and fault tolerance: concepts and terminology, in: Proceedings of the 25th International Symposium on Fault-Tolerant Computing—Highlights From Twenty-Five Years, 1995, pp. 2–11.

[17] S. Lombardo, J. Stathis, B. Linder, K.L. Pey, F. Palumbo, C.H. Tung, Dielectric breakdown mechanisms in gate oxides, J. Appl. Phys. 98 (2005) 121301-1–121301-36.

[18] S.E. Rauch, The statistics of NBTI-induced VT and β mismatch shifts in pMOSFETs, IEEE Trans. Device Mater. Reliab. 2 (4) (2002) 89–93.

[19] J. Srinivasan, Lifetime reliability aware microprocessors, Ph.D. thesis, University of Illinois at Urbana-Champaign, 2006.

[20] J. Srinivasan, S. Adve, P. Bose, J. Rivers, C.K. Hu, RAMP: a model for reliability aware microprocessor design, IBM Research Report RC 23048, 2003. http://domino.watson.ibm.com/library/CyberDig.nsf/papers/DFFC51D1FD991F0D85256E13005B95A3/$File/rc23048.pdf.

[21] J. Srinivasan, S.V. Adve, P. Bose, J.A. Rivers, Exploiting structural duplication for lifetime reliability enhancement, in: Proceedings of the 32nd Annual International Symposium on Computer Architecture (ISCA 2005), 2005, pp. 520–531.

[22] J. Shin, Lifetime reliability studies for microprocessor chip architecture, Ph.D. thesis, University of Southern California, 2008.

[23] J. Shin, V. Zyuban, P. Bose, T.M. Pinkston, A proactive wearout recovery approach for exploiting microarchitectural redundancy to extend cache SRAM lifetime, in: Proceedings of the 35th Annual International Symposium on Computer Architecture (ISCA 2008), 2008, pp. 353–362.

[24] J. Shin, V. Zyuban, Z. Hu, J.A. Rivers, P. Bose, A framework for architecture-level lifetime reliability modeling, in: Proceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2007), 2007, pp. 534–543.

[25] S. Mukherjee, C. Weaver, J. Emer, S.K. Reinhardt, T. Austin, A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor, in: Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 36), 2003, pp. 29–40.

[26] X. Li, S.V. Adve, P. Bose, J.A. Rivers, Architecture-level soft error analysis: examining the limits of common assumptions, in: Proceedings of the 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2007), 2007, pp. 266–275.

[27] OpenPOWER, On chip controller (OCC), http://openpowerfoundation.org/blogs/on-chip-controller-occ/.

[28] E. Rotem, Power management architecture of the 2nd generation Intel Core™ microarchitecture, formerly codenamed Sandy Bridge, in: Proceedings of the 23rd Hot Chips Symposium, 2011.

[29] T. Rosedahl, C. Lefurgy, POWER8 on chip controller: measuring and managing power consumption, http://hpm.ornl.gov/documents/HPM2015:Rosedahl.pdf.

[30] F. Bellosa, The benefits of event-driven energy accounting in power-sensitive systems, in: Proceedings of the Ninth Workshop on ACM SIGOPS European Workshop (EW 2000), 2000, pp. 37–42.

[31] G. Contreras, M. Martonosi, Power prediction for Intel XScale® processors using performance monitoring unit events, in: Proceedings of the 10th ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED 2005), 2005, pp. 221–226.

[32] C. Isci, M. Martonosi, Runtime power monitoring in high-end processors: methodology and empirical data, in: Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2003), 2003, pp. 93–104.

[33] R. Joseph, M. Martonosi, Run-time power estimation in high performance microprocessors, in: Proceedings of the Sixth ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED 2001), 2001, pp. 135–140.

[34] B. Elkin, V. Indukuru, Commonly used metrics for performance analysis—POWER7, https://www.power.org/documentation/commonly-used-metrics-for-performance-analysis/.

[35] T. Willhalm, R. Dementiev, P. Fay, Intel performance counter monitor—a better way to measure CPU utilization, http://www.intel.com/software/pcm.

[36] ARM Ltd., Using the performance monitoring unit (PMU) and the event counters in DS-5, http://ds.arm.com/developer-resources/tutorials/using-the-performance-monitoring-unit-pmu-event-counters-in-ds-5/.

[37] A. Vega, A. Buyuktosunoglu, H. Hanson, P. Bose, S. Ramani, Crank it up or dial it down: coordinated multiprocessor frequency and folding control, in: Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2013), Davis, California, 2013, pp. 210–221.

[38] A. Mericas, B. Elkin, V.R. Indukuru, Comprehensive PMU event reference—POWER7, http://www.power.org/documentation/comprehensive-pmu-event-reference-power7/.

[39] M. Floyd, M. Allen-Ware, K. Rajamani, B. Brock, C. Lefurgy, A.J. Drake, L. Pesantez, T. Gloekler, J.A. Tierno, P. Bose, A. Buyuktosunoglu, Introducing the adaptive energy management features of the POWER7 chip, IEEE Micro 31 (2) (2011) 60–75.

[40] D. Brooks, V. Tiwari, M. Martonosi, Wattch: a framework for architectural-level power analysis and optimizations, in: Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA 2000), 2000, pp. 83–94.

[41] S. Li, J.H. Ahn, R.D. Strong, J.B. Brockman, D.M. Tullsen, N.P. Jouppi, McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures, in: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2009), 2009, pp. 469–480.

[42] D. Brooks, P. Bose, V. Srinivasan, M.K. Gschwind, P.G. Emma, M.G. Rosenfield, New methodology for early-stage, microarchitecture-level power-performance analysis of microprocessors, IBM J. Res. Dev. 47 (5–6) (2003) 653–670.

[43] S.H. Gunther, F. Binns, D.M. Carmean, J.C. Hall, Managing the impact of increasing microprocessor power consumption, Intel Technol. J. 5 (1) (2001).

[44] D. Brooks, P. Bose, M. Martonosi, Power-performance simulation: design and validation strategies, ACM SIGMETRICS Perform. Eval. Rev. 31 (4) (2004) 13–18.

[45] X. Liang, K. Turgay, D. Brooks, Architectural power models for SRAM and CAM structures based on hybrid analytical/empirical techniques, in: Proceedings of the 2007 IEEE/ACM International Conference on Computer-Aided Design (ICCAD 2007), 2007, pp. 824–830.

[46] W. Harwood, Slow, but rugged, Curiosity's computer was built for Mars, 2012. http://www.cnet.com/news/slow-but-rugged-curiositys-computer-was-built-for-mars/.

Reprinted with permission from Elsevier/Morgan Kaufmann, Copyright © 2016
