Reliable and power-aware architectures: Measurement and modeling
Editor's Note: Embedded designers must contend with a host of challenges in creating systems for harsh environments. Harsh environments impose unique demands not only in terms of temperature extremes but also in areas including availability, security, very limited power budgets, and more. In Rugged Embedded Systems, the authors present a series of papers by experts in each of the areas that can present unusually demanding requirements. In Chapter 2 of the book, the authors address fundamental concerns in reliability and system resiliency. This series excerpts that chapter in the following installments:
- Reliable and power-aware architectures: Sustaining system resiliency
- Reliable and power-aware architectures: Measuring resiliency
- Reliable and power-aware architectures: Soft-error vulnerabilities
- Reliable and power-aware architectures: Microbenchmark generation
- Reliable and power-aware architectures: Measurement and modeling (this article)
Adapted from Rugged Embedded Systems, Computing in Harsh Environments, by Augusto Vega, Pradip Bose, and Alper Buyuktosunoglu.
CHAPTER 2. Reliable and power-aware architectures: Fundamentals and modeling (Continued)
8 POWER AND PERFORMANCE MEASUREMENT AND MODELING
8.1 IN-BAND VERSUS OUT-OF-BAND DATA COLLECTION
From a power and performance standpoint, a system’s status can be described by a variety of sensors and performance counters that report the power consumption, temperature, and performance of the different system components (CPUs, memory and storage modules, network, etc.). This information can be accessed online in two ways: in-band or out-of-band. A system is sensed in-band when the collected information is generated by the same component that the information refers to; for example, CPU power consumption or thermal information generated internally by the CPU itself. This is the case of the on-chip controller (OCC) in POWER-based systems (a co-processor embedded directly on the main processor die), as well as Intel’s Power Control Unit (PCU) and associated on-board power meter capabilities. If, on the other hand, the collected information is obtained in a way that does not involve the component under measurement, the system information is gathered out-of-band. This may be the case of CPU power consumption estimated by an external process (running on the same system or even on a different one) using some “proxy” CPU information, such as performance counters. IBM’s Automated Measurement of Systems for Energy and Temperature Reporting (AMESTER) is an example of out-of-band system measurement. AMESTER provides an interface to remotely monitor POWER-based systems through the service processor available in those systems. The service processor (which is different from the main system processor) accesses a variety of power, temperature, and other sensors that monitor the entire system status and delivers their readings to the remote AMESTER client.
In practice, the choice between in-band and out-of-band system measurement depends on (1) the data collection capabilities available in the system in question, (2) the performance perturbation that the measurement introduces, and (3) the expected information granularity and correlation with the actual execution. For example, on-chip power metering capabilities are relatively new and may not be widely available, in which case chip power consumption can be estimated from proxy information, such as performance counters that gauge processor activity and from which corresponding power figures are derived. Even though in-band measurement may look more appealing than out-of-band approaches, it may also perturb system performance or interfere with the running workloads, since the component being measured has to “produce” the requested information in some way. Finally, the correlation (or lack thereof) between the collected data and the actual measured events should be taken into account. Out-of-band measurement techniques usually do worse at correlating measurements precisely with actual events, in particular when the measurement is conducted remotely.
Regardless of whether a system is measured in-band or out-of-band, processor performance counters constitute a key source of information for characterizing the run-time behavior of a CPU from performance, power, and thermal perspectives. Hence they are widely used to model power consumption and temperature when direct measurement from sensors is not possible. The critical part of this process, however, is selecting a set of counters that correlates with the real values accurately enough, a problem that several researchers and groups have studied extensively [30–33].
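As a rough illustration of the counter-selection problem, the sketch below ranks candidate counters by how strongly their per-interval rates correlate with measured power, and then fits a one-counter linear proxy of the form power ≈ intercept + slope × rate. This is a deliberately simple stand-in, not the method of the cited works. The counter names and all sample values are hypothetical placeholders, and a realistic model would typically combine several counters and be validated against sensor data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series tells us nothing about the other one
    return cov / sqrt(var_x * var_y)


def rank_counters(counter_rates, measured_power):
    """Return (name, correlation) pairs, strongest absolute correlation first."""
    scores = {name: pearson(rates, measured_power)
              for name, rates in counter_rates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


def fit_linear_proxy(rates, measured_power):
    """Least-squares fit of power ~ intercept + slope * rate for one counter."""
    n = len(rates)
    mean_x = sum(rates) / n
    mean_y = sum(measured_power) / n
    var_x = sum((x - mean_x) ** 2 for x in rates)
    if var_x == 0:
        raise ValueError("counter rate is constant; cannot fit a slope")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(rates, measured_power))
    slope = cov / var_x
    return mean_y - slope * mean_x, slope


# Hypothetical placeholder samples (one value per sampling interval).
# These are NOT measurements from any real system.
counter_rates = {
    "instructions_per_us": [1.0, 2.0, 3.0, 4.0],
    "cache_misses_per_us": [4.0, 1.0, 3.0, 2.0],
}
measured_power = [10.0, 12.0, 14.0, 16.0]  # watts, placeholder values

ranking = rank_counters(counter_rates, measured_power)
best_name = ranking[0][0]
intercept, slope = fit_linear_proxy(counter_rates[best_name], measured_power)
```

Once fitted, the proxy can estimate power for a new interval as `intercept + slope * rate` when no sensor reading is available. How well such an estimate tracks real power depends entirely on the counters chosen and the workloads used for fitting.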
8.2 PROCESSOR PERFORMANCE COUNTERS
Processor performance counters are hardware counters built into a processor to “count” certain events that take place at the CPU level, such as the number of cycles and instructions that a program executed, its associated cache misses, and its accesses to off-chip memory, among several others. The events that can be tracked with performance counters are usually simple ones, but combined in the right way they can provide extremely useful insights into a program’s behavior and, therefore, constitute a valuable tool for debugging purposes. As Elkin and Indukuru explain, the first step in optimizing an application is characterizing how well the application runs on the target system. The fundamental intensive metric used to characterize the performance of any given workload is cycles per instruction (CPI): the average number of clock cycles (or fractions of a cycle) needed to complete an instruction. The CPI is a measure of processor performance: the lower it is, the more effectively the processor hardware is being kept busy. Elkin and Indukuru also provide a detailed description of the performance monitoring unit (PMU) built into IBM’s POWER7 systems, which comprises six thread-level performance monitor counters (PMCi). PMC1 to PMC4 are programmable; PMC5 counts nonidle completed instructions and PMC6 counts nonidle cycles. Other processor manufacturers offer similar counting capabilities, such as Intel’s Performance Counter Monitor and ARM’s Performance Monitoring Unit.
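The CPI definition above can be sketched as a ratio of counter deltas taken over a measurement interval, for instance the change in a nonidle-cycles counter (such as PMC6) divided by the change in a completed-instructions counter (such as PMC5). How the counters are actually read is platform-specific and is not shown here. The `CounterSnapshot` type, the function names, and the default counter width are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSnapshot:
    """Raw counter readings taken at one instant (hypothetical container)."""
    instructions: int  # completed-instructions counter reading
    cycles: int        # cycles counter reading


def counter_delta(start, end, width_bits=64):
    """Difference between two readings, tolerating a single wraparound.

    width_bits is an assumption; use the real width of the hardware counter.
    """
    return (end - start) % (1 << width_bits)


def cycles_per_instruction(before, after, width_bits=64):
    """CPI over the interval between two snapshots."""
    instructions = counter_delta(before.instructions, after.instructions, width_bits)
    cycles = counter_delta(before.cycles, after.cycles, width_bits)
    if instructions == 0:
        raise ValueError("no instructions completed in the interval")
    return cycles / instructions


# Placeholder readings: 500 cycles elapsed while 200 instructions completed.
before = CounterSnapshot(instructions=100, cycles=1000)
after = CounterSnapshot(instructions=300, cycles=1500)
cpi = cycles_per_instruction(before, after)  # 500 / 200 = 2.5
```

Working with deltas rather than absolute readings matters because the counters run continuously; only the change across the interval of interest describes the workload being characterized.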
Processor performance counters can also be used effectively to drive run-time power and thermal management decisions in the CPU. As an example, the Power-Aware Management of Processor Actuators algorithm (PAMPA) proposed by Vega et al. aims to provide robust chip-level power management by coordinating the operation of dynamic voltage and frequency scaling (DVFS), core folding, and per-core power gating (PCPG) using CPU utilization information derived from a few performance counters. More specifically, PAMPA collects online hardware event information from the PM_RUN_CYC performance counter available in POWER7+ systems. PM_RUN_CYC counts the nonidle processor cycles at the physical thread level. In other words, it filters out those processor cycles during which a particular physical thread is idle and, therefore, constitutes a suitable proxy to estimate physical thread-level utilization.
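To make the utilization proxy concrete, the following minimal sketch estimates thread-level utilization as the fraction of cycles in a sampling interval during which the thread was not idle. It assumes that the change in a nonidle-cycle counter such as PM_RUN_CYC is already available for the interval, and that the total number of cycles can be approximated as clock frequency times interval length. The function name and parameters are hypothetical, and this is not PAMPA itself, whose decision logic is not reproduced here.

```python
def thread_utilization(run_cycles_delta, frequency_hz, interval_s):
    """Estimate utilization of one physical thread over a sampling interval.

    run_cycles_delta: change in the nonidle-cycle counter over the interval.
    frequency_hz, interval_s: used to approximate total elapsed cycles
    (an assumption; frequency may vary under DVFS in a real system).
    """
    total_cycles = frequency_hz * interval_s
    if total_cycles <= 0:
        raise ValueError("frequency and interval must be positive")
    ratio = run_cycles_delta / total_cycles
    # Clamp to [0, 1] to absorb sampling jitter and frequency approximation.
    return min(1.0, max(0.0, ratio))


# Placeholder numbers: 1.5e9 nonidle cycles in 1 s at a nominal 3 GHz clock.
utilization = thread_utilization(1_500_000_000, 3.0e9, 1.0)  # 0.5
```

A power manager could compare such per-thread estimates against thresholds to decide when to change frequency or fold cores, but the specific thresholds and policy are design choices outside the scope of this sketch.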