Design next-generation platforms while adhering to thermal-management standards - Embedded.com

As processor frequencies rise, it gets more difficult to keep systems cool. Employing the PECI and a DTS will put you on the road to “coolness.”

With the recent availability of Intel microprocessors that contain a Platform Environmental Control Interface (PECI) and a Digital Thermal Sensor (DTS), as well as ICs that can maximize these capabilities, designers must understand what these technologies are and how they can improve a product's performance. Although initially targeting servers and desktop applications, PECI and DTS will quickly spread to other products, especially notebook PCs and eventually consumer electronics, when users recognize the thermal and acoustic noise improvements they offer.

Increased computing performance has always meant increased thermal issues. More recently, fan noise has increased as hotter systems struggle to stay within safe operating temperatures based on older thermal strategies.

Until recently, CPU temperatures were measured by a substrate diode that was essentially a parasitic PNP transistor with its collector tied to ground through the substrate. An external sensor circuit converted the diode's reading to a digital format, and the System Management Bus (SMBus) communicated that digital reading. However, with shrinking geometries, the diode's accuracy was continually degrading.

At the same time, the SMBus had its limitations. To solve many of the SMBus problems, the Simple Serial Transport (SST) network was developed to communicate at a higher speed in a network-type architecture for greater flexibility. SST is a single-wire serial communications protocol developed specifically for system thermal management. It operates at 1.5 V and has a transfer rate of up to 2 Mbits/s. In addition to thermal readings, it can communicate voltage measurements.

As a subset of the SST protocol, Intel's proprietary PECI is specifically aimed at reporting CPU temperatures. The interface does not communicate voltage information. To ensure accurate data over a bandwidth that ranges from 2 kbits/s to 2 Mbits/s, PECI uses a cyclical-redundancy-check (CRC) byte for error checking. Instead of a substrate diode, the temperature measurements communicated by PECI are made by a DTS. The on-die DTS, along with an analog-to-digital temperature converter, provides improved temperature data.
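
The CRC byte can be illustrated with a short sketch. The article does not state which polynomial PECI uses; the code below assumes the common CRC-8 polynomial x^8 + x^2 + x + 1 (0x07) with a zero initial value. The function names and the sample message bytes are hypothetical and chosen only for illustration.

```python
def crc8(data: bytes, poly: int = 0x07, init: int = 0x00) -> int:
    """Bitwise CRC-8, most significant bit first.

    The polynomial and initial value are assumptions, not taken from the article.
    """
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def append_crc(message: bytes) -> bytes:
    """Return the message followed by its CRC byte."""
    return message + bytes([crc8(message)])


def check_crc(frame: bytes) -> bool:
    """With a zero init value, the CRC over message plus CRC byte is zero."""
    return crc8(frame) == 0


# Hypothetical four-byte message; the byte values carry no protocol meaning here.
frame = append_crc(bytes([0x30, 0x01, 0x02, 0x01]))
print(check_crc(frame))
```

A receiver that recomputes the CRC over the whole frame and gets a nonzero result knows at least one bit was corrupted in transit and can discard the reading.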

One of the first applications of the PECI protocol and the DTS is the 64-bit dual-core Intel Xeon 5100 series processor introduced last June. The initial implementation of PECI interfaces the CPU to the I/O controller hub. However, the PECI host controller could be embedded either in the I/O controller hub or in an external fan-speed controller (FSC). In either case, the PECI bus responds to commands to read temperatures.

PECI requires two pins, one for the communication and one for the voltage. PECI voltages are specified in terms of the voltage provided to the processor to initiate power up and drive I/O buffer circuits (VTT). As a result, the input voltage (VIN) has a maximum value of VTT + 0.15 V and a minimum value of -0.15 V. While the negative-edge threshold voltage range is between 0.500 × VTT (max) and 0.275 × VTT (min), the positive-edge threshold voltage range is between 0.725 × VTT (max) and 0.550 × VTT (min). As shown in Figure 1, the minimum hysteresis value is 0.1 × VTT.
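
Because every limit scales with VTT, the levels for a given platform can be worked out directly. The sketch below computes them from the ratios quoted above and then models a receiver with hysteresis. The names `PeciLevels`, `peci_levels`, and `schmitt_decode`, the VTT value of 1.0 V, and the receiver thresholds of 0.4 × VTT and 0.6 × VTT are all assumptions chosen for illustration; they sit inside the quoted ranges but do not come from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeciLevels:
    vin_min: float
    vin_max: float
    neg_edge_min: float
    neg_edge_max: float
    pos_edge_min: float
    pos_edge_max: float
    hysteresis_min: float


def peci_levels(vtt: float) -> PeciLevels:
    """Scale the ratios quoted in the text by a given VTT (in volts)."""
    return PeciLevels(
        vin_min=-0.15,
        vin_max=vtt + 0.15,
        neg_edge_min=0.275 * vtt,
        neg_edge_max=0.500 * vtt,
        pos_edge_min=0.550 * vtt,
        pos_edge_max=0.725 * vtt,
        hysteresis_min=0.1 * vtt,
    )


def schmitt_decode(samples, v_low, v_high, state=0):
    """Turn voltage samples into logic levels using two thresholds.

    The output switches high only at or above v_high and low only at or
    below v_low, so noise between the two thresholds is ignored.
    """
    out = []
    for v in samples:
        if state == 0 and v >= v_high:
            state = 1
        elif state == 1 and v <= v_low:
            state = 0
        out.append(state)
    return out


vtt = 1.0  # hypothetical VTT, not a value from the article
levels = peci_levels(vtt)
v_low, v_high = 0.4 * vtt, 0.6 * vtt  # assumed receiver thresholds
assert levels.neg_edge_min <= v_low <= levels.neg_edge_max
assert levels.pos_edge_min <= v_high <= levels.pos_edge_max
assert v_high - v_low >= levels.hysteresis_min
print(schmitt_decode([0.0, 0.5, 0.65, 0.5, 0.45, 0.3, 0.1], v_low, v_high))
```

The samples at 0.5 V and 0.45 V on the falling side leave the output high, which is the effect the hysteresis band is meant to provide.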

Since PECI only reports temperatures, it requires minimal overhead from the CPU side and a minimum of additional circuitry to communicate this data. The thermal data comes from on-die measurements made by the DTSs and is delivered in digital format, based on an analog-to-digital temperature conversion performed on-chip. Factory calibration provides improved accuracy for the DTS compared with the on-die thermal diode. Information from the DTS is processed and stored in a CPU register that may be accessed by PECI or directly through the BIOS.

The DTS is located closer to the actual hot spots on the CPU die than the previously employed temperature detector. This is another factor that leads to improved thermal accuracy, because a gradient of 40°C or more can exist between the hot spot and the coolest portion of the die. However, the measured temperature is always a compromise relative to the actual highest temperature and depends on how rapidly the temperature changes. Cooling from the system fans also alters the hot spot(s). To account for system-to-system variations, the DTS sample interval, which is 82 μs in the default mode, can be lengthened to a maximum of 20 ms. Also, a data filtering algorithm provides additional system flexibility.

Applying PECI and DTS
Temperatures from the DTS are communicated by PECI as negative values relative to the CPU's thermal control circuit (TCC) setting. When this setting is reached, the processor will take action to reduce its own temperature. Typical actions include lowering the core voltage, commanding the voltage regulator to lower the core voltage, or reducing the clock speed. These fail-safe measures are implemented to reduce the amount of heat that the processor generates. As shown in Figure 2, the Tcontrol setting, which is also a negative value, is reached before the TCC activation temperature.

Based on the PECI temperature input, the fan speed increases from its minimum to maximum levels as shown in Figure 3. At the Tcontrol setting, the temperature would be around 10°C lower than the TCC activation temperature. With the fan operating at 100% and the temperature increasing, the fail-safe efforts kick in.

The use of negative temperature values makes it easier to handle processor-to-processor variations in maximum operating temperature, because the FSC is always trying to stay within a set value below that temperature. With every CPU family having a specific TCC and a heat sink that provides a specified amount of thermal resistance between the case and the ambient, the relative rather than absolute temperature readings simplify dealing with the variables.
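
A small sketch shows why relative readings simplify the controller. The function names, the Tcontrol value of -10, and the two TCC activation temperatures (85°C and 95°C) are hypothetical; only the idea of a negative reading measured from the TCC setting comes from the text.

```python
def margin_to_tcc(peci_reading: float) -> float:
    """Degrees of headroom before TCC activation (the reading is zero or negative)."""
    return -peci_reading


def at_or_above_tcontrol(peci_reading: float, tcontrol: float) -> bool:
    """True once the reading has risen to Tcontrol or beyond (both are negative)."""
    return peci_reading >= tcontrol


def absolute_estimate(peci_reading: float, tcc_activation_c: float) -> float:
    """Absolute temperature, needed only if the TCC activation point is known."""
    return tcc_activation_c + peci_reading


TCONTROL = -10.0  # assumed setting, roughly 10 degrees below TCC activation
reading = -12.0   # assumed PECI reading

# Two hypothetical processors with different TCC activation temperatures.
for tcc_c in (85.0, 95.0):
    print(
        absolute_estimate(reading, tcc_c),
        margin_to_tcc(reading),
        at_or_above_tcontrol(reading, TCONTROL),
    )
```

The absolute temperatures differ by 10 degrees between the two parts, yet the margin and the control decision are identical, so the fan controller never needs to know which part is installed.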

An FSC IC with a PECI host can take advantage of both the PECI interface and the improved DTS measuring capability. With PECI providing the critical CPU temperature input in one zone, as well as inputs from two remote thermal-diode-type sensors and an on-chip temperature sensor, a hardware monitor (HWM) function and integrated fan control (IFC) can control up to four fans. Providing a pulse-width-modulation (PWM) control and tachometer input for each fan, the IC cools four independent zones for complete motherboard thermal management. The measured temperatures may be assigned individually or combined in one zone.

To reduce audible fan noise, the IC also has an automatic fan speed algorithm, voltage monitors, limit checks, and alarm status for each measurement. With the input from the DTS, problems with noise interference, non-ideal operation, and series resistance that affected the thermal diode no longer exist. The IC takes PECI input for the CPU's temperature and stores the data in a register.

To provide greater system control, the two remote diodes can measure the temperature of a graphics chip set and the ambient temperature on the PCB or in the air stream. Each remote diode could be a common 3904 transistor with the collector shorted to the base. By comparing measured values to the limit and status registers, the IC can alert the system host if any measurement is outside of its programmed limits. Using internal scaling resistors, the chip monitors VCCP, 2.5-, 3.3-, 5.0-, and 12-V motherboard/processor voltages.

Two operating modes, auto fan and maximum PWM, provide optimum fan speed control. As shown in Figure 4, in the auto fan mode, fan temp limit, range, fan PWM, and minimum or off for PWM can all be programmed by the user. The PWM duty cycle increases or decreases linearly above the fan temp limit with increasing or decreasing temperature. In the example in Figure 4, the fan will run at 100% duty cycle when the zone sensor's temperature reaches 58°C. When the temperature of any zone exceeds the absolute limit, all PWM outputs are set at 100% duty cycle for maximum cooling.
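
The auto fan behavior described above can be sketched as a piecewise-linear function. This is a simplified model rather than the IC's actual algorithm. Figure 4 is not reproduced here, so the fan temp limit of 50°C, the range of 8°C, the minimum duty cycle of 40%, and the absolute limit of 85°C are assumed values, picked only so that the ramp ends at the 58°C mentioned in the text. The sketch also applies one set of settings to every zone for brevity.

```python
def auto_fan_duty(temp_c, fan_temp_limit_c, range_c, min_duty, off_below_limit=False):
    """PWM duty cycle (percent) for one zone in a simplified auto fan mode."""
    if temp_c < fan_temp_limit_c:
        # Below the limit the fan either idles at its minimum or is switched off.
        return 0.0 if off_below_limit else min_duty
    if temp_c >= fan_temp_limit_c + range_c:
        return 100.0
    # Linear ramp from min_duty at the limit to 100% at limit + range.
    fraction = (temp_c - fan_temp_limit_c) / range_c
    return min_duty + fraction * (100.0 - min_duty)


def zone_duties(zone_temps, absolute_limit_c, **settings):
    """Duty cycle per zone; any zone over the absolute limit forces all to 100%."""
    if any(t > absolute_limit_c for t in zone_temps):
        return [100.0] * len(zone_temps)
    return [auto_fan_duty(t, **settings) for t in zone_temps]


settings = dict(fan_temp_limit_c=50, range_c=8, min_duty=40)  # assumed values
print(zone_duties([45, 54, 58], absolute_limit_c=85, **settings))
print(zone_duties([45, 54, 90], absolute_limit_c=85, **settings))
```

In the second call a single zone at 90°C exceeds the assumed absolute limit, so every output goes to full speed regardless of the other zones' readings.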

To minimize acoustic noise, the maximum PWM mode is used to limit the fan's speed under normal conditions. In this mode, the absolute limit, the fan temp limit plus range, and the selection of which one is higher than the other are all user-programmable functions. If the fan temp limit plus range is the higher setting, as soon as the programmable high limit is reached, the PWM rate jumps to 100% duty cycle. Similar to the auto fan mode, with increasing or decreasing temperature above the fan temp limit, the PWM duty cycle increases or decreases linearly. Both the auto fan and maximum PWM modes ensure safe cooling and lower acoustic noise levels than previous architectures, including SMBus control.

Dave Pivin is a product applications manager at Andigilog Inc. Coming off an 18-year career at Motorola's Semiconductor Products Sector as an applications engineer and technical marketing manager for the ASIC product line, Pivin joined Andigilog in 2003. He's authored many articles for leading trade journals and has presented papers at industry conferences. Pivin, who holds an MS in engineering management from Northeastern University and a BSEE from UC Irvine, can be reached at .
