Embedding an IPMI platform management subsystem to monitor server system health

James Edwards

September 14, 2010

At the local level, where the system administrator inspects the system in person, the platform management subsystem can report status through indicators such as LEDs, a small chassis-mounted display, an I/O port (USB or serial, for example), or a combination of these (Figure 2 below).

Assuming there are several systems in the server room, the most effective signal would be a bright LED that, when lit, indicates that the system has a failure or is about to have one. The system administrator could then identify the error condition by reading the small chassis display or by connecting a laptop computer to the platform management subsystem’s status/error I/O port.

 

Figure 2 – Unit-observable Platform Management

The hardware implementation of this level can easily fit into a field-programmable gate array (FPGA). A microcontroller could serve as the brains of the platform management subsystem, but it is not required, because state-machine-driven hardware is capable of controlling the platform management function. If the platform management hardware is implemented in an FPGA, the hardware can be easily modified and can even be updated in the field. AMD’s latest embedded server reference designs have an FPGA for control, and the source code is available as well.
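To make the state-machine idea concrete, here is a minimal sketch, written in Python for readability rather than in a hardware description language. It assumes a single fan tachometer input and two hypothetical RPM limits; the state names, limits, and function names are illustrative and do not come from any AMD reference design.

```python
from enum import Enum


class State(Enum):
    OK = "ok"
    WARNING = "warning"  # a failure is predicted
    FAULT = "fault"      # a failure is present


# Hypothetical limits for one fan tachometer input, in RPM.
FAN_WARN_RPM = 2000
FAN_FAIL_RPM = 500


def next_state(state, fan_rpm):
    """Return the next state. FAULT latches until the logic is reset."""
    if state is State.FAULT:
        return State.FAULT
    if fan_rpm < FAN_FAIL_RPM:
        return State.FAULT
    if fan_rpm < FAN_WARN_RPM:
        return State.WARNING
    return State.OK


def led_on(state):
    """The chassis LED is lit for a predicted or an actual failure."""
    return state is not State.OK


def run(readings):
    """Step the state machine over a sequence of RPM readings."""
    state = State.OK
    trace = []
    for rpm in readings:
        state = next_state(state, rpm)
        trace.append((state, led_on(state)))
    return trace
```

In an FPGA the same behavior would reduce to a small state register and a pair of comparators. Latching the fault state is a design choice made here so that a transient failure still leaves the LED lit for the administrator to find.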

Remote – Limited-observable – Medium to Large Installations

The platform management system can send system status and alarms to a remote location via an Ethernet connection (Figure 3 below). With this type of connection, the system administrator can observe the platform status from anywhere in the world.

The Intelligent Platform Management Interface (IPMI) is an openly published specification that describes the structure and format of the interfaces necessary to enable these platform management services. It does not specify a particular implementation. With a platform management solution compatible with the IPMI standard, system health can be monitored; if something has failed or is about to fail, the system administrator can observe an alarm and/or the system status remotely. If the system needs maintenance, such as a fan replacement, it can be scheduled before the failure actually occurs.
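The "about to fail" case is typically detected by comparing a sensor reading against tiered thresholds. IPMI defines non-critical, critical, and non-recoverable tiers for threshold-based sensors; the sketch below applies the lower tiers to a fan speed reading. The class name, function names, RPM values, and the use of "less than or equal" as the comparison are assumptions made for illustration, not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LowerThresholds:
    """Lower threshold tiers for one sensor (hypothetical values in RPM)."""
    non_critical: float
    critical: float
    non_recoverable: float


def classify(reading, thresholds):
    """Map a reading to the most severe lower threshold it has crossed."""
    if reading <= thresholds.non_recoverable:
        return "non-recoverable"
    if reading <= thresholds.critical:
        return "critical"
    if reading <= thresholds.non_critical:
        return "non-critical"
    return "ok"


def needs_scheduled_maintenance(reading, thresholds):
    """True while the part is degraded but still working."""
    return classify(reading, thresholds) == "non-critical"
```

For example, with `LowerThresholds(non_critical=2000, critical=1000, non_recoverable=300)`, a fan reading of 1800 RPM is classified as non-critical, which is the window in which a replacement can be scheduled before the fan actually fails.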

 

Figure 3 – Limited-observable Platform Management  

The platform management solution can also assert system reset and power on/off. This means the system can be powered on or off and/or reset remotely, or if a severe failure occurs, the platform management solution can power off the system automatically and report the failure to the system administrator.

This type of sophistication requires that the platform management system be controlled by a microcontroller.  
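The automatic power-off behavior could be expressed in microcontroller firmware along the following lines. This is a sketch under stated assumptions: the severity strings, the `handle_event` function, and the two callbacks are hypothetical, and the enum values are intended to mirror the request byte of the IPMI Chassis Control command and should be checked against the specification before use.

```python
from enum import IntEnum


class ChassisControl(IntEnum):
    # Intended to match the IPMI Chassis Control request byte; verify
    # against the specification before relying on these values.
    POWER_DOWN = 0x0
    POWER_UP = 0x1
    POWER_CYCLE = 0x2
    HARD_RESET = 0x3


def handle_event(severity, send_alert, chassis_control):
    """Report every abnormal event; power down on a severe failure.

    send_alert and chassis_control are hypothetical callbacks supplied
    by the firmware (for example, a network alert and a power sequencer).
    Returns the chassis action taken, or None if no action was taken.
    """
    if severity == "ok":
        return None
    send_alert(f"platform event: {severity}")
    if severity == "non-recoverable":
        chassis_control(ChassisControl.POWER_DOWN)
        return ChassisControl.POWER_DOWN
    return None
```

Sending the alert before removing power is deliberate here: the report should leave the system even if the power-down sequence itself misbehaves.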

Remote – Highly observable – Generally Large Installations

In addition to the capabilities of the “remote – limited-observable” level, the “remote – highly-observable” platform management solution provides remote control of the keyboard and mouse and remote visibility of the display contents (Figure 4 below).

This level of observation and control is made possible through a feature called keyboard, video, and mouse over Internet Protocol (KVMIP). The system administrator sees exactly what the system is displaying.

The system’s display output is captured by the platform management hardware, converted into IP packets, and sent to the system administrator’s system where the IP packets are reassembled into the display output for the system administrator to view. The same is done with the keyboard and mouse input, except in the other direction.
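The capture, packetize, and reassemble flow can be illustrated with a short sketch. It assumes a captured frame is already available as raw bytes and uses an invented six-byte header (frame id, packet index, packet count); the header layout, the payload size, and the function names are all hypothetical. Real KVM-over-IP products also compress and secure the stream, which is omitted here.

```python
import struct

# Invented header: frame id, packet index, packet count (16 bits each).
HEADER = struct.Struct("!HHH")
MAX_PAYLOAD = 1024  # hypothetical payload size in bytes


def packetize(frame_id, frame):
    """Split one captured frame into sequence-numbered packets."""
    chunks = [frame[i:i + MAX_PAYLOAD]
              for i in range(0, len(frame), MAX_PAYLOAD)] or [b""]
    return [HEADER.pack(frame_id, index, len(chunks)) + chunk
            for index, chunk in enumerate(chunks)]


def reassemble(packets):
    """Rebuild a frame from its packets, which may arrive out of order.

    Returns None if any packet of the frame is missing.
    """
    parts = {}
    count = None
    for packet in packets:
        _frame_id, index, count = HEADER.unpack_from(packet)
        parts[index] = packet[HEADER.size:]
    if count is None or len(parts) != count:
        return None
    return b"".join(parts[i] for i in range(count))
```

Keyboard and mouse events would travel the opposite way in the same fashion, although each event is small enough to fit in a single packet.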

 

Figure 4 – Highly-observable Platform Management

To facilitate the development of this type of platform management, AMD in partnership with other companies has created an open, royalty-free connector and pinout standard to allow third-party developers to create sophisticated platform management solutions as standard products.  

The standard is called Open Platform Management Architecture (OPMA). OPMA leverages the IPMI specification to provide for the basic platform management solution and adds the KVMIP capability to achieve the premium platform management solution that the large enterprise server installations demand.

Conclusion

Platform management is vital for enterprise-class systems. There are three sophistication levels to platform management design. Each level can have the same degree of monitoring hardware. It is the difference in reporting capability that distinguishes each level.  

Platform management’s ability to improve system operational ratio is critical to the reliable operation of large server farms and individual installations. It also lowers TCO by automating failure reporting. Fewer spare systems are needed because the systems in use are up more of the time.

James Edwards is Senior Technical Marketing Manager for the AMD Embedded Solutions group in Ft. Collins, Colorado. James previously worked at Compaq for 11 years, where he was responsible for the main system board design for several portable computers, and then at Cyrix and National Semiconductor, where he worked on the Geode (x86-compatible) processor.
