Integrating functional safety into a complex electronic system can be daunting for designers. Recent advances in embedded-processor architecture, however, have made the task more attainable and less costly.
To understand why functional safety standards dictate numerous system aspects, it helps to know the types of failures to which embedded systems are susceptible. In general, failures fall into two main categories: systematic and random.
Systematic failures usually result from problems with the chip design, software bugs or the manufacturing process. Continuous process improvements often repair them. An example of a systematic failure in an electronic system is a suboptimal solder reflow profile used in printed-circuit board assembly that results in circuit-continuity failures.
Random failures may be more difficult to fix, because they often result from chance defects or events that are inherent to a process, a usage condition or the operating environment. An example of a random failure in an electronic system is an embedded-processor malfunction caused by an alpha particle or neutron striking a RAM cell and flipping the stored bit. It is almost impossible to reduce the rate of random failures, but risk-mitigation measures can help the system detect them and respond appropriately when they occur.
At the design stage, safety-critical architectures have helped electronic systems to withstand both systematic and random failures. The three architectures now used most often are the one-out-of-two system (1oo2), the two-out-of-two system (2oo2) and the two-out-of-three system (2oo3).
The 1oo2 system is usually implemented using two embedded processors with independent input/output (I/O) in a configuration where both controllers must command an output for activation to occur. In this architecture, it takes a failure in both systems for an inadvertent activation to happen.
Like the 1oo2 system, the 2oo2 system has two embedded processors with independent I/O. In this configuration, however, the output circuit is configured in a manner in which a failure in both systems must occur for an inadvertent deactivation. Both of these systems are usually found in industrial-control environments, where inadvertent activation or deactivation of an actuator could be dangerous.
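The difference between the two output arrangements can be sketched in a few lines of Python. This is only an illustrative model, not code from any real controller: the function names are hypothetical, and each channel's command is assumed to reduce to a single boolean, whereas a real output stage is implemented in hardware.

```python
def output_1oo2(cmd_a: bool, cmd_b: bool) -> bool:
    """Series (AND) arrangement: the output is active only if both channels command it."""
    return cmd_a and cmd_b


def output_2oo2(cmd_a: bool, cmd_b: bool) -> bool:
    """Parallel (OR) arrangement: the output stays active if either channel commands it."""
    return cmd_a or cmd_b


# Channel A is stuck "on" by a fault while channel B correctly commands "off".
stuck_on, healthy_off = True, False
print(output_1oo2(stuck_on, healthy_off))  # False: no inadvertent activation

# Channel A is stuck "off" by a fault while channel B correctly commands "on".
stuck_off, healthy_on = False, True
print(output_2oo2(stuck_off, healthy_on))  # True: no inadvertent deactivation
```

In this model a single faulty channel cannot activate the 1oo2 output on its own, and cannot deactivate the 2oo2 output on its own, which matches the behavior described above.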
The 2oo3 system is designed with three embedded processors and a complex output voting circuit. When a fault occurs in one of the three controllers, the output of the other two is used to control the system. A 2oo3 system is usually used in fail-operation applications, where the system must continue functioning despite a failure—most often, flight-critical aircraft systems and life-support medical devices.
Figure 2. The 2oo3 system is designed with three embedded processors and a complex output voting circuit.
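The voting idea can be illustrated with a minimal Python sketch. It assumes each controller's output is a single boolean and that the voter itself is fault-free. The function names are hypothetical, and a real 2oo3 voter is a hardware circuit rather than software.

```python
def vote_2oo3(a: bool, b: bool, c: bool) -> bool:
    """Return the value that at least two of the three channels agree on."""
    return (a and b) or (a and c) or (b and c)


def disagreeing_channels(a: bool, b: bool, c: bool) -> list:
    """Return the indices (0, 1, 2) of channels that differ from the voted output."""
    voted = vote_2oo3(a, b, c)
    return [i for i, value in enumerate((a, b, c)) if value != voted]


# Channel 1 has failed and reports False; the other two still carry the vote.
print(vote_2oo3(True, False, True))             # True
print(disagreeing_channels(True, False, True))  # [1]
```

Reporting which channel disagrees is an addition for illustration; it shows how a system could flag the faulty controller while continuing to operate on the remaining two.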
But using these safety-critical architectures takes a tremendous amount of development time and effort; not only does the entire embedded processor need to be duplicated, but sophisticated software-safety algorithms must be implemented. In addition, these architectures increase the systems’ susceptibility to random failures. The amount of logic that is susceptible to alpha- and neutron-particle strikes increases significantly as the number of system processors grows.
Enhancements to embedded processors have emerged to combat the shortcomings of traditional safety systems. Many embedded processors, such as the Hercules RM4x and TMS570 microcontroller families from Texas Instruments, are now available with integrated embedded-hardware diagnostics to address a multitude of functional safety concerns. These processors apply continuously operating hardware-based safety mechanisms on such components as the CPU, flash memory, SRAM, power and clocks to ensure accurate software execution.
The CPU’s complexity makes it a prime candidate for a dual-core lockstep safety mechanism. A compare module confirms that the outputs of the two cores are identical on a cycle-by-cycle basis. To address the integrity of both the embedded flash memory and the SRAM, many controllers incorporate error-correcting code (ECC) that detects corruption and corrects single-bit errors so system operation can continue uninterrupted. Embedded processors also have incorporated built-in self-test (BIST) engines that provide robust diagnostic testing on the CPU and memories even when the system is not running code.
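To show how an error-correcting code can locate and repair a flipped bit, here is a minimal sketch of a textbook Hamming(7,4) code in Python. It is an illustration of the principle only, not the scheme used in any particular microcontroller: real memories protect much wider words, typically add double-bit error detection, and do the work in hardware. The function names are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7, parity at 1, 2, 4)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit; return (data_bits, error_position or 0)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the flipped bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


stored = hamming74_encode([1, 0, 1, 1])
stored[5] ^= 1  # simulate a particle strike flipping the bit at position 6
data, position = hamming74_decode(stored)
print(data, position)  # [1, 0, 1, 1] 6
```

The syndrome is zero when the stored word is intact and otherwise gives the position of the single flipped bit, which is why the read can be corrected without interrupting operation.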
Combining integrated safety features into a single IC has led to streamlined safety architectures, including the one-out-of-one-with-diagnostics (1oo1D) system. This type of safety architecture suits a wide variety of fail-safe systems where the failure rate must be extremely low. In addition, designers of fail-operational systems are working with safety-enabled processors and with the two-out-of-two-with-diagnostics (2oo2D) architecture, which is simpler and more cost-efficient than the 2oo3.
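The 1oo1D idea can be modeled in a few lines of Python. This sketch assumes that the de-energized output is the safe state and that all hardware diagnostics are summarized in one boolean flag. Both are simplifications, and the names are hypothetical.

```python
SAFE_STATE = False  # assumption: the de-energized output is the safe state


def output_1oo1d(command: bool, diagnostics_ok: bool) -> bool:
    """Single channel with diagnostics: follow the command only while diagnostics pass."""
    return command if diagnostics_ok else SAFE_STATE


print(output_1oo1d(command=True, diagnostics_ok=True))   # True: normal operation
print(output_1oo1d(command=True, diagnostics_ok=False))  # False: forced to safe state
```

The single channel is trusted only while its diagnostics report no fault; once a fault is detected, the output is driven to the safe state regardless of the command.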
Because of their diagnostic capabilities and cost optimization, safety embedded processors are going into systems that don’t necessarily require functional safety but do require high levels of availability. Manufacturers of central-office communications and data-center equipment are taking advantage of safety embedded processors to mitigate the risk of downtime. Though a failure in one of these devices doesn’t usually pose an imminent danger to human life or the environment, the devices do need to be extremely robust and resistant to all types of failures: A failure in a major communications backbone can affect millions of people and cost the communications provider significant revenue. The additional cost and time needed to develop a 1oo1D system are marginal compared with the cost of system downtime.
The advent of safety embedded processors is helping to decrease the cost, complexity and development time of safety-critical systems. System designers who use integrated hardware safety features can substantially reduce both safety-software development time and the number of components needed to achieve functional safety and reliability.
About the author
Anthony Vaughan is the North America marketing and business development manager for Texas Instruments’ Hercules safety microcontroller group. He joined TI as a product engineer in the imaging and audio group, then became an applications engineer with TI’s automotive and safety microcontroller group. Vaughan holds a BSEE from Texas A&M.