Tutorial: Improving the transient immunity of your microcontroller-based embedded design - Part 5
In many instances, the way embedded software is structured and how it interacts with the hardware in a system can have a profound effect on the transient immunity performance of a system.
It can be impractical and costly to completely eliminate transients at the hardware level, so the system and software designers should plan for the occasional erroneous signal or power glitch that could cause the software to perform erratically. Erratic actions on the part of the software can be classified into two different categories:
1. The microcontroller (MCU) sees changes in input signals, or on-chip hardware reacts to input signals that are out of the ordinary, and the software does not have the "intelligence" to ignore them or deal with them in a safe and appropriate manner.
2. Software runaway occurs when the disturbance is so significant that the software code execution flow is disrupted and the central processing unit (CPU) begins to execute code out of sequence or from incorrect areas of memory.
We refer to the approach to software design that addresses these problems as "Defensive Software Design." The following is a list of some of the more common and effective techniques used in defensive software design. The list is not exhaustive; it covers the techniques most commonly used in the development of robust embedded software.
Digital Input Pins
Filtering digital inputs in software is particularly important when input pins are vulnerable in the system. Noise glitches generally last from tens to low hundreds of nanoseconds. A simple software filtering technique such as majority voting or polling allows the system to ignore these glitches.
|Figure 1. Example of Majority Vote|
In the majority vote technique shown in Figure 1 above, the input is read a predetermined number of times and the logic state that is read a majority of the time is considered the proper state.
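The majority vote described above might be sketched in C as follows. This is a minimal illustration under stated assumptions, not the code from Figure 1: `majority_vote`, the `pin_reader_t` callback, and the `sim_pin_*` helpers are hypothetical names, and the simulated pin simply replays a bit pattern so the logic can be exercised on a host PC. On a real MCU the reader would return a bit from the port data register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pin-read callback: returns true when the pin reads logic high. */
typedef bool (*pin_reader_t)(void);

/* Read the pin `samples` times and return the state seen in the majority of
 * reads. An odd sample count avoids ties; on a tie this returns false. */
bool majority_vote(pin_reader_t read_pin, uint8_t samples)
{
    uint8_t highs = 0;

    for (uint8_t i = 0; i < samples; i++) {
        if (read_pin()) {
            highs++;
        }
    }
    return highs > (samples / 2u);
}

/* Simulated pin (assumption, for host-side testing only): replays a 32-bit
 * pattern one bit per read, least significant bit first. */
static uint32_t sim_pattern;
static uint8_t sim_index;

void sim_pin_load(uint32_t pattern)
{
    sim_pattern = pattern;
    sim_index = 0;
}

bool sim_pin_read(void)
{
    bool level = ((sim_pattern >> sim_index) & 1u) != 0u;
    sim_index = (uint8_t)((sim_index + 1u) % 32u);
    return level;
}
```

With five samples, a single glitched read cannot change the result: the pattern high, high, low, high, high is still reported as high.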
In the polling technique, when a pin state change is detected, the pin is sampled several more times over a predetermined time interval to make sure the pin remains in that state. This ensures that the state change is of sufficient duration to be considered a valid change and not a glitch. The technique is particularly useful on interrupt inputs such as an interrupt request (IRQ) pin or keyboard interrupt (KBI) pins. It is often used when de-bouncing mechanical switch inputs.
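One way to express the polling idea is a small state machine that is called at a fixed interval and accepts a new state only after several consecutive matching samples. The sketch below is an assumption-laden illustration: `debounce_t`, `debounce_update`, and the value of `DEBOUNCE_SAMPLES` are hypothetical, and the sampling interval would be chosen to suit the application.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: number of consecutive samples required to accept a change. */
#define DEBOUNCE_SAMPLES 4u

typedef struct {
    bool stable;    /* last accepted pin state */
    uint8_t count;  /* consecutive samples that disagree with `stable` */
} debounce_t;

/* Call at a fixed interval with the raw pin reading.
 * Returns the accepted (filtered) pin state. */
bool debounce_update(debounce_t *d, bool raw)
{
    if (raw == d->stable) {
        d->count = 0;                 /* glitch ended: restart the count */
    } else {
        d->count++;
        if (d->count >= DEBOUNCE_SAMPLES) {
            d->stable = raw;          /* change persisted long enough */
            d->count = 0;
        }
    }
    return d->stable;
}
```

A single deviating sample resets nothing in the accepted state; only a change that persists for `DEBOUNCE_SAMPLES` calls is reported.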
Digital Outputs & Crucial Registers
Within the main system software loop, frequently update outputs and other critical registers that control output pins. These include:
* Data direction registers
* I/O modules that can be modified by software
* Random access memory (RAM) registers that are used for vital pieces of the application
This ensures that any minor malfunction is corrected without a major upset. Refresh these registers as regularly as possible; constant writing or updating should not affect the reliability of outputs and RAM registers. Take care that functions such as serial communications and timers are in an inactive state when they are reinitialized, because some status bits are affected by a write to the corresponding control register.
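As a rough sketch of this refresh, the main loop can rewrite the data direction register from a constant and the output port from the application's intended value on every pass. Everything named here is hypothetical: `DDRA_sim` and `PORTA_sim` are ordinary variables standing in for memory-mapped registers, and `DDRA_CONFIG` is an assumed pin configuration.

```c
#include <stdint.h>

/* Stand-ins for memory-mapped registers (assumption: on real hardware these
 * would be the device's data direction and port data registers). */
volatile uint8_t DDRA_sim;
volatile uint8_t PORTA_sim;

/* Assumed configuration: lower four pins are outputs. Kept as a constant so
 * the reference value lives in nonvolatile memory rather than RAM. */
#define DDRA_CONFIG 0x0Fu

/* Output state the application intends to drive. */
static uint8_t desired_outputs;

void set_outputs(uint8_t value)
{
    desired_outputs = (uint8_t)(value & DDRA_CONFIG);
}

/* Call once per main-loop pass: rewrites the registers so that a corrupted
 * value is corrected on the next pass. */
void refresh_critical_registers(void)
{
    DDRA_sim = DDRA_CONFIG;
    PORTA_sim = desired_outputs;
}
```

If a transient flips bits in either register, the next call restores the intended values without any special error path.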
Boundary Checking
Boundary checking refers to a method of validating the input signals to the MCU. It is a type of sanity check on input signals to make sure they are not obviously in error. This technique has to be used in cases where input signals (either digital or analog) cannot be "filtered" as in the majority voting scheme discussed above.
For example, suppose an electrical glitch reaches the input capture pin of a timer block. The timer count value that is captured can be examined to see if it is relatively close in value to what is expected. As a further example, suppose that the timer input capture is being used to measure the duration of an input pulse. If the input signal can have a period in the range of, say, 1-10 ms, then a measured period of 5 µs should be considered "bad data" and dealt with appropriately.
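The pulse-width example can be sketched as a simple range check on the difference between two captured timer values. The sketch assumes a free-running 16-bit timer that counts once per microsecond, so 1-10 ms corresponds to 1000-10000 ticks; the function and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 1 timer tick = 1 microsecond, so these limits are 1 ms and 10 ms. */
#define PULSE_MIN_TICKS 1000u
#define PULSE_MAX_TICKS 10000u

/* Compute the pulse width from two input-capture values and accept it only
 * if it lies within the expected range. Returns false for "bad data", in
 * which case *width is left unchanged. */
bool pulse_width_valid(uint16_t start, uint16_t end, uint16_t *width)
{
    /* Unsigned 16-bit subtraction gives the right result across a single
     * timer rollover. */
    uint16_t ticks = (uint16_t)(end - start);

    if (ticks < PULSE_MIN_TICKS || ticks > PULSE_MAX_TICKS) {
        return false;
    }
    *width = ticks;
    return true;
}
```

A rejected measurement can simply be discarded so that the previous valid value remains in use, or it can be counted to detect a persistently noisy input.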
Oscillator and Other Sensitive Pins
The most vulnerable pins on an MCU are usually the high impedance analog pins such as those used in oscillator circuits, a phase locked loop (PLL), and analog signal inputs. In general, software cannot correct for a pin that is poorly protected by the hardware in this case. Special care in board layout and design must be the focus with these types of pins.
However, filtering techniques similar to those discussed above for digital pins can be applied to some analog signal input pins, such as those that feed an analog-to-digital converter (ADC). In this case, the converted values can be checked against expected boundaries, and simple averaging of all valid conversions diminishes most noise effects.
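A minimal sketch of this combination of boundary checking and averaging is shown below. The function name, the limits, and the fallback behavior are assumptions chosen for illustration; a real design would derive the limits from the sensor's valid operating range.

```c
#include <stddef.h>
#include <stdint.h>

/* Average only the conversions that fall within [low, high]. If no sample
 * is valid, return `fallback` (for example, the last known good value). */
uint16_t adc_filtered_average(const uint16_t *samples, size_t count,
                              uint16_t low, uint16_t high, uint16_t fallback)
{
    uint32_t sum = 0;
    size_t valid = 0;

    for (size_t i = 0; i < count; i++) {
        if (samples[i] >= low && samples[i] <= high) {
            sum += samples[i];
            valid++;
        }
    }
    if (valid == 0) {
        return fallback;
    }
    return (uint16_t)(sum / valid);
}
```

Out-of-range conversions are dropped before averaging, so a single corrupted reading does not pull the result away from the true value.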
Structured code techniques
Structured code techniques should be used at all times. When passing control from the main loop to a subroutine (or procedure), always pass control with token bytes in RAM that the subroutine can check. Once the subroutine returns, the token bytes should be cleared or changed to the next value. This prevents the code in these routines from being called and executed accidentally by runaway code. A simplified example is shown in Figures 2 and 3, below.
|Figure 2. Token Passing, Main Loop|
|Figure 3. Token Passing, Subroutine|
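The token-passing scheme can be sketched in C roughly as follows. This is not a reproduction of the original figures: the token values, `main_loop_step`, `update_output`, and the counters are hypothetical, and `handle_runaway` only counts events where a real system would more likely force a reset.

```c
#include <stdint.h>

/* Hypothetical token values; any non-trivial, distinct values will do. */
#define TOKEN_IDLE          0x00u
#define TOKEN_UPDATE_OUTPUT 0xA5u

static volatile uint8_t call_token = TOKEN_IDLE;

uint8_t output_updates;   /* counts legitimate executions */
uint8_t runaway_events;   /* counts calls made without a valid token */

/* Assumption: a real system would force a reset or enter a safe state here. */
static void handle_runaway(void)
{
    runaway_events++;
}

/* Subroutine: does its work only if the caller set the expected token. */
void update_output(void)
{
    if (call_token != TOKEN_UPDATE_OUTPUT) {
        handle_runaway();
        return;
    }
    output_updates++;     /* the routine's real work would go here */
}

/* Main loop: set the token, call the subroutine, then clear the token. */
void main_loop_step(void)
{
    call_token = TOKEN_UPDATE_OUTPUT;
    update_output();
    call_token = TOKEN_IDLE;
}
```

If runaway code jumps into `update_output` directly, the token is still `TOKEN_IDLE`, so the routine refuses to run and reports the event instead.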
Filling Unused Memory
If the CPU ever runs away, a convenient way to help get it back on track is to fill unused memory with a single-byte instruction. Use either SWI (software interrupt) instructions, or NOP (no operation) instructions followed by an occasional JSR Start instruction, as shown in Figure 4 and Figure 5 below, respectively.
|Figure 4. Memory Fill with SWI|
|Figure 5. Memory Fill with NOP|
This technique is best implemented so that the SWI interrupt routine can determine (through token passing or other means) whether the interrupt was called intentionally and take the appropriate action. Most linkers and device programmers have options that can be used to fill unused memory blocks with the same data.
Unused Interrupt vectors
Define all interrupt vectors, even those that are not used. Vectors from unused MCU functions should be pointed to a safe routine that could indicate an error condition if executed.
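The idea can be sketched with a table of function pointers in which every unused entry points to a common safe handler. The table size, the vector numbering, and all names below are hypothetical; on a real device the table would be placed at the vector addresses using the toolchain's own mechanism.

```c
#include <stdint.h>

typedef void (*isr_t)(void);

/* Assumptions for illustration: six vectors, with the timer at index 2. */
#define VECTOR_COUNT 6u
#define VEC_TIMER    2u

uint8_t unexpected_interrupts;  /* error indication from unused vectors */
uint8_t timer_ticks;

/* Safe routine for vectors the application does not use. A real system
 * might log the event and then force a reset. */
void unused_isr(void)
{
    unexpected_interrupts++;
}

void timer_isr(void)
{
    timer_ticks++;
}

/* Every entry is defined; none is left blank or pointing at random data. */
const isr_t vector_table[VECTOR_COUNT] = {
    unused_isr, unused_isr, timer_isr, unused_isr, unused_isr, unused_isr
};

/* Host-side stand-in for the hardware's vector fetch. */
void dispatch(uint8_t vector)
{
    if (vector < VECTOR_COUNT) {
        vector_table[vector]();
    }
}
```

An interrupt from a module that should never fire then lands in `unused_isr`, where it is recorded rather than executing whatever happens to be at the vector address.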
Hardware protection features are included in the MCU to improve system stability and reliability. They help gracefully recover the system in the event of a significant electrical disturbance that is too severe for the hardware defense mechanisms and which may result in a code runaway condition. In general, try to use every protection feature available in the MCU of choice.
Before turning to specific practices, note that the MCU initialization code must ensure that these features are turned on. In some MCUs, enable bits in a configuration register control most of these features. Because these bits are often "write once" bits, the software should write them even if the default states are not changed. This prevents runaway code from unintentionally turning the protection features off.
COP (Computer Operating Properly)
The following is a list of good practices that should be followed to ensure that the COP hardware will recover from a runaway code situation quickly and reliably. The intent of these guidelines is to minimize the likelihood that a random set of conditions could service the COP. They include:
1) Use the shortest COP timeout period possible to ensure that a runaway condition will not last very long. The nature of the application will dictate the actual COP timeout period chosen.
2) Avoid placing COP refreshes in interrupt routines. Interrupts can be serviced even if the CPU is stuck in an unknown loop within the main program.
3) Ideally use one COP refresh operation within the main loop.
4) If the main loop period is greater than the COP timeout, place refreshes at equal intervals of about 80% of the COP timeout period.
5) If the main loop period is much less than the COP timeout, introduce a software counter so that the COP is refreshed only at approximately 80% of the COP timeout period.
6) Any loops servicing the COP should timeout within a finite amount of time. The time will depend on how long the system can tolerate the CPU executing code incorrectly.
7) No loop servicing the COP should have a jump instruction from the bottom of the loop to the top of the loop unless it is based on multiple conditions, not just a single CPU instruction.
8) The decision to service or not service the COP should be based on multiple conditions, not just one. For example, do not base the decision on a single CPU condition code bit or on a single status register bit.
9) Memory should be examined to ensure that it does not contain a string of bytes that, if executed, would feed the COP unintentionally. For example, a data table somewhere in the memory of an HC08 MCU may contain the following string of data embedded within it:
..., $C7, $FF, $FF, ...
If the CPU gets lost and tries to execute an instruction from the location with the $C7 in it, it would perform a write to location $FFFF which would feed the COP in an HC08 MCU.
10) Servicing the COP should be based on a set of conditions extremely unlikely to randomly occur. Do not make decisions to service the COP based on a single bit or byte in RAM or a single status register bit. Check the system state for integrity before servicing the COP.
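Several of these guidelines (a single refresh point in the main loop, a software count, and a decision based on multiple conditions) can be combined as in the sketch below. It is a host-side illustration only: the check-in values, `PASSES_PER_REFRESH`, and `main_loop_pass` are hypothetical, and `cop_feed` merely counts where real code would write to the COP register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check-in values written by each task when it completes. */
#define TASK_A_DONE        0x5Au
#define TASK_B_DONE        0xA5u
/* Assumption: three main-loop passes take about 80% of the COP timeout. */
#define PASSES_PER_REFRESH 3u

static uint8_t task_a_flag;
static uint8_t task_b_flag;
static uint8_t pass_count;

uint8_t cop_feeds;   /* counts simulated COP refreshes */

/* Stand-in for the real refresh, e.g. a write to the COP control register. */
static void cop_feed(void)
{
    cop_feeds++;
}

/* One main-loop pass. a_ok and b_ok simulate whether each task ran to
 * completion. Returns true if the COP was refreshed on this pass. */
bool main_loop_pass(bool a_ok, bool b_ok)
{
    if (a_ok) { task_a_flag = TASK_A_DONE; }
    if (b_ok) { task_b_flag = TASK_B_DONE; }

    /* Decide on several independent byte values, not a single bit. */
    bool healthy = (task_a_flag == TASK_A_DONE) &&
                   (task_b_flag == TASK_B_DONE);
    task_a_flag = 0;
    task_b_flag = 0;

    if (!healthy) {
        return false;            /* do not feed; let the COP time out */
    }
    pass_count++;
    if (pass_count < PASSES_PER_REFRESH) {
        return false;            /* too early to refresh */
    }
    pass_count = 0;
    cop_feed();                  /* the only refresh point in the program */
    return true;
}
```

Because the flags are cleared on every pass, a task that stops running cannot leave a stale "done" value behind, and the COP is allowed to expire.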
Illegal Instruction & Illegal Address Resets
Use these features if they exist on the chosen MCU. They are potentially a way of quickly recovering the system during a runaway code condition. Along with these interrupt or reset events, many MCUs have a Reset Status Register and an Interrupt Status Register that may help determine the source of the reset or interrupt so that the software can take the appropriate action.
An Illegal Address Reset is most effective on MCUs with smaller amounts of memory, because the likelihood of runaway code landing in an unimplemented section of the memory map increases.
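Startup code can use a reset status register to choose a recovery path, as in the sketch below. The bit masks are purely illustrative and do not come from any particular datasheet; the actual register layout must be taken from the reference manual of the chosen MCU. The enum and function names are also hypothetical.

```c
#include <stdint.h>

/* Illustrative bit masks only (assumption): check the device documentation
 * for the real reset status register layout. */
#define RST_POWER_ON      0x80u
#define RST_COP           0x20u
#define RST_ILLEGAL_OP    0x10u
#define RST_ILLEGAL_ADDR  0x08u
#define RST_LOW_VOLTAGE   0x02u

typedef enum {
    START_COLD,              /* power-on or brownout: initialize everything */
    START_RUNAWAY_RECOVERY,  /* COP or illegal op/address: code ran away */
    START_OTHER              /* e.g. external reset pin */
} start_mode_t;

/* Map the reset status value, read once at startup, to a recovery path. */
start_mode_t classify_reset(uint8_t status)
{
    if (status & (RST_POWER_ON | RST_LOW_VOLTAGE)) {
        return START_COLD;
    }
    if (status & (RST_COP | RST_ILLEGAL_OP | RST_ILLEGAL_ADDR)) {
        return START_RUNAWAY_RECOVERY;
    }
    return START_OTHER;
}
```

Distinguishing a runaway recovery from a cold start lets the software, for example, log the event or restore outputs to a safe state before resuming.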
Low Voltage Detect (LVD) Circuits
If the chosen MCU contains an integrated power supply monitoring circuit that resets the MCU in the event of sudden power loss, then this feature can be used to protect the MCU from going into a code runaway condition. However, keep in mind that the LVD circuit may not detect a very fast loss and recovery of the supply voltage, since the response time of these circuits is often intentionally slow.
One last tip:
If possible, architect the software so that MCU resets can be tolerated. Systems with good feedback and system integrity checking are more deterministic and can recover from a spurious system reset much more easily.
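One possible form of integrity checking, offered here as an assumption rather than as the author's method, is to store each vital variable together with its bitwise complement. After a reset, the startup code keeps the stored value only if the pair is still consistent and otherwise falls back to a default. All names are hypothetical, and on real hardware the structure would need to live in RAM that the startup code does not clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* A vital value stored alongside its bitwise complement. */
typedef struct {
    uint16_t value;
    uint16_t inverse;
} guarded_u16_t;

void guarded_write(guarded_u16_t *g, uint16_t value)
{
    g->value = value;
    g->inverse = (uint16_t)~value;
}

/* True only if the value and its stored complement still agree. */
bool guarded_is_valid(const guarded_u16_t *g)
{
    return g->value == (uint16_t)~g->inverse;
}

/* Call after a reset: keep the stored value if it is intact, otherwise
 * reinitialize it to a known default. Returns the value now in use. */
uint16_t restore_after_reset(guarded_u16_t *g, uint16_t default_value)
{
    if (!guarded_is_valid(g)) {
        guarded_write(g, default_value);
    }
    return g->value;
}
```

With this arrangement a spurious reset does not discard good state, while state corrupted by the disturbance is detected and replaced.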
To read Part 1, go to: Defining
To read Part 2, go to: Hardware Techniques - The basic circuit building blocks
To read Part 3, go to: System power and signal entry considerations
To read Part 4, go to: PCB Power Supply and Floor Plan Opportunities
Ross Carlton has specialized in all aspects of electromagnetic compatibility (EMC) since his graduation from Texas A&M University with a Bachelor of Science in Electrical Engineering in 1985. He has been with Freescale Semiconductor for the last eight years, where he has led the EMC design, test and support of Freescale's 8-, 16-, and 32-bit microcontroller products. In addition, Ross represents the U.S. as a Technical Expert to IEC Subcommittee 47A on integrated circuits, where he is the project leader for IEC 61967-2, IEC 61967-3 and IEC 62132-2. He is currently involved in developing transient immunity test methodologies.
The author would like to thank Greg Racino and John Suchyta, 8-Bit Applications Engineer at Freescale Semiconductor, for their input and guidance. Their contributions were critical to ensuring consistent and correct guidance.