Tutorial: Improving the transient immunity of your microcontroller-based embedded design - Part 5 - Embedded.com

In many instances, the way embedded software is structured and how it interacts with the hardware in a system can have a profound effect on the transient immunity performance of that system.

It can be impractical and costly to completely eliminate transients at the hardware level, so the system and software designers should plan for the occasional erroneous signal or power glitch that could cause the software to perform erratically. Erratic actions on the part of the software can be classified into two different categories:

1. The microcontroller (MCU) sees changes in input signals, or on-chip hardware reacts to input signals that are out of the ordinary, and the software does not have the “intelligence” to ignore them or deal with them in a safe and appropriate manner.

2. Software runaway occurs when the disturbance is so significant that the software code execution flow is disrupted and the central processing unit (CPU) begins to execute code out of sequence or from incorrect areas of memory.

We refer to the approach to software design that addresses these problems as “Defensive Software Design.” The following techniques are among the more common and effective ones used in defensive software design. This is not an exhaustive list, but these techniques are widely used in the development of robust embedded software.

Digital Input Pins
Input filtering is particularly important when input pins in the system are vulnerable to noise. Generally, noise glitches in the system last for a duration measured in the tens to low hundreds of nanoseconds. Using a simple software filtering technique such as a majority vote or polling will allow the system to ignore these glitches.

Figure 1. Example of Majority Vote

In the majority vote technique shown in Figure 1 above, the input is read a predetermined number of times and the logic state that is read a majority of the time is considered the proper state.
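The majority vote described above might be sketched in C as follows. This is a minimal illustration and not the content of Figure 1: the `pin_source_t` type and `read_pin()` are hypothetical stand-ins that replay recorded samples so the filter can be tried off-target, and the number of reads is an assumption left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a port read: replays recorded samples so the
   filter can be exercised off-target. On real hardware, read_pin() would
   instead return the pin's bit from the port data register. */
typedef struct {
    const bool *samples;
    size_t      count;
    size_t      next;
} pin_source_t;

static bool read_pin(pin_source_t *src)
{
    assert(src != NULL && src->next < src->count);
    return src->samples[src->next++];
}

/* Read the pin `reads` times and return the state seen most often.
   `reads` must be odd so that a tie is impossible. */
bool majority_vote(pin_source_t *src, uint8_t reads)
{
    assert((reads % 2u) == 1u);
    uint8_t highs = 0;
    for (uint8_t i = 0; i < reads; i++) {
        if (read_pin(src)) {
            highs++;
        }
    }
    return highs > (reads / 2u);
}
```

With five reads, a single glitched sample is outvoted by the four correct ones. The reads should be spaced so that the whole window is longer than the expected glitch duration.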

In the polling technique, when a pin state change is detected, the pin is sampled several more times over a predetermined time interval to make sure the pin remains in that state. This ensures that the state change is of sufficient duration to be considered a valid change and not a glitch. This is a particularly useful technique on interrupt inputs such as an interrupt request (IRQ) pin or keyboard interrupt (KBI) pins. It is often used when de-bouncing mechanical switch inputs.
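One possible shape for the polling technique is a small state machine that is fed one raw sample per polling interval and accepts a new state only after several consecutive samples agree. The sketch below assumes the caller supplies the samples at a fixed rate; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool    stable;     /* last accepted (filtered) pin state */
    uint8_t run;        /* consecutive samples that disagree with `stable` */
    uint8_t required;   /* samples needed before a change is accepted */
} debounce_t;

void debounce_init(debounce_t *d, bool initial, uint8_t required)
{
    assert(required > 0u);
    d->stable = initial;
    d->run = 0;
    d->required = required;
}

/* Feed one raw sample per polling interval; returns the filtered state. */
bool debounce_sample(debounce_t *d, bool raw)
{
    if (raw == d->stable) {
        d->run = 0;                 /* no change, or the glitch has ended */
    } else {
        d->run++;
        if (d->run >= d->required) {
            d->stable = raw;        /* change persisted long enough */
            d->run = 0;
        }
    }
    return d->stable;
}
```

The product of `required` and the polling interval sets the minimum duration a change must have to be accepted, so both would be chosen from the expected glitch or switch-bounce time.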

Digital Outputs & Crucial Registers
Within the main system software loop, frequently update outputs and other critical registers that control output pins. These include:

* Data direction registers
* I/O modules that can be modified by software
* Random access memory (RAM) registers that are used for vital pieces of the application

This ensures that any minor malfunction will be corrected without a major upset. The refresh of these registers should be as regular as possible. The reliability of outputs and RAM registers should not be affected by constant writing and updating. Take care that functions such as serial communications and timers are in an inactive state when they are reinitialized, because some status bits are affected by a write to the corresponding control register.
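A minimal sketch of such a refresh, assuming the application keeps the intended register values in a RAM image, could look like this. The `sim_port_data` and `sim_port_ddr` variables are hypothetical stand-ins for memory-mapped registers so that the example builds off-target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for memory-mapped registers so the sketch builds off-target.
   The names are hypothetical; a real project would use the definitions
   from the device header. */
volatile uint8_t sim_port_data;   /* output latch */
volatile uint8_t sim_port_ddr;    /* data direction: 1 = output */

/* Values the application intends the registers to hold. */
typedef struct {
    uint8_t port_data;
    uint8_t port_ddr;
} io_image_t;

/* Call once per pass of the main loop. */
void refresh_outputs(const io_image_t *img)
{
    assert(img != NULL);
    /* Write the latch first so a pin never drives a stale level. */
    sim_port_data = img->port_data;
    sim_port_ddr  = img->port_ddr;
}
```

If a transient flips a bit in the data direction register, the next pass of the main loop restores it from the image.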

Boundary Checking
Boundary checking refers to a method of validating the input signals to the MCU: a type of sanity check on input signals to make sure they are not obviously in error. This technique must be used in cases where input signals (either digital or analog) cannot be “filtered” as in the majority voting scheme discussed above.

For example, suppose an electrical glitch reaches the input capture pin of a timer block. The timer count value that is captured can be examined to see if it is relatively close in value to what is expected. As a further example, suppose that the timer input capture is being used to measure the duration of an input pulse. If the input signal can have a period in the range of, say, 1-10 ms, then a measured time period of 5 µs should be considered “bad data” and dealt with appropriately.
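The pulse-width example above can be sketched as a range check on two captured timer values. The sketch assumes a free-running 16-bit timer that counts once per microsecond and a pulse shorter than one full counter period; the function name and limits are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Expected pulse range from the example in the text. */
#define PULSE_MIN_US  1000u   /* 1 ms  */
#define PULSE_MAX_US 10000u   /* 10 ms */

/* Returns true and stores the width when the measurement is plausible.
   Assumes 1 timer tick = 1 microsecond (hypothetical timer setup). */
bool pulse_width_valid(uint16_t rise_ticks, uint16_t fall_ticks,
                       uint16_t *width_us)
{
    assert(width_us != NULL);
    /* Modulo-2^16 subtraction stays correct across one counter wrap. */
    uint16_t width = (uint16_t)(fall_ticks - rise_ticks);
    if (width < PULSE_MIN_US || width > PULSE_MAX_US) {
        return false;               /* bad data: caller discards it */
    }
    *width_us = width;
    return true;
}
```

What "dealt with appropriately" means is left to the caller: for instance, keeping the previous good value and counting rejected measurements.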

Oscillator and Other Sensitive Analog Pins
The most vulnerable pins on an MCU are usually the high impedance analog pins such as those used in oscillator circuits, a phase locked loop (PLL), and analog signal inputs. In general, software cannot correct for a pin that is poorly protected by the hardware in this case. Special care in board layout and design must be the focus with these types of pins.

However, filtering techniques similar to those discussed above for digital pins can be applied to some analog signal input pins such as those that feed an analog to digital converter (ADC). In this case, the converted values can be analyzed to determine whether they are within expected boundaries, and simple averaging of all valid conversions can diminish most noise effects.
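As a rough illustration of combining the boundary check with averaging, the following sketch averages only those conversions that fall inside an expected window. The function name, the window limits, and the sample buffer are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Average only the conversions inside [lo, hi]. Returns false when no
   sample is valid, so the caller can fall back to a safe value.
   Intended for small sample windows; the 32-bit sum is not guarded
   against overflow for very large counts. */
bool average_valid(const uint16_t *samples, size_t count,
                   uint16_t lo, uint16_t hi, uint16_t *average)
{
    assert(samples != NULL && average != NULL && lo <= hi);
    uint32_t sum = 0;
    size_t used = 0;
    for (size_t i = 0; i < count; i++) {
        if (samples[i] >= lo && samples[i] <= hi) {
            sum += samples[i];
            used++;
        }
    }
    if (used == 0) {
        return false;
    }
    *average = (uint16_t)(sum / used);
    return true;
}
```

Rejecting outliers before averaging keeps a single corrupted conversion from pulling the result away from the true value.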

Structured code techniques
Structured code techniques should be used at all times. When passing control from the main loop to a subroutine (or procedure), always pass control with token bytes in RAM that the subroutine can check. Once the return from the subroutine occurs, the token bytes should be cleared or changed to the next value. This prevents the code in these routines from being called and executed accidentally by runaway code. A simplified example is shown in Figures 2 and 3, below.

Figure 2. Token Passing, Main Loop

Figure 3. Token Passing, Subroutine
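A minimal C sketch of token passing, which is not a reproduction of Figures 2 and 3, might look like the following. The token values, the `update_outputs()` routine, and the `g_updates` counter standing in for real work are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOKEN_IDLE        0x00u
#define TOKEN_UPDATE_OUT  0xA5u   /* arbitrary value, hypothetical */

static_assert(TOKEN_IDLE != TOKEN_UPDATE_OUT, "tokens must differ");

static volatile uint8_t g_token = TOKEN_IDLE;
static uint32_t g_updates;        /* stands in for the routine's real work */

/* Subroutine: refuses to run unless the caller left the right token. */
bool update_outputs(void)
{
    if (g_token != TOKEN_UPDATE_OUT) {
        return false;             /* entered without a token: likely runaway */
    }
    g_updates++;
    return true;
}

/* One pass of the main loop. */
void main_loop_step(void)
{
    g_token = TOKEN_UPDATE_OUT;   /* grant permission just before the call */
    (void)update_outputs();
    g_token = TOKEN_IDLE;         /* revoke it as soon as control returns */
}
```

A routine that finds the wrong token would typically do more than return: it might log the event or force a reset, depending on what the application can tolerate.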

Filling Unused Memory
If the CPU ever runs away, a convenient way to help get it back on track is to fill unused memory with a single-byte instruction. Use either an SWI (software interrupt) instruction, or NOP (no operation) instructions followed by an occasional JSR Start instruction, as shown in Figure 4 and Figure 5 below, respectively.

Figure 4. Memory Fill with SWI

Figure 5. Memory Fill with NOP

This technique is best implemented so that the SWI interrupt routine can determine (through token passing or other means) whether the interrupt was called intentionally and take the appropriate action. Most linkers or programmers have options that can be used to fill unused memory blocks with the same data.
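Where the tool chain offers no fill option, a fill image could be generated on the host instead. The sketch below builds the NOP-plus-jump pattern in a buffer; it is a hypothetical helper, and the opcode bytes are passed in as parameters because they must be taken from the target's instruction set reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill `mem` with the single-byte `nop` opcode and end every full block
   of `interval` bytes with the `jump` sequence (for example, the encoded
   "JSR Start"). A tail shorter than `interval` holds only NOPs. */
void fill_unused(uint8_t *mem, size_t len, uint8_t nop,
                 const uint8_t *jump, size_t jump_len, size_t interval)
{
    assert(mem != NULL && jump != NULL);
    assert(jump_len > 0 && interval > jump_len);
    memset(mem, nop, len);
    for (size_t end = interval; end <= len; end += interval) {
        memcpy(&mem[end - jump_len], jump, jump_len);
    }
}
```

A runaway CPU that lands on any NOP in a block slides forward until it reaches the jump at the end of that block. Choosing `len` as a multiple of `interval` avoids leaving a NOP-only tail that runs into whatever follows the unused region.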

Unused Interrupt vectors
Define all interrupt vectors, even those that are not used. Vectors from unused MCU functions should be pointed to a safe routine that could indicate an error condition if executed.
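In C, one way to express this is a vector table in which every unused entry points at a common handler. The vector names, their count, and both handlers below are hypothetical; how the table is placed at the vector addresses is tool-chain specific and not shown.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*isr_t)(void);

/* Hypothetical vector numbering; a real device defines its own. */
enum { VEC_TIMER, VEC_IRQ, VEC_SCI, VEC_ADC, VEC_COUNT };

volatile uint8_t unexpected_irq_count;  /* error indication for the main loop */
volatile uint8_t timer_ticks;

/* Safe routine for vectors the application does not use. */
void unexpected_isr(void)
{
    if (unexpected_irq_count < UINT8_MAX) {
        unexpected_irq_count++;
    }
}

void timer_isr(void)
{
    timer_ticks++;
}

const isr_t vector_table[] = {
    [VEC_TIMER] = timer_isr,
    [VEC_IRQ]   = unexpected_isr,
    [VEC_SCI]   = unexpected_isr,
    [VEC_ADC]   = unexpected_isr,
};

/* Catches a table that stops short of the last vector. A gap in the
   middle would still be a null entry and needs review to catch. */
static_assert(sizeof vector_table / sizeof vector_table[0] == VEC_COUNT,
              "every vector needs an entry");
```

The main loop can then check `unexpected_irq_count` and treat a nonzero value as evidence of a disturbance.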

Hardware protection features are included in the MCU to improve system stability and reliability. They help the system recover gracefully in the event of a significant electrical disturbance that is too severe for the hardware defense mechanisms and which may result in a code runaway condition. In general, try to use every protection feature available in the MCU of choice.

Before listing good design practices, it is worth stressing that the MCU initialization code must ensure that these features are turned on. In some MCUs, enable bits in a configuration register control most of these features. Because these bits are often “write once” bits, it is necessary for the software to write these control bits even if the default states are not changed. This prevents a runaway code situation from unintentionally turning the protection features off.

COP (Computer Operating Properly) or Watchdog
The following is a list of good practices that should be followed to ensure that the COP hardware will recover a runaway code situation quickly and reliably. The scope and spirit of these guidelines is to minimize the likelihood that a random set of conditions could service the COP. They include:

1) Use the shortest COP timeout period possible to ensure that a runaway condition will not last very long. The nature of the application will dictate the actual COP timeout period chosen.

2) Avoid placing COP refreshes in interrupt routines. Interrupts can be serviced even if the CPU is stuck in an unknown loop within the main program.

3) Ideally, use one COP refresh operation within the main loop.

4) If the main loop period is greater than the COP timeout, place refreshes at equal intervals of 80% of the COP timeout period.

5) If the main loop period is much less than the COP timeout, introduce a software count so that the COP is refreshed only at approximately 80% of the COP timeout period.

6) Any loops servicing the COP should time out within a finite amount of time. The time will depend on how long the system can tolerate the CPU executing code incorrectly.

7) No loop servicing the COP should have a jump instruction from the bottom of the loop to the top of the loop unless it is based on multiple conditions, not just a single CPU instruction.

8) The decision to service or not service the COP should be based on multiple conditions, not just one. For example, do not base the decision on a single CPU condition code bit or on a single status register bit.

9) Memory should be examined to ensure that it does not contain a string of bytes that, if executed, would feed the COP unintentionally. For example, a data table inside an HC08 MCU device may reside somewhere in memory and contain the following string of data embedded within it:

..., $C7, $FF, $FF, ...

If the CPU gets lost and tries to execute an instruction from the location containing the $C7, it would perform a write to location $FFFF, which would feed the COP in an HC08 MCU.

10) Servicing the COP should be based on a set of conditions extremely unlikely to occur randomly. Do not make decisions to service the COP based on a single bit or byte in RAM or a single status register bit. Check the system state for integrity before servicing the COP.
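Several of these guidelines (a single refresh point in the main loop, a software count, and a decision based on multiple conditions) can be combined in one service routine. The sketch below is one possible arrangement under stated assumptions: the check-in values, the pass count of 8, and the `refreshes` counter that stands in for the actual COP register write are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TASK_A_DONE         0x5Au  /* check-in values, hypothetical */
#define TASK_B_DONE         0xA5u
#define PASSES_PER_REFRESH  8u     /* assumed to span about 80% of the timeout */

typedef struct {
    uint8_t  task_a;        /* set to TASK_A_DONE by task A on each pass */
    uint8_t  task_b;        /* set to TASK_B_DONE by task B on each pass */
    uint8_t  good_passes;   /* software count of healthy passes */
    uint32_t refreshes;     /* stand-in for the COP hardware */
} cop_ctx_t;

/* Stand-in for the device-specific COP refresh write. */
static void cop_refresh(cop_ctx_t *c)
{
    c->refreshes++;
}

/* Call exactly once per pass, at the end of the main loop. */
void cop_service(cop_ctx_t *c)
{
    assert(c != NULL);
    bool healthy = (c->task_a == TASK_A_DONE) && (c->task_b == TASK_B_DONE);
    c->task_a = 0;          /* tasks must check in again on the next pass */
    c->task_b = 0;
    if (!healthy) {
        c->good_passes = 0; /* withhold the refresh and let the COP expire */
        return;
    }
    c->good_passes++;
    if (c->good_passes >= PASSES_PER_REFRESH) {
        c->good_passes = 0;
        cop_refresh(c);
    }
}
```

Because the refresh depends on two distinct byte values and a count, a single corrupted bit or a stray jump into the routine is unlikely to feed the COP by accident.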

Illegal Instruction & Illegal Address Resets
Use these features if they exist on the chosen MCU. They are potentially a way of quickly recovering the system during a runaway code condition. Along with these interrupt or reset events, many MCUs have a Reset Status Register and an Interrupt Status Register that may be helpful in determining the source of the reset or interrupt so that the software can take the appropriate action.

An Illegal Address Reset is most effective on MCUs with smaller amounts of memory, as the likelihood of the runaway code landing in an unimplemented section of the memory map increases.
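Startup code might use a reset status register along the following lines. The bit masks here are purely hypothetical placeholders; the real flag positions, and whether reading the register clears it, must come from the data sheet of the chosen MCU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reset status flags; take the real ones from the data sheet. */
#define RST_POWER_ON      0x80u
#define RST_COP           0x20u
#define RST_ILLEGAL_OP    0x10u
#define RST_ILLEGAL_ADDR  0x08u
#define RST_LOW_VOLTAGE   0x02u

static_assert((RST_POWER_ON | RST_COP | RST_ILLEGAL_OP |
               RST_ILLEGAL_ADDR | RST_LOW_VOLTAGE) ==
              (RST_POWER_ON + RST_COP + RST_ILLEGAL_OP +
               RST_ILLEGAL_ADDR + RST_LOW_VOLTAGE),
              "reset flags must not overlap");

/* True when the last reset points to runaway code rather than a normal
   power-up. A power-on flag takes priority in this sketch, on the
   assumption that other flags are not meaningful after a cold start. */
bool reset_suggests_runaway(uint8_t status)
{
    if ((status & RST_POWER_ON) != 0u) {
        return false;
    }
    return (status & (RST_COP | RST_ILLEGAL_OP | RST_ILLEGAL_ADDR)) != 0u;
}
```

The application could, for example, count such resets and enter a safe state if they recur too often.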

Low Voltage Detect (LVD) Circuits
If the chosen MCU contains an integrated power supply monitoring circuit that will reset the MCU in the event of sudden power loss, this feature can be used to protect the MCU from going into a code runaway condition. However, keep in mind that the LVD circuit may not detect a very fast loss and recovery of the supply voltage, since the response time of these circuits is often intentionally slow.

One last tip:
If possible, architect software so that MCU resets can be tolerated. Systems with good feedback and system integrity checking are more deterministic and can recover from a spurious system reset much more easily.
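One way to make resets tolerable is to keep the essential operating state in a RAM block protected by a marker and a checksum, and to validate it at startup. The following is a sketch under assumptions: the state fields, the marker value, and the simple additive checksum are hypothetical, and keeping the block in RAM that the startup code does not clear is tool-chain specific and not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STATE_MAGIC 0xC3A5u       /* arbitrary marker, hypothetical */

typedef struct {
    uint16_t magic;
    uint16_t setpoint;            /* example application state */
    uint8_t  mode;
    uint8_t  check;               /* inverted byte sum of the fields above */
} app_state_t;

static uint8_t state_checksum(const app_state_t *s)
{
    uint8_t sum = (uint8_t)(s->magic >> 8);
    sum = (uint8_t)(sum + (uint8_t)s->magic);
    sum = (uint8_t)(sum + (uint8_t)(s->setpoint >> 8));
    sum = (uint8_t)(sum + (uint8_t)s->setpoint);
    sum = (uint8_t)(sum + s->mode);
    return (uint8_t)~sum;
}

/* Call after every intentional change to the state. */
void state_seal(app_state_t *s)
{
    s->magic = STATE_MAGIC;
    s->check = state_checksum(s);
}

/* Call at startup. Returns true when the state survived the reset intact;
   otherwise loads defaults and returns false. */
bool state_restore(app_state_t *s)
{
    assert(s != NULL);
    if (s->magic == STATE_MAGIC && s->check == state_checksum(s)) {
        return true;
    }
    s->setpoint = 0;
    s->mode = 0;
    state_seal(s);
    return false;
}
```

After a spurious reset the application resumes from the preserved state, and after a power-up or corruption it falls back to known defaults.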

To read Part 1, go to: Defining the Problem
To read Part 2, go to: Hardware Techniques – The basic circuit building blocks
To read Part 3, go to: System power and signal entry considerations
To read Part 4, go to: PCB Power Supply and Floor Plan Opportunities

Ross Carlton has specialized in all aspects of electromagnetic compatibility (EMC) since his graduation from Texas A&M University with a Bachelor of Science in Electrical Engineering in 1985. He has been with Freescale Semiconductor for the last eight years, where he has led the EMC design, test and support of Freescale's 8, 16, and 32-bit microcontroller products. In addition, Ross represents the U.S. as a Technical Expert to IEC Subcommittee 47A on integrated circuits, where he is the project leader for IEC 61967-2, IEC 61967-3 and IEC 62132-2. He is currently involved in developing transient immunity test methodologies for standardization.

The author would like to thank Greg Racino and John Suchyta, 8-Bit Applications Engineer at Freescale Semiconductor, for their inputs and guidance. Their contributions were critical to ensuring consistent and correct guidance.
