Software techniques for comprehensive EMC testing of embedded systems - Embedded.com

Software techniques for comprehensive EMC testing of embedded systems

Almost every product designed is required to pass a suite of tests known as Electromagnetic Compatibility (EMC) tests. The suite of EMC tests usually includes some form of each of the following: electro-static discharge (ESD), radiated emissions, conducted emissions, radiated susceptibility, and conducted susceptibility.

There are many versions of each of these types of tests. The version that applies to your product depends on the product's application and the agency approvals (UL, CSA, EC, etc.) that are required. A common misconception is that EMC issues are the domain of hardware engineers, when in reality an embedded software engineer can and must be involved!

This article will cover many of the basics of how a software engineer can help with product EMC performance and testing, but it is by no means an exhaustive list. Many times a little bit of thinking outside of the box can provide the solution to a pesky EMC problem. So don't be afraid to try things!

Typically, there are at least four ways the embedded software engineer can aid in the process (in order of value added):

* improve performance against susceptibility tests,
* provide debug assistance for susceptibility tests,
* provide automated execution of device functionality, and
* minimize emissions.

Improving Performance against Susceptibility to EMI
First, let's address how an embedded software engineer would be involved in dealing with the device's susceptibility to Electromagnetic Interference (EMI). The hardware engineer's role is to prevent as much EMI as possible from ever reaching the product's sensitive electronics.

However, sometimes it's impractical for this to be accomplished well enough to pass all tests. Using software techniques to resolve issues with susceptibility to EMI can be quite beneficial because software can almost always remain fluid much later in a product development cycle than hardware can. When a troublesome susceptibility issue is discovered, another round of hardware changes would quite often cause significant schedule pain.

On the other hand, software is much more likely to be able to absorb the time required to implement solutions. The embedded software engineer provides the second line of defense: minimizing the disruption to the product caused by EMI that has gotten past the hardware engineer's first line of defense.

Sometimes a passive filtering solution to an EMI susceptibility issue is technically not a challenge to implement, but might be problematic when a small board size is desired. Adding capacitors and ferrite beads can consume valuable board real estate! If a solution can be implemented in software, there is no negative impact on the board size, making it the preferred solution.

Mechanisms are put in place that increase how well the system can tolerate the residual EMI. The residual EMI can adversely affect the system in many ways. It could corrupt:

* microcontroller registers,
* memories,
* communications channels,
* digital I/O, and/or
* analog I/O.

Corruption of microcontroller registers can manifest itself in many different ways. If the program counter is corrupted, the program can execute code out of sequence, or execute from an unprogrammed location in the program memory.

Using watchdog timers
Probably the most time-tested method of verifying flow control of a program is through the use of a watchdog timer. A watchdog is a free-running timer that, when it expires, resets the microcontroller and/or system. Expiration is avoided by resetting the timer's count value through some simple maintenance method.

Usually this maintenance is as simple as providing a pulse on an input to an external watchdog chip or providing a specific write sequence to an on-chip watchdog register. The maintenance of the watchdog is added at key points in the program.

If the execution of the program goes haywire, the watchdog will not be properly maintained and the system will be reset. Watchdog maintenance must be properly designed, however. If the maintenance of the watchdog is placed only in a timer ISR, then the maintenance may occur properly even though the foreground execution is lost.

A more acceptable design technique would be to set the output high in the foreground routine, and low in a timer interrupt service routine (ISR). In this case both elements are required for proper maintenance of the watchdog. The benefit is that some noise event might disrupt execution of the device, but instead of ending up in an unknown state or possibly even trapped in an endless loop, the device resets and resumes functioning.
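The split between foreground and ISR can be sketched roughly as follows. This is a host-runnable illustration, not code for any particular part: `wdt_pin_write`, `wdt_pin`, and `wdt_pulse_count` are hypothetical stand-ins for the GPIO line that feeds an external watchdog chip, and the sketch assumes the chip is serviced by a rising edge.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the GPIO line wired to an external watchdog
 * chip; on real hardware this would be a port register write. */
static bool wdt_pin = false;
static uint32_t wdt_pulse_count = 0;   /* rising edges seen by the watchdog */

static void wdt_pin_write(bool level)
{
    if (level && !wdt_pin) {
        wdt_pulse_count++;             /* assumed: a low-to-high edge services the watchdog */
    }
    wdt_pin = level;
}

/* Called once per pass of the foreground (main) loop. */
void foreground_service_watchdog(void)
{
    wdt_pin_write(true);
}

/* Called from the periodic timer ISR. */
void timer_isr_service_watchdog(void)
{
    wdt_pin_write(false);
}
```

If the foreground loop stops running, the ISR can only hold the line low, so no further edges are produced and the watchdog expires. The same is true if the timer interrupt stops firing.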

Another method of program flow verification is to use sequence checks interspersed throughout the program. A sequence variable is used to verify the program sequence. At select points, this variable is compared against an expected value. If the value is correct, it is incremented (or somehow set to its next expected value). If the value is not correct, the controller can go to an error state, or invoke a software reset of the controller.

It should be noted that this method adds overhead to the program, and is not easy to maintain, especially when changes to the program flow are required. This method is usually only advisable in safety-critical systems where execution out of sequence can have hazardous effects.
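A minimal sketch of such a sequence check might look like this. The names `seq_checkpoint`, `seq_restart`, and `seq_error` are made up for illustration, and the error hook only latches a flag where a real system would enter a safe state or force a reset.

```c
#include <stdbool.h>
#include <stdint.h>

static uint8_t seq_state = 0;          /* next checkpoint we expect to reach */
static bool seq_fault = false;         /* latched when a checkpoint arrives out of order */

/* Hypothetical error hook; a real system might enter a safe state or reset. */
static void seq_error(void)
{
    seq_fault = true;
}

/* Call at each checkpoint with the value this point in the code expects. */
void seq_checkpoint(uint8_t expected)
{
    if (seq_state == expected) {
        seq_state++;                   /* advance to the next expected value */
    } else {
        seq_error();
    }
}

/* Call at the top of each main-loop pass to restart the sequence. */
void seq_restart(void)
{
    seq_state = 0;
}
```

The maintenance cost mentioned above shows up here directly: every checkpoint carries a hard-coded number that must be renumbered whenever the program flow changes.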

Software filtering
Input errors can be controlled by software filtering (debouncing) of inputs. There are many tried and true debouncing methods. The key is to choose one that best balances the required responsiveness to the input versus the need to tolerate glitches on the input.

For instance, if the system requirements state that you must recognize a button press within 50ms, a good filtering method might be to sample the input in a 10ms ISR and require the input to be seen at the opposite state for at least 40ms (4 consecutive samples) for the debounced state to logically change.
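The 10ms, four-sample scheme described above could be written along these lines. The type `debounce_t` and the function `debounce_update` are illustrative names, and the sketch assumes it is called once per 10ms tick with the raw pin level.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEBOUNCE_SAMPLES 4u            /* 4 consecutive samples at 10 ms each */

typedef struct {
    bool state;                        /* debounced (logical) state */
    uint8_t count;                     /* consecutive samples that disagree with state */
} debounce_t;

/* Call from the 10 ms timer ISR with the raw pin level; returns the debounced state. */
bool debounce_update(debounce_t *d, bool raw)
{
    if (raw == d->state) {
        d->count = 0;                  /* input agrees again, so any glitch is discarded */
    } else if (++d->count >= DEBOUNCE_SAMPLES) {
        d->state = raw;                /* stable long enough, accept the change */
        d->count = 0;
    }
    return d->state;
}
```

A single sample back at the old level resets the count, so a noise glitch shorter than four samples never changes the debounced state.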

Also, some DSPs, such as TI's TMS320F28XX family, allow you to configure filtering of inputs. The GPIO can have a sampling rate and a qualification count configured. The two together form the sampling window.

For the input qualifier to detect a change in the input, the level of the signal must be stable for the duration of the sampling window width or longer. This is filtering that consumes no instruction cycles after it has been configured, so be sure to use it when you can!

Software can also be used to digitally filter an analog input signal. This is most efficiently accomplished on a digital signal processor, but it can also be implemented on a standard microcontroller. A FIR (finite impulse response) or IIR (infinite impulse response) filter can be implemented to clean up an input signal that is dirtied by EMI.

Digital filters can provide remarkable performance compared to their analog counterparts. For implementation on a standard microcontroller, there are many 'C' source implementations of these filters freely available on the internet. For implementation on a DSP, almost all DSP vendors provide optimized assembly implementations of these filters that make use of the hardware features of the DSP.
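To give a feel for the structure, here is a bare-bones direct-form FIR filter in C. It is only a sketch: the names `fir_t` and `fir_step` are invented for this example, it uses floating point for clarity, and the tap count is kept at four. A real filter would use coefficients produced by a filter design tool for the desired pass band.

```c
#include <stddef.h>

#define FIR_TAPS 4u                    /* kept tiny for illustration */

typedef struct {
    float coeff[FIR_TAPS];             /* filter coefficients h[0..N-1] */
    float history[FIR_TAPS];           /* circular buffer of recent inputs */
    size_t head;                       /* index of the newest sample */
} fir_t;

/* Push one input sample and return y[n] = sum over k of h[k] * x[n-k]. */
float fir_step(fir_t *f, float x)
{
    f->head = (f->head + 1u) % FIR_TAPS;
    f->history[f->head] = x;

    float acc = 0.0f;
    size_t idx = f->head;
    for (size_t k = 0; k < FIR_TAPS; k++) {
        acc += f->coeff[k] * f->history[idx];
        idx = (idx == 0u) ? FIR_TAPS - 1u : idx - 1u;   /* step back one sample */
    }
    return acc;
}
```

With all four coefficients set to 0.25 the filter is a simple moving average, which is a handy sanity check before loading real coefficients.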

Figure 1. ECG with 60Hz Power Line Interference

For example, consider the case of a medical device that monitors ECG through an A/D converter to detect the R-Wave in a QRS complex. The QRS complex energy is typically in the band from 10Hz to 40Hz. An FIR bandpass filter in this band can be extremely effective at eliminating 50/60Hz power line noise as well as other higher frequency noise sources.

The attenuation below 10Hz is more to eliminate uninteresting ECG components, but also eliminates baseline wander and low frequency noise sources. Figure 1 above shows a representative ECG signal with power line interference. Figure 2 below shows the same signal after being filtered by a 100 tap FIR filter with a 10 – 40Hz pass band.

If a transient EMI event corrupts the value of an output port, it is possible that the effect may be minimized by a periodic refresh of the output port values. Of course, in many cases having the output corrupted for any amount of time is unacceptable, but in the cases where it is manageable (an LED output for example), this could be an acceptable solution.

Figure 2. ECG through 100 Tap FIR Filter

This solution can also be applied to configuration registers of on-chip peripherals. A periodic refresh of the configuration of those peripherals might make the temporary corruption of one of those registers tolerable. However, sometimes writing to a peripheral configuration register can cause a reset or other disruption in the operation of the peripheral, so use care when employing this method.
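One way to structure such a refresh is to keep a shadow copy of the intended value and rewrite the hardware from it on a slow tick. In the sketch below, `PORT_OUT` is a hypothetical stand-in for a memory-mapped output port, and `port_set` and `port_refresh` are illustrative names.

```c
#include <stdint.h>

/* Hypothetical stand-in for a memory-mapped output port; on real hardware
 * this would be a volatile pointer to the port register. */
static volatile uint8_t PORT_OUT;

static uint8_t port_shadow;            /* the value the application intends */

/* Application code changes outputs only through this function. */
void port_set(uint8_t value)
{
    port_shadow = value;
    PORT_OUT = value;
}

/* Call periodically (for example from a slow timer tick) to overwrite any corruption. */
void port_refresh(void)
{
    PORT_OUT = port_shadow;
}
```

The shadow copy lives in RAM and can itself be corrupted, so this only helps with upsets that hit the port register and not the variable.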

In many cases, the designer has a choice between using a level-triggered or an edge-triggered interrupt. If it is feasible, the level-triggered interrupt should be chosen. Whereas an edge-triggered interrupt typically does not impose any minimum requirement on the interrupt event pulse width, a level-triggered interrupt will.

Typically, an interrupt controller samples the interrupt inputs at some defined frequency (once per instruction cycle, for instance). A level-triggered interrupt will require the event to be present two samples in a row before an interrupt is generated. The minimum width is the sample period. An edge-triggered interrupt will look for two consecutive samples that indicate the intended transition.

Since the transition can occur almost immediately before the next sample, there really is no minimum width imposed on the interrupt event. The net effect is that noise is more likely to trigger an edge-triggered interrupt than a level-triggered interrupt. Microcontrollers sample interrupts using different methods. Understanding how your microcontroller samples interrupts is the key to determining which type of interrupt is less sensitive to noise than the other.

EMI in the serial channels
Another area that typically gets affected by EMI is serial communications channels. Even if a noise tolerant physical layer such as RS485 or LVDS (Low-voltage differential signaling) is used in a communications link, data can be corrupted by noise.

Software can detect these errors and provide reasonable response. Simple errors contained in a single byte may be detected through a framing or parity error. Typically a UART provides this built-in detection.

If such an error is detected, the receiving device should require a packet retransmission. Depending on the protocol, this may be accomplished by not acknowledging the packet, or sending a special error acknowledgement back. A protocol can be designed in which data includes error correcting codes (ECC). This approach provides detection and correction of a limited number of bit errors.

The disadvantage is the overhead of the additional error correcting bits, and the inability to flawlessly deal with multiple bit errors. A more robust (and highly recommended) method of detecting errors in a communications packet transmission is to include a Cyclic Redundancy Check (CRC) as part of the packet.

A two byte CRC provides 100% coverage of bit failures occurring within the same byte and 99.998% coverage of all other bit failures. The CRC can be used to detect errors, but does not provide any means for error correction.

A mismatch of the CRC to the value calculated based on the received data should result in the receiving device again requiring a retransmission of the packet (Figure 3, below). As long as the EMI corrupts only a small percentage of the packets, and the system was designed with sufficient bandwidth to begin with, the overhead of the retransmissions for failed packets will most likely not lead to unacceptable communications throughput.

Figure 3. Communications with packet CRCs
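As an illustration, a two byte CRC check on a received packet could be coded as below. The article does not say which CRC to use, so this sketch picks the common CRC-16/CCITT-FALSE variant (polynomial 0x1021, initial value 0xFFFF). The packet layout, with the CRC appended high byte first, and the name `packet_is_valid` are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF,
 * no bit reflection, no final XOR. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/* Assumed layout: payload followed by the CRC, high byte first.
 * Returns nonzero when the received CRC matches the payload. */
int packet_is_valid(const uint8_t *packet, size_t len)
{
    if (len < 2u) {
        return 0;
    }
    uint16_t received = (uint16_t)(((uint16_t)packet[len - 2u] << 8) | packet[len - 1u]);
    return crc16_ccitt(packet, len - 2u) == received;
}
```

For the nine ASCII bytes "123456789" this variant yields 0x29B1, which is a convenient check value when bringing up both ends of a link. When `packet_is_valid` returns zero, the receiver would withhold its acknowledgement or send an error acknowledgement so that the sender retransmits. A table-driven version trades flash for speed if the bitwise loop is too slow.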

A similar method is to include a checksum of the packet as part of the packet transmission. The checksum is easier (i.e. faster) to compute, but provides significantly less coverage of bit failures in the packet transmission.

For example, toggling a bit in one position in one byte, and toggling the same bit position of the opposite value in another byte would lead to the same checksum even though 2 bytes have been corrupted. There is much information available on these and many other communications error detection and/or correction schemes. It is highly advisable to implement the one that best matches up to the requirements of your device, and the environment that your device will be used in.
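The weakness is easy to demonstrate with a plain 8-bit additive checksum, which is assumed here as the checksum type (the function name `checksum8` is illustrative).

```c
#include <stddef.h>
#include <stdint.h>

/* Simple 8-bit additive checksum: the sum of all bytes, modulo 256. */
uint8_t checksum8(const uint8_t *data, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + data[i]);
    }
    return sum;
}
```

Take the two byte payload 0x12, 0x24. If noise clears bit 4 of the first byte and sets bit 4 of the second, the payload becomes 0x02, 0x34. Both versions sum to 0x36, so the corruption passes the check unnoticed.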

Volatile memory corruption
EMI can cause volatile data memory to become corrupted. These errors are difficult to detect, but a few methods can be employed in some cases. When only a specific range is valid for a data element, then a plausibility check of the data should occur before it is used.

Along these same lines, when a switch statement is used on a variable, a 'default' case should always be included. This provides a minimal amount of error detection, but more importantly, it prevents the program from executing code based on a data value that was not accounted for.
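Both ideas fit in a few lines of C. The limits, the mode names, and the choice of falling back to the minimum setpoint are all hypothetical and would come from the actual product requirements.

```c
#include <stdint.h>

typedef enum { MODE_IDLE = 0, MODE_RUN = 1, MODE_STOP = 2 } op_mode_t;   /* hypothetical modes */

#define SETPOINT_MIN 10                /* hypothetical valid range */
#define SETPOINT_MAX 200

/* Plausibility check: returns the setpoint if in range, otherwise a safe fallback. */
int16_t checked_setpoint(int16_t setpoint)
{
    if (setpoint < SETPOINT_MIN || setpoint > SETPOINT_MAX) {
        return SETPOINT_MIN;
    }
    return setpoint;
}

/* Returns 0 when the mode was handled, -1 when the value is not a known mode. */
int dispatch_mode(uint8_t mode)
{
    switch (mode) {
    case MODE_IDLE: /* idle handling would go here */ return 0;
    case MODE_RUN:  /* run handling would go here */  return 0;
    case MODE_STOP: /* stop handling would go here */ return 0;
    default:        return -1;         /* corrupted or unexpected value */
    }
}
```

A caller that sees -1 from `dispatch_mode` can log the event and restore a known good mode instead of silently doing nothing.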

If the data in question changes and is accessed infrequently, then the data can be verified through the use of a CRC or checksum of the block of data. When using a checksum, a new checksum value can be generated more quickly if the old data value is subtracted out first, and then the new data value is added in.
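The subtract-then-add update can be sketched as follows, assuming an 8-bit additive checksum over a small block. The names `block_write` and `block_is_valid` and the block size are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_LEN 8u

static uint8_t block[BLOCK_LEN];       /* protected, rarely changing data */
static uint8_t block_sum;              /* 8-bit additive checksum of block[] */

/* Full recomputation, used for verification. */
uint8_t block_compute_sum(void)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < BLOCK_LEN; i++) {
        sum = (uint8_t)(sum + block[i]);
    }
    return sum;
}

/* Write one element and patch the stored checksum without rescanning the block. */
void block_write(size_t index, uint8_t value)
{
    if (index >= BLOCK_LEN) {
        return;                        /* ignore out-of-range writes */
    }
    block_sum = (uint8_t)(block_sum - block[index] + value);   /* old out, new in */
    block[index] = value;
}

/* Returns nonzero when the stored checksum still matches the data. */
int block_is_valid(void)
{
    return block_compute_sum() == block_sum;
}
```

Each write costs one subtraction and one addition regardless of block size, while verification still needs the full scan.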

These methods require the overhead of additional time. They should only be used where appropriate. If 3 copies of the data are stored, then a vote can be taken to choose the value to use. This allows the program to recover very gracefully. If one of the 3 copies of the data is corrupted, the corrupted value can be restored.

A simple macro can be written to handle the retrieving and verification of this data. If only 2 copies are stored, then the 2 copies must match for the data to be considered valid. If not, then an error handling routine must be called. These methods require the overhead of additional time and additional RAM. They should only be used where appropriate.
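A three-copy vote might be sketched like this. It uses functions rather than a macro for readability, and the type `tmr_u16_t` and the function names are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t copy[3];                  /* three redundant copies of one value */
} tmr_u16_t;

void tmr_write(tmr_u16_t *t, uint16_t value)
{
    t->copy[0] = value;
    t->copy[1] = value;
    t->copy[2] = value;
}

/* Majority vote. On success stores the value in *out, repairs any odd copy,
 * and returns true. Returns false when all three copies disagree. */
bool tmr_read(tmr_u16_t *t, uint16_t *out)
{
    uint16_t a = t->copy[0], b = t->copy[1], c = t->copy[2];
    uint16_t winner;

    if (a == b || a == c) {
        winner = a;
    } else if (b == c) {
        winner = b;
    } else {
        return false;                  /* no majority: caller must handle the error */
    }
    tmr_write(t, winner);              /* restore the corrupted copy, if any */
    *out = winner;
    return true;
}
```

Placing the three copies in widely separated RAM locations, where the linker allows it, reduces the chance that one upset damages two of them.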

When the program does not fill the entire program memory, it is advisable to fill the remaining program memory with:

* a software interrupt instruction, if the microcontroller has such an instruction.
* an illegal instruction, if the microcontroller can trap illegal instructions.
* NOPs, or some other instruction which has no cumulative net effect.

At the end of this block should be a jump to an error handling routine. If the program execution would get lost and jump into this block, the NOPs (or similar) would be executed until the jump to the error handler is reached.

The first two methods are preferable if available since the vectoring to the error handler will occur much quicker. This could aid in debugging the problem.

Providing Debug Assistance
When running formal EMC tests, the test setup must be non-intrusive. In most cases, it is not acceptable to connect an emulator to the unit under test, or to connect an oscilloscope probe to the unit under test. The emulator or scope could influence the EMC test results. When the applied EMI causes the unit to fail, it can be very difficult to determine in what way the unit failed.

The embedded software engineer must provide as much debug assistance as possible to reveal what the failure mechanism was. Sometimes the required debug assistance can be quite simple. A highly effective yet simple method is to provide some type of dynamic signal that indicates that the unit is “alive”.

An LED, for instance, can work well for this purpose. If this dynamic signal can be changed (in frequency for instance) when the unit has entered an error state then the signal is even more effective. Usually more debug assistance is required though.
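A heartbeat that changes rate on error takes only a few lines. In this sketch the 10ms tick, the two blink rates, and the variable `led_on` (standing in for the LED pin) are all assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define HEARTBEAT_NORMAL_TICKS 50u     /* toggle every 500 ms, assuming a 10 ms tick */
#define HEARTBEAT_ERROR_TICKS  10u     /* toggle every 100 ms when in an error state */

static bool led_on = false;            /* stand-in for the LED output pin */
static uint16_t tick_count = 0;

/* Call once per 10 ms tick; in_error selects the faster blink rate. */
void heartbeat_tick(bool in_error)
{
    uint16_t limit = in_error ? HEARTBEAT_ERROR_TICKS : HEARTBEAT_NORMAL_TICKS;
    if (++tick_count >= limit) {
        tick_count = 0;
        led_on = !led_on;              /* on hardware: toggle the LED pin */
    }
}
```

An observer outside the test chamber can then tell "running normally", "running but in error", and "dead" (no blinking at all) apart at a glance.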

When running burst, surge, or ESD testing, real-time status of the unit can be monitored through a wireless communications link (IrDA, 802.11, Bluetooth, etc.) if one is available.

If no wireless communications link is available, or when running radiated susceptibility testing in an RF anechoic chamber, you might have no choice but to debug the unit “after the fact”. Detailed non-volatile logging of pertinent events can provide valuable clues as to what happened.

If there are communications links, even on board I2C, CAN, or SPI busses, keeping performance statistics that can be queried after the test suite has completed can also indicate a problem area that otherwise might not have been observable. Non-volatile event and statistics logging is preferred since the EMI could lead to a system reset.
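One possible shape for such a log is a small ring buffer of events plus a few counters. Everything here is illustrative: the field names, the log depth, and the variable `nv_log`, which stands in for a region of EEPROM, FRAM, or battery-backed RAM that a real design would write through its own driver.

```c
#include <stdint.h>

#define LOG_ENTRIES 16u

typedef struct {
    uint32_t timestamp;                /* e.g. tick count when the event occurred */
    uint8_t  event_id;                 /* application-defined event code */
} log_entry_t;

typedef struct {
    log_entry_t entry[LOG_ENTRIES];
    uint16_t next;                     /* index of the next slot to write */
    uint32_t total;                    /* events logged since the log was cleared */
    uint32_t crc_errors;               /* example link statistic */
    uint32_t retransmissions;          /* example link statistic */
} event_log_t;

/* Stand-in for a non-volatile region (EEPROM, FRAM, battery-backed RAM). */
static event_log_t nv_log;

/* Record one event, overwriting the oldest entry once the buffer is full. */
void log_event(uint32_t timestamp, uint8_t event_id)
{
    nv_log.entry[nv_log.next].timestamp = timestamp;
    nv_log.entry[nv_log.next].event_id = event_id;
    nv_log.next = (uint16_t)((nv_log.next + 1u) % LOG_ENTRIES);
    nv_log.total++;
}
```

The running `total` tells you how many events were overwritten, which matters when a noisy test floods the log.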

If non-volatile storage isn't available, it still might be possible to query statistics, device state, and other pertinent data from the device if it remains running in the failure state after the test suite completes. It may be possible to deduce the method of failure by seeing the result of the failure.

It's quite typical that informal prescreening tests are run prior to executing the formal EMC test suite at a test house. Use this opportunity to find the suspect areas and determine what data needs to be accessible during or after the test. It might reveal previously unforeseen hot spots that require some creative means of providing the clues to the source of the problem.

Providing Automated Execution of Device Functionality
Quite often, the device under test requires user intervention or other stimulus to cause it to execute all of the major blocks of its functionality that might be influenced by EMI. Especially for radiated susceptibility testing in an RF anechoic chamber, it is not always possible to provide these stimuli.

In these situations, one solution is to create a special EMC test build that automatically sequences the unit under test through these major blocks of functionality without the normal stimuli. In this situation especially, the ability to log or visually indicate the progression through functional blocks is critical. If the device fails, you need clues as to what it was doing when the failure occurred.
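Such a test build can be as simple as a table of functions that is walked in a loop. In this sketch the three `exercise_*` functions are empty placeholders for real functional blocks, and `current_step` stands in for whatever progress indicator the product has (non-volatile storage, LEDs, a status message).

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*test_step_fn)(void);

/* Placeholder functional blocks; real ones would drive displays, links, outputs, etc. */
static uint32_t steps_run = 0;
static void exercise_display(void) { steps_run++; }
static void exercise_comms(void)   { steps_run++; }
static void exercise_outputs(void) { steps_run++; }

static const test_step_fn emc_sequence[] = {
    exercise_display,
    exercise_comms,
    exercise_outputs,
};
#define EMC_STEP_COUNT (sizeof emc_sequence / sizeof emc_sequence[0])

/* Last step started; on hardware this would go to non-volatile storage or LEDs. */
static volatile uint8_t current_step = 0xFFu;

/* Run one pass through every functional block, recording progress before each step. */
void emc_run_sequence(void)
{
    for (size_t i = 0; i < EMC_STEP_COUNT; i++) {
        current_step = (uint8_t)i;     /* record first, so a hang can be attributed */
        emc_sequence[i]();
    }
}
```

Recording the step index before calling the block means that if the unit locks up or resets mid-step, the stored index points at the block that was running.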

Figure 4. Spectrum of MCU clock

Minimizing Emissions
Lastly, let's address how the embedded software engineer can help minimize the emissions of the product. In some of these techniques, we truly are minimizing the emissions, but in most cases we are further spreading the emissions over the frequency spectrum so that the average peak energy at any specific frequency is minimized.

Let's consider a system that has a switching power supply. Despite the hardware engineer's best efforts, the emissions at its switching frequency might exceed the standard's limits (Figure 4, above). Many switching power supply control ICs can be driven by an external clock.

You can drive the clock input of the switching power supply control IC with a clock output of the microcontroller. At the frequencies that most switching power supplies operate at, the emissions are measured in 9kHz bands.

If you implement a frequency hopping scheme for the clock output frequency that spans some multiple of the emissions measurement bandwidth (for instance 3X or 27kHz), the amplitude of that very strong peak is spread to lower amplitudes in the wider band (Figure 5, below).

Figure 5. Spectrum after clock dithering

Most switching power supply control ICs with clock inputs can easily withstand that frequency variance. This is commonly known as spread spectrum clocking (SSC), or clock dithering. This scheme lowers the average value of the peaks of the currents even though the total amount of energy in the waveforms is the same as before.

Software is required to implement this spread spectrum clocking scheme. If your microcontroller has an internal DMA feature, it is possible that this scheme can be implemented with little or no impact on processing bandwidth.

Just configure a DMA channel to continually loop through clock frequency register reload values stored in a constant array. Note also that after reset, the switching power supply control IC can run off its own internal oscillator until the software configures the switching of the micro's output clock.
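The reload values themselves are simple arithmetic. The sketch below builds a table that sweeps the output frequency up and then back down across the chosen span. The function name `ssc_build_table` is invented, the table holds timer periods in counts (some timers want period minus one in the register), and how the DMA channel is pointed at the table is device specific and not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill table[] with timer periods (in timer counts) that sweep the output
 * frequency from (center - span/2) up to (center + span/2) and back down.
 * Returns the number of entries written, or 0 on invalid input. */
size_t ssc_build_table(uint32_t timer_hz, uint32_t center_hz, uint32_t span_hz,
                       uint16_t *table, size_t max_len)
{
    if (span_hz / 2u >= center_hz) {
        return 0;                                         /* lower edge would be <= 0 Hz */
    }
    uint32_t f_lo = center_hz - span_hz / 2u;
    uint32_t f_hi = center_hz + span_hz / 2u;
    uint32_t period_max = timer_hz / f_lo;                /* round down: output stays >= f_lo */
    uint32_t period_min = (timer_hz + f_hi - 1u) / f_hi;  /* round up: output stays <= f_hi */

    if (period_min == 0u || period_min > period_max || period_max > 0xFFFFu) {
        return 0;
    }

    size_t rising = (size_t)(period_max - period_min) + 1u;
    size_t falling = (rising > 2u) ? rising - 2u : 0u;    /* endpoints are not repeated */
    if (rising + falling > max_len) {
        return 0;
    }

    size_t n = 0;
    for (uint32_t p = period_max; p >= period_min; p--) {
        table[n++] = (uint16_t)p;                         /* frequency rising */
    }
    for (uint32_t p = period_min + 1u; p < period_max; p++) {
        table[n++] = (uint16_t)p;                         /* frequency falling */
    }
    return n;
}
```

As a worked case, take a hypothetical 48MHz timer clock, a 300kHz nominal switching frequency, and the 27kHz span from the text. The periods run from 167 counts (about 287.4kHz) down to 154 counts (about 311.7kHz) and back, for 26 entries in all. In practice the table would be generated offline and stored as a constant array for the DMA channel to loop over.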

The spread spectrum clocking can also be extended to synchronous communications busses where your microcontroller is the master of the bus. Consider an SPI bus (a TI McBSP for instance) that can be configured to use an external signal to drive the baud clock.

A microcontroller timer output can be connected as the drive source for the baud clock. This timer output can then be configured for a spread spectrum clocking scheme. As long as the frequency variance on the baud clock stays within the frequency range specification, and doesn't cause setup or hold time issues, data communications should occur without issue, and the peak emissions will be reduced.

Some microcontrollers that provide parallel address/data busses for access to external components provide programmability of the edge rates of the control signals. In these cases, using the slowest required edge rate will minimize the emissions.

When using a serial communications link to transfer data between ICs, or especially if off board, care should be taken to minimize the rate of communications packets. Limit link activity to the least that is required. During the formal EMC emissions testing, the emissions are monitored on an average basis. By limiting the link activity, the average is lowered. I've personally seen this be the difference between passing and failing a test!

Another way to minimize emissions is to disable any microcontroller internal or external clocks that are not being used. Many times internal peripherals will have individual clock enables.

If the peripheral is not used, disable the clock! It will minimize emissions and reduce power consumption too. Many microcontrollers have external clocked outputs that can be disabled too. The ALE output of the 8051 family is a good example. Another example is a CLKOUT output of the system clock. If you're not using it, disable it.

Another thing to do is architect your code to utilize blocking tasks (if using an OS) or to utilize interrupt wakeups (for non-OS) to run tasks rather than just having idle loops spinning in the background.

This architecture enables you to better utilize low power processor modes during the idle time between the bursts of activity. This may allow you to shut down a noisy high speed crystal more often, or even allow you to reduce your clock frequency because these designs make better use of processing bandwidth. Usually low power and low emissions go hand in hand.

Joe Brotz is a Senior Design Engineer with Plexus Technology Group in Neenah, Wisconsin. He has over 20 years of experience in embedded software design on products ranging from safety light curtains to medical devices. Joe has a BS in Electrical and Computer Engineering from Marquette University. He can be reached at joe.brotz@plexus.com.
