There are many versions of each of these types of tests. The version that applies to your product depends on the product's application and the agency approvals (UL, CSA, EC, etc.) that are required. A common misconception is that EMC issues are the domain of hardware engineers, when in reality an embedded software engineer can and must be involved!
This article covers many of the basics of how a software engineer can help with product EMC performance and testing, but it is by no means an exhaustive list. Often a little out-of-the-box thinking can provide the solution to a pesky EMC problem, so don't be afraid to try things!
Typically, there are at least four ways the embedded software engineer can aid in the process (in order of value added):
* improve performance against susceptibility tests,
* provide debug assistance for susceptibility tests,
* provide automated execution of device functionality, and
* minimize emissions.
Improving Performance against Susceptibility to EMI
First, let's address how an embedded software engineer would be involved in dealing with the device's susceptibility to Electromagnetic Interference (EMI). The hardware engineer's role is to prevent as much EMI as possible from ever reaching the product's sensitive electronics.
However, it is sometimes impractical to do this well enough to pass all tests. Using software techniques to resolve EMI susceptibility issues can be quite beneficial, because software can almost always remain fluid much later in a product development cycle than hardware can. When a troublesome susceptibility issue is discovered, another round of hardware changes would often cause significant schedule pain.
Software, on the other hand, is much more likely to be able to absorb the time required to implement solutions. The embedded software engineer provides the second line of defense: minimizing the disruption to the product caused by EMI that has gotten past the hardware engineer's first line of defense.
Sometimes a passive filtering solution to an EMI susceptibility issue is technically easy to implement but problematic when a small board size is desired; adding capacitors and ferrite beads can consume valuable board real estate! A solution implemented in software has no impact on board size, which makes it the preferred choice in such space-constrained designs.
Software mechanisms can be put in place to increase how well the system tolerates the residual EMI. The residual EMI can adversely affect the system in many ways. It could corrupt:
* microcontroller registers,
* memories,
* communications channels,
* digital I/O, and/or
* analog I/O.
Corruption of microcontroller registers can manifest itself in many different ways. If the program counter is corrupted, the program can execute code out of sequence, or execute from an unprogrammed location in the program memory.
Using watchdog timers
Probably the most time-tested method of verifying program flow is the use of a watchdog timer. A watchdog is a free-running timer that, when it expires, resets the microcontroller and/or system. Expiration is avoided by periodically resetting the timer's count value through some simple maintenance method.
Usually this maintenance is as simple as providing a pulse on an input to an external watchdog chip or providing a specific write sequence to an on-chip watchdog register. The maintenance of the watchdog is added at key points in the program.
If the execution of the program goes haywire, the watchdog will not be properly maintained and the system will be reset. However, watchdog maintenance must be designed carefully. If it is placed entirely in a timer ISR, the watchdog may continue to be maintained even though the foreground execution has been lost.
A better technique, when an external watchdog chip is used, is to set its input high in the foreground routine and low in a timer interrupt service routine (ISR). Both elements are then required for proper maintenance of the watchdog. The benefit is that if a noise event disrupts execution, the device resets and resumes functioning instead of ending up in an unknown state or trapped in an endless loop.
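The split maintenance scheme can be sketched in C as follows. This is a host-runnable simulation, not driver code: it assumes a hypothetical external watchdog that is satisfied by any transition on its input within a timeout, and `wdi_pin`, `ext_wdt_t`, and the tick counts are illustrative names rather than any particular part's interface.

```c
#include <stdbool.h>

/* Hypothetical GPIO line wired to an external watchdog's input (WDI). */
static volatile bool wdi_pin = false;

/* Foreground half: called once per pass of the main loop. */
static void wdt_service_foreground(void) { wdi_pin = true; }

/* Background half: called from a periodic timer ISR. */
static void wdt_service_isr(void) { wdi_pin = false; }

/* Host-side model of an external watchdog: any edge on WDI restarts it;
 * if no edge is seen for timeout_ticks samples, it asserts reset. */
typedef struct {
    bool last_level;
    unsigned ticks_without_edge;
    unsigned timeout_ticks;
} ext_wdt_t;

/* Returns true when the modeled watchdog would reset the system. */
static bool ext_wdt_sample(ext_wdt_t *w, bool level) {
    if (level != w->last_level) {
        w->ticks_without_edge = 0;
    } else {
        w->ticks_without_edge++;
    }
    w->last_level = level;
    return w->ticks_without_edge >= w->timeout_ticks;
}
```

If either the foreground or the ISR stops running, the pin stops toggling and the modeled watchdog times out.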
Another method of program flow verification is to use sequence checks interspersed throughout the program. A sequence variable is used to verify the program sequence. At select points, this variable is compared against an expected value. If the value is correct, it is incremented (or somehow set to its next expected value). If the value is not correct, the controller can go to an error state, or invoke a software reset of the controller.
It should be noted that this method adds overhead to the program and is not easy to maintain, especially when the program flow changes. It is usually advisable only in safety-critical systems where out-of-sequence execution can have hazardous effects.
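A minimal sketch of sequence checking follows, assuming three checkpoints per main-loop pass; `seq_check()`, `seq_end_of_pass()`, and the `seq_error()` hook are hypothetical names. Each checkpoint passes its own fixed ID, so a skipped or repeated section is caught at the next check.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEQ_CHECKPOINTS 3u   /* checkpoints per main-loop pass (illustrative) */

static uint8_t seq_expected = 0;   /* next checkpoint ID we expect to reach */
static bool    seq_fault    = false;

/* Hypothetical error hook: a real system might enter a safe state
 * or force a software reset here. */
static void seq_error(void) { seq_fault = true; }

/* Place at each checkpoint with its fixed ID (0, 1, 2, ...). */
static void seq_check(uint8_t id) {
    if (seq_expected == id) {
        seq_expected++;          /* advance to the next expected value */
    } else {
        seq_error();             /* out-of-sequence execution detected */
    }
}

/* Call once at the end of every pass: every checkpoint must have run. */
static void seq_end_of_pass(void) {
    if (seq_expected != SEQ_CHECKPOINTS) {
        seq_error();
    }
    seq_expected = 0;
}
```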
Software filtering
Input errors can be controlled by software filtering (debouncing) of inputs. There are many tried-and-true debouncing methods; the key is to choose one that best balances the required responsiveness to the input against the need to tolerate glitches on the input.
For instance, if the system requirements state that you must recognize a button press within 50ms, a good filtering method might be to sample the input in a 10ms ISR and require the input to be seen in the opposite state for at least 40ms (4 consecutive samples) before the debounced state logically changes.
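The 4-sample debounce from this example might look like the following sketch, which assumes `debounce_update()` (a hypothetical name) is called from the 10 ms ISR with the raw pin level:

```c
#include <stdbool.h>
#include <stdint.h>

#define DEBOUNCE_SAMPLES 4u   /* 4 x 10 ms samples, per the example above */

typedef struct {
    bool    state;   /* debounced (logical) state */
    uint8_t count;   /* consecutive samples that disagree with state */
} debounce_t;

/* Call from the 10 ms timer ISR with the raw pin level.
 * Returns the debounced state. */
static bool debounce_update(debounce_t *d, bool raw) {
    if (raw == d->state) {
        d->count = 0;                 /* glitch ended: start counting over */
    } else if (++d->count >= DEBOUNCE_SAMPLES) {
        d->state = raw;               /* stable long enough: accept change */
        d->count = 0;
    }
    return d->state;
}
```

A glitch shorter than four samples resets the counter and never changes the debounced state.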
Some DSPs, such as TI's TMS320F28XX family, also allow you to configure filtering of inputs in hardware. Each GPIO can be configured with a sampling rate and a qualification count, which together form the sampling window.
For the input qualifier to detect a change in the input, the level of the signal must be stable for the duration of the sampling window width or longer. This is filtering that consumes no instruction cycles after it has been configured, so be sure to use it when you can!
Software can also digitally filter an analog input signal. This is most efficiently done on a digital signal processor, but it can be implemented on a standard microcontroller as well. An FIR (finite impulse response) or IIR (infinite impulse response) filter can be implemented to clean up an input signal that has been dirtied by EMI.
Digital filters can provide remarkable performance compared to their analog counterparts. For implementation on a standard microcontroller, there are many 'C' source implementations of these filters freely available on the internet. For implementation on a DSP, almost all DSP vendors provide optimized assembly implementations of these filters that make use of the DSP's hardware features.
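To illustrate the structure, here is a minimal direct-form FIR filter in C using a circular history buffer. The tap count and coefficients are assumptions; a real design would take them from a filter design tool. The simplest FIR is a moving average, and a moving average whose length spans exactly one cycle of the interference (for example, setting `FIR_TAPS` to 10 at an assumed 600 Hz sample rate) places nulls at 60 Hz and its harmonics.

```c
#include <stddef.h>

#define FIR_TAPS 4u

typedef struct {
    float history[FIR_TAPS];  /* circular buffer of past inputs */
    size_t head;              /* index of the newest sample */
} fir_t;

/* Direct-form FIR: y[n] = sum over k of h[k] * x[n-k]. */
static float fir_step(fir_t *f, const float h[FIR_TAPS], float x) {
    f->head = (f->head + 1u) % FIR_TAPS;
    f->history[f->head] = x;

    float y = 0.0f;
    size_t idx = f->head;
    for (size_t k = 0; k < FIR_TAPS; ++k) {
        y += h[k] * f->history[idx];
        idx = (idx == 0u) ? FIR_TAPS - 1u : idx - 1u;   /* step back in time */
    }
    return y;
}
```

With four coefficients of 0.25 (a 4-sample moving average), feeding 4, 8, 12, 16, 0 produces 1, 3, 6, 10, 9 as the buffer fills and then slides.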
Figure 1. ECG with 60Hz Power Line Interference
If a transient EMI event corrupts the value of an output port, the effect may be minimized by periodically refreshing the output port values. Of course, in many cases having the output corrupted for any amount of time is unacceptable, but where it is manageable (an LED output, for example), this can be an acceptable solution.
Figure 2. ECG through 100-Tap FIR Filter
This solution can also be applied to configuration registers of on-chip peripherals. A periodic refresh of the configuration of those peripherals might make the temporary corruption of one of those registers tolerable. However, sometimes writing to a peripheral configuration register can cause a reset or other disruption in the operation of the peripheral, so use care when employing this method.
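A sketch of the shadow-copy refresh approach, simulated on a host: `PORTB_OUT` and `UART_CFG` are hypothetical stand-ins for memory-mapped registers, and the configuration value is illustrative. Following the caution about disruptive writes, the configuration register is rewritten only when it no longer matches, which assumes the register can be read back.

```c
#include <stdint.h>

/* Host stand-ins for memory-mapped registers (hypothetical names). */
static volatile uint8_t PORTB_OUT;
static volatile uint8_t UART_CFG;

/* Shadow copies: the application records intended values here. */
static uint8_t portb_shadow;
static const uint8_t uart_cfg_value = 0x5Au;   /* illustrative config */

static void set_led(uint8_t mask, int on) {
    if (on) {
        portb_shadow |= mask;
    } else {
        portb_shadow &= (uint8_t)~mask;
    }
    PORTB_OUT = portb_shadow;
}

/* Called periodically (e.g., from a slow timer tick) to undo any
 * transient corruption of the output and configuration registers. */
static void refresh_registers(void) {
    PORTB_OUT = portb_shadow;              /* outputs: always rewrite */
    if (UART_CFG != uart_cfg_value) {      /* config: rewrite only if changed, */
        UART_CFG = uart_cfg_value;         /* since some peripherals glitch on writes */
    }
}
```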
In many cases, the designer has a choice between a level-triggered and an edge-triggered interrupt. If feasible, choose the level-triggered interrupt: whereas an edge-triggered interrupt typically imposes no minimum pulse width on the interrupt event, a level-triggered interrupt does.
Typically, an interrupt controller samples the interrupt inputs at some defined frequency (once per instruction cycle, for instance). In such a scheme, a level-triggered interrupt requires the event to be present on two samples in a row before an interrupt is generated, so the minimum width is the sample period. An edge-triggered interrupt looks for two consecutive samples that indicate the intended transition.
Since the transition can occur immediately before the next sample, an edge-triggered interrupt imposes effectively no minimum width on the event. The net effect is that noise is more likely to trigger an edge-triggered interrupt than a level-triggered one. However, microcontrollers sample interrupts using different methods, and understanding how yours does so is the key to determining which interrupt type is less sensitive to noise.
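To make the difference concrete, the following sketch models the two-sample scheme described here on an array of input samples. It is a simplified model, not the behavior of any specific interrupt controller, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Each element of s[] is one sample of an active-high interrupt input. */

/* Level-triggered: fires once when the input is active on two
 * consecutive samples (counted once per run of active samples). */
static unsigned count_level_irqs(const bool *s, size_t n) {
    unsigned irqs = 0;
    for (size_t i = 1; i < n; ++i) {
        if (s[i] && s[i - 1] && !(i >= 2 && s[i - 2])) {
            irqs++;
        }
    }
    return irqs;
}

/* Edge-triggered: an inactive sample followed by an active one. */
static unsigned count_edge_irqs(const bool *s, size_t n) {
    unsigned irqs = 0;
    for (size_t i = 1; i < n; ++i) {
        if (s[i] && !s[i - 1]) {
            irqs++;
        }
    }
    return irqs;
}
```

A one-sample glitch such as `{0, 0, 1, 0, 0}` raises an edge-triggered interrupt but not a level-triggered one, while a genuine three-sample pulse raises one of each.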
EMI in the serial channels
Another area that is typically affected by EMI is serial communications channels. Even if a noise-tolerant physical layer such as RS485 or LVDS (low-voltage differential signaling) is used in a communications link, data can still be corrupted by noise.
Software can detect these errors and respond appropriately. Simple errors contained in a single byte may be detected through a framing or parity error, which a UART typically detects in hardware.
If such an error is detected, the receiving device should request a retransmission of the packet. Depending on the protocol, this may be accomplished by not acknowledging the packet or by sending a special error acknowledgement back. Alternatively, a protocol can be designed in which the data includes error correcting codes (ECC), which provide detection and correction of a limited number of bit errors.
The disadvantage is the overhead of the additional error correcting bits, and the inability to flawlessly deal with multiple bit errors. A more robust (and highly recommended) method of detecting errors in a communications packet transmission is to include a Cyclic Redundancy Check (CRC) as part of the packet.
A two-byte (16-bit) CRC provides 100% coverage of bit failures occurring within the same byte and 99.998% coverage of all other bit failures. The CRC can detect errors but provides no means of correcting them.
A mismatch between the received CRC and the value calculated from the received data should again result in the receiving device requesting a retransmission of the packet (Figure 3, below). As long as EMI corrupts only a small percentage of the packets, and the system was designed with sufficient bandwidth to begin with, the overhead of retransmitting failed packets will most likely not reduce communications throughput to an unacceptable level.
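A bitwise CRC-16 in C is sketched below. The choice of the CCITT polynomial (0x1021) with an initial value of 0xFFFF, and the two-byte, MSB-first trailer, are assumptions for illustration; a real protocol defines its own CRC parameters and framing. Table-driven versions trade ROM for speed.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, no reflection. */
static uint16_t crc16_ccitt(const uint8_t *data, size_t len) {
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* Receiver side: the last two bytes carry the CRC, MSB first (assumed
 * framing). Returns nonzero if the packet is intact. */
static int packet_crc_ok(const uint8_t *pkt, size_t len) {
    if (len < 2u) {
        return 0;
    }
    uint16_t rx = (uint16_t)((pkt[len - 2u] << 8) | pkt[len - 1u]);
    return crc16_ccitt(pkt, len - 2u) == rx;
}
```

On a mismatch, the receiver would not acknowledge the packet (or would send an error acknowledgement) so that the sender retransmits it.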
Figure 3. Communications with packet CRCs
A similar method is to include a checksum of the packet as part of the packet transmission. The checksum is easier (i.e. faster) to compute, but provides significantly less coverage of bit failures in the packet transmission.
For example, flipping a 0 to a 1 at one bit position in one byte, and flipping a 1 to a 0 at the same bit position in another byte, leaves an additive checksum unchanged even though two bytes have been corrupted. Much information is available on these and many other communications error detection and/or correction schemes. It is highly advisable to implement the one that best matches the requirements of your device and the environment in which it will be used.
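The checksum weakness described above is easy to reproduce with a simple 8-bit additive checksum, sketched here (the function name is hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Simple 8-bit additive checksum: sum of all bytes, modulo 256. */
static uint8_t checksum8(const uint8_t *data, size_t len) {
    uint8_t sum = 0;
    for (size_t i = 0; i < len; ++i) {
        sum = (uint8_t)(sum + data[i]);
    }
    return sum;
}
```

For example, `{0x01, 0x00, 0x0F}` and `{0x09, 0x00, 0x07}` both sum to 0x10: bit 3 was set in the first byte and cleared in the third, so `checksum8()` cannot tell the two packets apart, whereas a CRC is designed to catch this kind of error.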
Volatile memory corruption
EMI can corrupt volatile data memory. These errors are difficult to detect, but a few methods can be employed in some cases. When only a specific range is valid for a data element, a plausibility check of the data should be performed before it is used.
Along these same lines, when a switch statement is used on a variable, a 'default' case should always be included. This provides a minimal amount of error detection, but more importantly, it prevents the program from executing code based on a data value that was not accounted for.
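Both ideas are sketched below under illustrative assumptions: a setpoint valid from 5 to 95 °C and a three-value operating mode. The names and the `report_error()` hook are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SETPOINT_MIN_C 5    /* illustrative valid range, degrees C */
#define SETPOINT_MAX_C 95

typedef enum { OP_IDLE, OP_HEAT, OP_COOL } op_mode_t;

static int error_count = 0;

/* Hypothetical error hook. */
static void report_error(void) { error_count++; }

/* Plausibility check before use: reject values outside the valid range. */
static bool setpoint_plausible(int16_t sp) {
    return sp >= SETPOINT_MIN_C && sp <= SETPOINT_MAX_C;
}

/* Returns heater duty in percent; unknown modes fall to the default case. */
static int heater_duty(op_mode_t mode) {
    switch (mode) {
    case OP_IDLE: return 0;
    case OP_HEAT: return 100;
    case OP_COOL: return 0;
    default:
        report_error();   /* corrupted value: take the safe action */
        return 0;
    }
}
```

The default case is reached here only if the mode variable holds a value the program never assigns, which is exactly the corruption case.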
If the data in question changes and is accessed infrequently, then the data can be verified through the use of a CRC or checksum of the block of data. When using a checksum, a new checksum value can be generated more quickly if the old data value is subtracted out first, and then the new data value is added in.
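The subtract-then-add update can be sketched as follows for a 16-bit additive checksum over a small block; the block size and names are assumptions. Converting the result back to `uint16_t` reduces it modulo 2^16, so the update stays consistent even when the intermediate value goes negative.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_WORDS 8u

typedef struct {
    uint16_t data[BLOCK_WORDS];
    uint16_t sum;               /* 16-bit additive checksum of data[] */
} guarded_block_t;

/* Full recomputation, used for verification. */
static uint16_t block_sum(const guarded_block_t *b) {
    uint16_t s = 0;
    for (size_t i = 0; i < BLOCK_WORDS; ++i) {
        s = (uint16_t)(s + b->data[i]);
    }
    return s;
}

/* Update one element without re-summing the whole block:
 * subtract the old value out, then add the new value in. */
static void block_write(guarded_block_t *b, size_t i, uint16_t value) {
    b->sum = (uint16_t)(b->sum - b->data[i] + value);
    b->data[i] = value;
}

/* Verify before use; nonzero means the block is intact. */
static int block_ok(const guarded_block_t *b) {
    return block_sum(b) == b->sum;
}
```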
CRC and checksum verification add execution time, so they should only be used where appropriate.
Another approach is redundancy. If three copies of the data are stored, a vote can be taken to choose the value to use. This allows the program to recover very gracefully: if one of the three copies is corrupted, it can be restored from the other two. A simple macro can be written to handle retrieving and verifying this data. If only two copies are stored, the two copies must match for the data to be considered valid; if they do not, an error handling routine must be called. Redundant copies cost additional time and RAM, so they too should only be used where appropriate.
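Here is a sketch of 2-out-of-3 voting, written as functions rather than a macro for type safety; `tmr_u32_t`, `tmr_write()`, and `tmr_read()` are hypothetical names. In a real system the three copies would ideally live in separate RAM regions so that a single disturbance is less likely to hit more than one of them.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t copy[3];   /* three redundant copies (adjacent here for brevity) */
} tmr_u32_t;

static void tmr_write(tmr_u32_t *v, uint32_t value) {
    v->copy[0] = v->copy[1] = v->copy[2] = value;
}

/* 2-out-of-3 vote. On success, repairs the odd copy out and returns true.
 * If all three disagree, returns false so the caller can run its error handler. */
static bool tmr_read(tmr_u32_t *v, uint32_t *out) {
    uint32_t a = v->copy[0], b = v->copy[1], c = v->copy[2];
    uint32_t winner;
    if (a == b || a == c) {
        winner = a;
    } else if (b == c) {
        winner = b;
    } else {
        return false;
    }
    tmr_write(v, winner);   /* restore any corrupted copy */
    *out = winner;
    return true;
}
```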
When the program does not fill the entire program memory, it is advisable to fill the remaining program memory with:
* a software interrupt instruction, if the microcontroller has such an instruction.
* an illegal instruction, if the microcontroller can trap illegal instructions.
* NOPs, or some other instruction which has no cumulative net effect.
At the end of this block should be a jump to an error handling routine. If program execution gets lost and jumps into this block, the NOPs (or similar) are executed until the jump to the error handler is reached.
The first two methods are preferable if available, since vectoring to the error handler occurs much sooner, which can also aid in debugging the problem.
Providing Debug Assistance
When running formal EMC tests, the test setup must be non-intrusive. In most cases, it is not acceptable to connect an emulator or an oscilloscope probe to the unit under test, because either could influence the EMC test results. When the applied EMI causes the unit to fail, it can therefore be very difficult to determine how the unit failed.
The embedded software engineer must provide as much debug assistance as possible to reveal what the failure mechanism was. Sometimes the required debug assistance can be quite simple. A highly effective yet simple method is to provide some type of dynamic signal that indicates that the unit is "alive".
An LED, for instance, works well for this purpose. If the signal changes (in frequency, for instance) when the unit enters an error state, it is even more effective. Usually, though, more debug assistance is required.
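A heartbeat that changes rate in an error state can be as simple as the following sketch. The 10 ms call rate, the 1 Hz and 5 Hz blink rates, and the `system_error` and `led_on` variables are assumptions standing in for real application state and a GPIO.

```c
#include <stdbool.h>
#include <stdint.h>

/* Blink half-periods in 10 ms ticks (illustrative values). */
#define HEARTBEAT_NORMAL_TICKS 50u   /* 1 Hz blink when healthy */
#define HEARTBEAT_ERROR_TICKS  10u   /* 5 Hz blink in an error state */

static bool system_error = false;    /* set by error handlers */
static bool led_on = false;          /* stand-in for the LED GPIO */

/* Call every 10 ms from the main loop (e.g., when a timer flag is set). */
static void heartbeat_tick(void) {
    static uint16_t count = 0;
    uint16_t half_period = system_error ? HEARTBEAT_ERROR_TICKS
                                        : HEARTBEAT_NORMAL_TICKS;
    if (++count >= half_period) {
        count = 0;
        led_on = !led_on;            /* toggle the LED */
    }
}
```

Calling it from the main loop rather than from a timer ISR makes the LED reflect the health of the foreground code, for the same reason discussed under watchdog timers.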
When running burst, surge, or ESD testing, real-time status of the unit can be monitored through a wireless communications link (IrDA, 802.11, Bluetooth, etc.) if one is available.
If no wireless communications link is available, or when running radiated susceptibility testing in an RF anechoic chamber, you might have no choice but to debug the unit "after the fact". Detailed non-volatile logging of pertinent events can provide valuable clues as to what happened.
If there are communications links, even on-board I2C, CAN, or SPI busses, keeping performance statistics that can be queried after the test suite has completed can reveal a problem area that might otherwise not have been observable. Non-volatile event and statistics logging is preferred, since the EMI could lead to a system reset.
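A minimal sketch of an event log with a queryable statistic follows. The event codes, the 8-entry ring buffer, and the RAM array standing in for non-volatile storage are all assumptions; on real hardware the log would be committed to EEPROM, flash, or FRAM.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical event codes. */
enum { EVT_RESET = 1, EVT_UART_FRAMING = 2, EVT_CRC_FAIL = 3, EVT_WDT = 4 };

#define LOG_ENTRIES 8u

typedef struct {
    uint32_t timestamp_ms;
    uint8_t  code;
    uint8_t  detail;
} log_entry_t;

typedef struct {
    log_entry_t entry[LOG_ENTRIES];
    uint16_t    next;           /* ring-buffer write index */
    uint16_t    total;          /* events logged since the log was cleared */
    uint16_t    crc_failures;   /* example link statistic, queryable later */
} nv_log_t;

static nv_log_t nv_log;         /* RAM stand-in for non-volatile storage */

static void log_clear(void) { memset(&nv_log, 0, sizeof nv_log); }

static void log_event(uint32_t now_ms, uint8_t code, uint8_t detail) {
    log_entry_t *e = &nv_log.entry[nv_log.next];
    e->timestamp_ms = now_ms;
    e->code = code;
    e->detail = detail;
    nv_log.next = (uint16_t)((nv_log.next + 1u) % LOG_ENTRIES);  /* oldest is overwritten */
    nv_log.total++;
    if (code == EVT_CRC_FAIL) {
        nv_log.crc_failures++;
    }
    /* A real implementation would commit the change to NV memory here. */
}
```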
If non-volatile storage isn't available, it still might be possible to query statistics, device state, and other pertinent data from the device if it remains running in the failure state after the test suite completes. It may be possible to deduce the method of failure by seeing the result of the failure.
It's quite typical that informal prescreening tests are run prior to executing the formal EMC test suite at a test house. Use this opportunity to find the suspect areas and determine what data needs to be accessible during or after the test. It might reveal previously unforeseen hot spots that require some creative means of providing the clues to the source of the problem.
Providing Automated Execution of Device Functionality
Quite often, the device under test requires user intervention or other stimulus to exercise all of the major blocks of its functionality that might be influenced by EMI. Especially for radiated susceptibility testing in an RF anechoic chamber, it is not always possible to provide these stimuli.
In these situations, one solution is to create a special EMC test build that automatically sequences the unit under test through these major blocks of functionality without the normal stimuli. In this situation especially, the ability to log or visually indicate the progression through functional blocks is critical: if the device fails, you need clues as to what it was doing when the failure occurred.
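One way to structure such a build is a table of functional blocks stepped through automatically while recording progress, sketched below. The step functions are hypothetical placeholders, and in real firmware the call to `emc_test_tick()` would be compiled in only under a build flag (a hypothetical `EMC_TEST_BUILD`, for instance) in place of the normal stimulus handling.

```c
#include <stdint.h>

typedef void (*emc_step_fn)(void);

/* Hypothetical functional blocks; the counters stand in for real work. */
static unsigned steps_run[4];
static void run_motor(void)       { steps_run[0]++; }
static void sample_sensors(void)  { steps_run[1]++; }
static void transmit_report(void) { steps_run[2]++; }
static void write_flash(void)     { steps_run[3]++; }

static const emc_step_fn emc_sequence[] = {
    run_motor, sample_sensors, transmit_report, write_flash
};
#define EMC_NUM_STEPS (sizeof emc_sequence / sizeof emc_sequence[0])

/* Last step started: log it or show it on LEDs. */
static volatile uint8_t emc_progress;

/* Called periodically from the main loop in the EMC test build. */
static void emc_test_tick(void) {
    static unsigned step = 0;
    emc_progress = (uint8_t)step;   /* record where we are before running it */
    emc_sequence[step]();
    step = (unsigned)((step + 1u) % EMC_NUM_STEPS);
}
```

Because `emc_progress` is updated before each step runs, its last value identifies the block that was executing when a failure occurred.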
Figure 4. Spectrum of MCU clock
Minimizing Emissions
Lastly, let's address how the embedded software engineer can help minimize the product's emissions. With some of these techniques we truly are minimizing the emissions, but in most cases we are spreading the emissions over the frequency spectrum so that the peak energy measured at any specific frequency is reduced.
Consider a system that has a switching power supply. Despite the hardware engineer's best efforts, the emissions at its switching frequency might exceed the standard's limits (Figure 4, above). Many switching power supply control ICs can be driven by an external clock.
You can drive the clock input of the switching power supply control IC with a clock output of the microcontroller. At the frequencies at which most switching power supplies operate, emissions are measured in 9kHz bands.
If you implement a frequency hopping scheme for the clock output that spans some multiple of the emissions measurement bandwidth (for instance 3X, or 27kHz), the energy of that very strong peak is spread at lower amplitudes across the wider band (Figure 5, below).
Figure 5. Spectrum after clock dithering
Most switching power supply control ICs with clock inputs can easily tolerate that frequency variation. This technique is commonly known as spread spectrum clocking (SSC), or clock dithering. It lowers the peak amplitudes in the emission spectrum even though the total amount of energy in the waveforms is the same as before.
Software is required to implement this spread spectrum clocking scheme. If your microcontroller has an internal DMA feature, it is possible that this scheme can be implemented with little or no impact on processing bandwidth.
Just configure a DMA channel to continually loop through clock frequency register reload values stored in a constant array. Note also that after reset, the switching power supply control IC can run off its own internal oscillator until the software configures the switching of the micro's output clock.
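The reload table for such a DMA loop could be generated as in this sketch. It assumes a timer whose output frequency is `timer_clk_hz / (reload + 1)`, a 48 MHz timer clock, a 300 kHz nominal switching clock, and a triangle-shaped sweep across the 27kHz span mentioned above; all of these values and names are illustrative.

```c
#include <stdint.h>

#define SSC_STEPS 32u   /* table length; must be even */

static uint16_t ssc_reload[SSC_STEPS];

/* Fill a triangle-profile table of timer reload values so the output
 * frequency sweeps from f_center - span/2 up to f_center + span/2 and back. */
static void ssc_build_table(uint32_t timer_clk_hz, uint32_t f_center_hz,
                            uint32_t span_hz) {
    const uint32_t half = SSC_STEPS / 2u;
    const uint32_t f_min = f_center_hz - span_hz / 2u;
    for (uint32_t i = 0; i < SSC_STEPS; ++i) {
        /* Ramp up over the first half of the table, down over the second. */
        uint32_t pos = (i < half) ? i : (SSC_STEPS - 1u - i);   /* 0..half-1 */
        uint32_t f = f_min + (span_hz * pos) / (half - 1u);
        /* Rounded divide, minus one for the (reload + 1) timer period. */
        ssc_reload[i] = (uint16_t)((timer_clk_hz + f / 2u) / f - 1u);
    }
}
```

With these numbers, each unit of reload value moves the frequency by roughly 2 kHz, so the timer clock limits how finely the span can be covered. The DMA channel configuration itself is device-specific and is not shown.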
The spread spectrum clocking can also be extended to synchronous communications busses where your microcontroller is the master of the bus. Consider an SPI bus (a TI McBSP for instance) that can be configured to use an external signal to drive the baud clock.
A microcontroller timer output can be connected as the drive source for the baud clock. This timer output can then be configured for a spread spectrum clocking scheme. As long as the frequency variance on the baud clock stays within the frequency range specification, and doesn't cause setup or hold time issues, data communications should occur without issue, and the peak emissions will be reduced.
Some microcontrollers that provide parallel address/data busses for access to external components allow the edge rates of the control signals to be programmed. In these cases, using the slowest edge rate that still meets timing requirements will minimize the emissions.
When using a serial communications link to transfer data between ICs, and especially off-board, take care to minimize the rate of communications packets: limit link activity to the least that is required. During formal EMC emissions testing, the emissions are monitored on an average basis, so limiting the link activity lowers the average. I've personally seen this make the difference between passing and failing a test!
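One way to limit link activity is a send-on-change gate with a minimum spacing and a slow keep-alive refresh, sketched below; the intervals and names are assumptions to be set from the system requirements.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIN_TX_INTERVAL_MS  50u     /* never send faster than this */
#define REFRESH_INTERVAL_MS 1000u   /* resend unchanged data this often */

typedef struct {
    uint32_t last_tx_ms;
    int32_t  last_value;
    bool     sent_once;
} tx_gate_t;

/* Returns true if a packet should be sent now. Unsigned subtraction keeps
 * the elapsed-time calculation correct when the millisecond counter wraps. */
static bool tx_gate_should_send(tx_gate_t *g, uint32_t now_ms, int32_t value) {
    uint32_t elapsed = now_ms - g->last_tx_ms;
    bool send;
    if (!g->sent_once) {
        send = true;                             /* first value always goes out */
    } else if (elapsed < MIN_TX_INTERVAL_MS) {
        send = false;                            /* enforce minimum spacing */
    } else {
        send = (value != g->last_value) || (elapsed >= REFRESH_INTERVAL_MS);
    }
    if (send) {
        g->last_tx_ms = now_ms;
        g->last_value = value;
        g->sent_once = true;
    }
    return send;
}
```

A change that arrives too soon is not lost: because `last_value` still differs, it goes out on the first call after the minimum interval.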
Another way to minimize emissions is to disable any microcontroller internal or external clocks that are not being used. Many times internal peripherals will have individual clock enables.
If the peripheral is not used, disable the clock! It will minimize emissions and reduce power consumption too. Many microcontrollers also have external clocked outputs that can be disabled. The ALE output of the 8051 family is a good example. Another example is a CLKOUT output of the system clock. If you're not using it, disable it.
Another technique is to architect your code to run tasks using blocking calls (if using an OS) or interrupt wakeups (without an OS), rather than having idle loops spinning in the background.
This architecture lets you make better use of low power processor modes during the idle time between bursts of activity. It may allow you to shut down a noisy high-speed crystal more often, or even reduce your clock frequency, because these designs make better use of processing bandwidth. Usually low power and low emissions go hand in hand.
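An interrupt-driven main loop that sleeps when there is nothing to do might be sketched as follows. The event sources are hypothetical, and `irq_disable()`, `irq_enable()`, and `cpu_sleep()` are host stand-ins; on an Arm Cortex-M target they would correspond to the CMSIS `__disable_irq()`, `__enable_irq()`, and `__WFI()` intrinsics.

```c
#include <stdint.h>

/* Event flags set by ISRs (hypothetical sources). */
#define EVT_TIMER (1u << 0)
#define EVT_UART  (1u << 1)

static volatile uint32_t pending_events;
static unsigned sleep_count, timer_runs, uart_runs;

/* Host stand-ins for the target's interrupt-mask and sleep instructions. */
static void irq_disable(void) {}
static void irq_enable(void)  {}
static void cpu_sleep(void)   { sleep_count++; }

/* One pass of the main loop: run whatever is pending, otherwise sleep. */
static void main_loop_pass(void) {
    irq_disable();
    uint32_t events = pending_events;
    pending_events = 0;
    if (events == 0u) {
        /* Sleeping with interrupts masked avoids a lost-wakeup race: on
         * Cortex-M, a pending interrupt still wakes WFI while masked by
         * PRIMASK, and it is serviced once interrupts are re-enabled. */
        cpu_sleep();
        irq_enable();
        return;
    }
    irq_enable();
    if (events & EVT_TIMER) { timer_runs++; }   /* run timer-driven tasks */
    if (events & EVT_UART)  { uart_runs++; }    /* run UART-driven tasks */
}
```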
Joe Brotz is a Senior Design Engineer with Plexus Technology Group in Neenah, Wisconsin. He has over 20 years of experience in embedded software design on products ranging from safety light curtains to medical devices. Joe has a BS in Electrical and Computer Engineering from Marquette University. He can be reached at joe.brotz@plexus.com.