These specs vary depending on the logic device. Some might require tens of nanoseconds of set-up and/or hold time; others need an order of magnitude less.
Figure 9.1: Setup and Hold Times
If we tend to our knitting we'll respect these parameters and the flip-flop will always be totally predictable. But when things are asynchronous (say, the wrist rotates at its own rate and the software does a read whenever it needs data) there's a chance that we'll violate set-up or hold time.
Suppose the flip-flop requires 3 nanoseconds of set-up time. Our data changes within that window, flipping state perhaps a single nanosecond before the clock transitions. The device will go into a metastable state where the output gets very strange indeed.
Because we violated the specification, the device really doesn't know if we presented a zero or a one. Its output goes, not to a logic state, but either to a half-level (in between the digital norms) or into oscillation, toggling wildly between states. The flip-flop is metastable.
Figure 9.2: A Metastable State
This craziness doesn't last long; typically after a few to 50 nanoseconds the oscillations damp out or the half-state disappears, leaving the output at a valid one or zero. But which one is it? This is a digital system, and we expect ones to be ones, and zeroes zeroes.
The output is random. Bummer, that. You cannot predict which level it will assume. That sure makes it hard to design predictable digital systems!
Hardware folks feel that the random output isn't a problem. Since the input changed at almost exactly the same time the clock strobed, either a zero or a one is reasonable. If we had clocked just a hair ahead or behind we'd have gotten a different value, anyway. Philosophically, who knows which state we measured? Is this really a big deal? Maybe not to the EEs, but this impacts our software in a big way, as we'll see shortly.
Metastability occurs only when clock and data arrive almost simultaneously; the odds increase as clock rates soar. An equally important factor is the type of logic component used: slower logic (like 74HCxx) has a much wider metastable window than faster devices (say, 74FCTxx).
Clearly at reasonable rates the odds of the two asynchronous signals arriving closely enough in time to cause a metastable situation are low. Measurable, yes; important, certainly. With a 10 MHz clock and 10 kHz data rate, using typical but not terribly speedy logic, metastable errors occur about once a minute. Though such errors are infrequent, no reliable system can stand that failure rate.
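To get a feel for how the odds scale, here is a rough sketch in C. It assumes a simple first-order model that is not from the text: a failure happens whenever a data edge lands inside a small effective window around a clock edge, so the failure rate is proportional to window times clock rate times data rate. The function names are made up for illustration, and the window is worked backward from the once-a-minute figure above rather than taken from any data sheet.

```c
#include <stdio.h>

/*
 * Assumed first-order model (not from the text): failures per second scale
 * with effective_window * clock_rate * data_rate.
 */
double failures_per_second(double window_s, double f_clk_hz, double f_data_hz)
{
    return window_s * f_clk_hz * f_data_hz;
}

/* Effective window implied by an observed mean time between failures. */
double implied_window_s(double mtbf_s, double f_clk_hz, double f_data_hz)
{
    return 1.0 / (mtbf_s * f_clk_hz * f_data_hz);
}

/* Prints the 10 MHz / 10 kHz case from the text, then a doubled clock. */
void print_example(void)
{
    double w = implied_window_s(60.0, 10e6, 10e3);

    printf("implied window: %g s\n", w);
    printf("10 MHz clock: %g failures/s\n",
           failures_per_second(w, 10e6, 10e3));
    printf("20 MHz clock: %g failures/s\n",
           failures_per_second(w, 20e6, 10e3));
}
```

Under this model, doubling either the clock rate or the data rate doubles the failure rate, which is the sense in which the odds increase as clock rates soar. Real parts also depend strongly on how long the output is given to settle, which this sketch ignores.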
The classic metastable fix uses two flip-flops connected in series. Data goes to the first; its output feeds the data input of the second. Both use the same clock input. The second flop's output will be "correct" after two clocks, since the odds of two metastable events occurring back-to-back are almost nil. With two flip-flops, at reasonable data rates errors occur millions or even billions of years apart, good enough for most systems.
However, "correct" means only that the second stage's output will not be metastable: it's not oscillating, nor is it at an illegal voltage level. There's still an equal chance the value will be in either legal logic state.
Hardware designers smugly cure their metastability problem using the two-stage flops described. Their domain is that of a single bit, whose input changed just about the same time the clock transitioned. Thinking in such narrow terms, it's indeed reasonable to accept the inherent random output the flops generate.
However, we software folks are reading parallel I/O ports, each perhaps 8 bits wide. That means there are 8 flip-flops in the input capture register, all driven by the same clock pulse.
Let's look at what might happen. The encoder changes from 0xff to 0x100. This small difference might represent just a tiny change in angle. We request a read at just about the same time the data changes; our input operation strobes the capture register's clock, creating a violation of set-up or hold time.
Every input bit changes, so each of the flip-flops inside the register goes metastable. After a short time the oscillations die out, but now every bit in the register is random. Though the hardware folks might shrug and complain that no one knows what the right value was, since everything changed as the clock arrived, in fact the data was around 0xff or 0x100. A random result of, say, 0x12 is absurd and totally unacceptable, and may lead to crazy system behavior.
The case where data goes from 0xff to 0x100 is pathological since every bit changes at once. The system faces the same peril whenever lots of bits change: 0x0f to 0x10, say, or 0x1f to 0x20. The upper, unchanging data bits will always latch correctly, but every changing bit is at risk.
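To see which bits are at risk for a given transition, XOR the value before the change with the value after. The sketch below does that in C; the function names are my own, invented for illustration. It also computes the lowest and highest values a capture could return if every changing bit resolved randomly while the unchanging bits latched correctly.

```c
#include <stdint.h>

/* Mask of the bits that differ between two successive readings. */
uint16_t changed_bits(uint16_t before, uint16_t after)
{
    return (uint16_t)(before ^ after);
}

/* Number of set bits in a 16-bit value. */
unsigned count_bits(uint16_t v)
{
    unsigned n = 0;

    while (v) {
        v &= (uint16_t)(v - 1);   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Every at-risk bit resolves to 0; unchanged bits keep their value. */
uint16_t lowest_possible(uint16_t before, uint16_t after)
{
    return (uint16_t)(before & after);
}

/* Every at-risk bit resolves to 1; unchanged bits keep their value. */
uint16_t highest_possible(uint16_t before, uint16_t after)
{
    return (uint16_t)(before | after);
}
```

For the 0xff to 0x100 case, `changed_bits` gives 0x1ff: nine bits at risk, so the captured value could land anywhere from 0x000 to 0x1ff. For 0x0f to 0x10 the mask is 0x1f, five bits.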
Why not use the multiple flip-flop solution? Connect two input capture registers in series, both driven by the same clock. Though this will eliminate the illegal logic states and oscillations, the second stage's output will be random as well.
One option is to ignore metastability and hope for the best. Or use very fast logic with very narrow set-up/hold time windows to reduce the odds of failure. If the code samples the inputs infrequently it's possible to reduce metastability to one chance in millions or even billions. Building a safety critical system? Feeling lucky?
It is possible to build a synchronizer circuit that takes a request for a read from the processor, combines it with a data available bit from the I/O device, and responds with a data-OK signal back to the CPU. This is nontrivial and prone to errors.
An alternative is to use a different coding scheme for the I/O device. Buy an encoder with Gray code output, for example (if you can find one). Gray code is a counting scheme where only a single bit changes between numbers, as follows:
0 000
1 001
2 011
3 010
4 110
5 111
6 101
7 100
Gray code makes sense if, and only if, your code reads the device faster than it's likely to change, and if the changes happen in a fairly predictable fashion, like counting up. Then there's no real chance of more than a single bit changing between reads; if the inputs go metastable, only one bit will be wrong. The result will still be reasonable.
Another solution is to compute a parity or checksum of the input data before the capture register. Latch that, as well, into the register. Have the code compute parity and compare it to the value read; if there's an error, do another read.
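The read-and-retry side of that scheme might look something like the sketch below. Everything about the hardware here is an assumption made for illustration: the captured word is taken to hold 8 data bits in bits 0..7 with a parity bit latched in bit 8, and `fake_capture` is a stand-in for the real port so the example can run on a PC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical port access: returns data in bits 0..7, parity in bit 8. */
typedef uint16_t (*capture_read_fn)(void);

/* Returns 1 if the number of set bits in v is odd, else 0. */
uint8_t parity8(uint8_t v)
{
    v ^= (uint8_t)(v >> 4);
    v ^= (uint8_t)(v >> 2);
    v ^= (uint8_t)(v >> 1);
    return (uint8_t)(v & 1u);
}

/*
 * Read until the computed parity matches the latched parity, or give up
 * after max_tries. Returns true and stores the data byte on success.
 */
bool read_checked(capture_read_fn read_capture, unsigned max_tries,
                  uint8_t *out)
{
    for (unsigned i = 0; i < max_tries; i++) {
        uint16_t raw = read_capture();
        uint8_t data = (uint8_t)(raw & 0xFFu);
        uint8_t latched_parity = (uint8_t)((raw >> 8) & 1u);

        if (parity8(data) == latched_parity) {
            *out = data;
            return true;
        }
    }
    return false;
}

/*
 * Demonstration stand-in for the real port: the first read returns a
 * corrupted capture (data 0x13, parity bit 0), the second a clean one
 * (data 0xFF, parity bit 0).
 */
static unsigned fake_reads;

uint16_t fake_capture(void)
{
    return (fake_reads++ == 0) ? 0x0013u : 0x00FFu;
}
```

Calling `read_checked(fake_capture, 3, &value)` rejects the first capture and returns 0xFF from the second. Bear in mind that a single parity bit only catches an odd number of wrong bits, so a checksum is the stronger choice when many bits can change at once.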
Although I've discussed adding an input capture register, please don't think that this is the root cause of the problem. Without that register, if you just feed the asynchronous inputs directly into the CPU, it's quite possible to violate the processor's innate set-up/hold times.
There's no free lunch; all logic has physical constraints we must honor. Some designs will never have a metastability problem. Metastability always stems from violating set-up or hold times, which in turn comes from either poor design or asynchronous inputs.
If an interrupt is caused by, say, someone pressing a button, be sure that the interrupt itself, and the vector-generating logic, don't violate the processor's set-up and hold times.
However, in computer systems most things do happen synchronously. If you're reading a timer that operates from the CPU's clock, it is inherently synchronous to the code. From a metastability standpoint it's totally safe.
Bad design, though, can plague any electronic system. Every logic component takes time to propagate data; when a signal traverses many devices the delays can add up significantly. If the data then goes to a latch it's quite possible that the delays may cause the input to transition at the same time as the clock. Instant metastability.
Designers are pretty careful to avoid these situations, though. Do be wary of FPGAs and other components where the delays vary depending on how the software routes the device. In addition, when latching data or clocking a counter it's not hard to create a metastability problem by using the wrong clock edge. Pick the edge that gives the device time to settle before it's read.
What about analog inputs? Connect a 12 bit A/D converter to two 8 bit ports and we'd seem to have a similar problem: the analog data can wiggle all over, changing during the time we read the two ports.
However, there's no need for an input capture register because the converter itself generally includes a "sample and hold" block, which stores the analog signal while the A/D digitizes. Most A/Ds then store the digital value till we start the next conversion.
Other sorts of inputs we use all share this problem. Suppose a robot uses a 10 bit encoder to monitor the angular location of a wrist joint. As the wrist rotates the encoder sends back a binary code, 10 bits wide, representing the joint's current position. An 8 bit processor requires two distinct I/O instructions—two byte-wide reads—to get the data. No matter how fast the computer might be there's a finite time between the reads during which the encoder data may change.
The wrist is rotating. A "get_position" routine reads 0xff from the low part of the position data. Then, before the next instruction, the encoder rolls over to 0x100. "get_position" reads the high part of the data—now 0x1—and returns a position of 0x1ff, clearly in error and perhaps even impossible.
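The failure is easy to reproduce on a PC. The sketch below simulates it in C; the encoder variable and the port-read functions are stand-ins invented for illustration, not real hardware accesses, and the rollover is forced to happen between the two reads.

```c
#include <stdint.h>

/* Simulated 10-bit encoder, standing in for the real hardware. */
static uint16_t encoder_position = 0x0FF;

/* Hypothetical byte-wide port reads. */
uint8_t read_low_port(void)
{
    return (uint8_t)(encoder_position & 0xFFu);
}

uint8_t read_high_port(void)
{
    return (uint8_t)((encoder_position >> 8) & 0x03u);
}

/* Models the wrist advancing one count between the two reads. */
void wrist_moves_one_count(void)
{
    encoder_position = (uint16_t)((encoder_position + 1u) & 0x3FFu);
}

/* Naive two-read routine: low byte first, then high byte. */
uint16_t get_position_torn(void)
{
    uint8_t low = read_low_port();     /* encoder at 0x0ff: reads 0xff */
    wrist_moves_one_count();           /* encoder rolls over to 0x100  */
    uint8_t high = read_high_port();   /* reads 0x1                    */

    return (uint16_t)(((uint16_t)high << 8) | low);
}
```

Here `get_position_torn()` returns 0x1ff even though the encoder was never anywhere but 0x0ff and 0x100.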
The same problem is common when handling input from a two-axis controller. If the hardware continues to move during our reads, then the X and Y data will be slightly uncorrelated, perhaps yielding impossible results.
One friend tracked a rare autopilot failure to the way the code read a flux-gate compass, whose output is a pair of related quadrature signals. Reading them at disparate times, while the vessel continued to move, yielded impossible heading data.
Next in Part 4: Dealing with interrupt latency
To read Part 1 in this series, go to Reentrancy, atomic variables and recursion.
To read Part 2 in this series, go to Asynchronous Hardware/Firmware
Jakob Engblom (jakob@virtutech.com) is technical marketing manager at Virtutech. He has an MSc in computer science and a PhD in Computer Systems from Uppsala University, and has worked with programming tools and simulation tools for embedded and real-time systems since 1997. He was a contributor of material to "The Firmware Handbook," edited by Jack Ganssle, upon which this series of articles was based and printed with permission from Newnes, a division of Elsevier. Copyright 2008. For other publications by Jakob Engblom, see www.engbloms.se/jakob.html.