Using IEEE-1588 transparent clocks to improve system time synchronization accuracy - Embedded.com

Simply put, time synchronization is setting the time on two or more clocks to be the same. Hidden in this simple sentence are obstacles involved with “setting” the time, the length of time it takes to “set” the time and the varying levels of acceptance of what the “same” time actually is. Just the notion of “same time” conjures up accuracy to the second, millisecond, microsecond, nanosecond, or better.

Key to understanding synchronization is that clocks drift and need to be corrected periodically. This raises the question, “How long are they 'the same' before they drift apart and are no longer 'the same'?”

It takes time to go through the process of correcting the time, and during this process how accurately can we set the time relative to another clock in the first place? This correction process is challenging and is a limiting factor in how accurately two clocks can be synchronized.

IEEE-1588 defines a process of transferring time. However, before jumping in and demonstrating that Transparent Clocks (aka IEEE-1588 enabled switches) work great to improve IEEE-1588 time transfer accuracy, there are a few fundamentals we need to cover along the lines of offsets and delays and how switches and routers contribute to both.

Offset & Path Delay
The difference between the time on two clocks is known as an offset. In timekeeping we strive to keep the offset below a particular value so that we can assign an accuracy value to the clock.

The process of setting one clock to another is a matter of computing the offset, say between the slave clock and the master clock. Low accuracy applications simply broadcast out the time from the master and the slaves set their clocks when they receive it.

This is analogous to days past when the town bell rang at high noon (the on-time “event”) and we were not concerned with how long it took the sound waves to reach our ears or how long it took as we set our clocks. That time it takes for the time event to travel from the master clock to the slave is called delay, or path delay (Figure 1 below).

Figure 1. Timing messages sent from the master to the slave (and vice versa) experience variable delays caused by the switches in the network leading to timing errors at the slave clock.

Symmetric & Asymmetric Delay
Time transfer delay and errors associated with eliminating delay are the main source of error related to accurately transferring time from one clock to another. In packet based networks such as Ethernet, timing packets are exchanged between the master and the slave for the purpose of computing the time offset.

If the packet exchange delay on the master-to-slave path and slave-to-master path were identical, the offset could be computed perfectly since the delays would cancel out in the math. The notion of path delay both ways between the master and the slave being the same is called symmetric delay and time transfer over packet networks assumes symmetric path delay.

Unfortunately, path delay does vary between the master-to-slave and slave-to-master paths, and this is called asymmetric delay. To make it worse, these asymmetric delays introduced by the network cannot be easily characterized, so the whole problem is dubbed nondeterministic. IEEE-1588 timing packets transit a LAN containing switches (or worse, routers) that add asymmetric delay in the tens to hundreds of microseconds, if not more.
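The effect of asymmetry on the computed offset can be sketched in a few lines of Python. This is an illustration only: the function name and the delay values are assumptions chosen for the example, not figures from the demonstration described later. The slave's measurement on one path contains the offset plus the master-to-slave delay, the master's measurement on the other path contains the slave-to-master delay minus the offset, and the calculation assumes the two delays are equal.

```python
def estimated_offset_ns(true_offset_ns, delay_ms_ns, delay_sm_ns):
    """Offset the slave computes when it assumes symmetric path delay.

    All values are integer nanoseconds. Names and values are illustrative.
    """
    # Measurement on the master-to-slave path: offset plus delay.
    ms_measurement = true_offset_ns + delay_ms_ns
    # Measurement on the slave-to-master path: delay minus offset.
    sm_measurement = delay_sm_ns - true_offset_ns
    # If the delays are equal they cancel in the difference.
    return (ms_measurement - sm_measurement) / 2


# Symmetric case: 5 us each way, the true 1000 ns offset is recovered.
symmetric = estimated_offset_ns(1000, 5000, 5000)

# Asymmetric case: 40 us of extra delay on the master-to-slave path only.
asymmetric = estimated_offset_ns(1000, 45000, 5000)

print(symmetric)           # 1000.0
print(asymmetric - 1000)   # 20000.0, half of the 40 us asymmetry
```

Under these assumptions the offset error is always half the difference between the two one-way delays, which is why tens of microseconds of asymmetry rule out sub-microsecond synchronization.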

The Problem with Queues in Switches and Routers
Switches and routers are store-and-forward devices. Packets received on one port are stored temporarily while the device figures out which forwarding port(s) to send them to. We are now at the heart of the problem. Asymmetric delay introduced by switches and routers in the path between the master and the slave is caused by packet queuing.

Received packets are placed in a queue (i.e. buffer) for that forwarding port until they can be sent out (Figure 2 below). This all happens extremely fast unless the forwarding port is busy handling traffic from multiple receive ports, in which case there will be a delay in forwarding the packets. The time the packet is delayed is called the residence time.
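A toy model of a single forwarding port shows where residence time comes from. The sketch below is a simplification under stated assumptions: one first-in, first-out queue, no lookup latency, and hypothetical transmit times (roughly a 1500-byte data frame and a small timing packet on a 100 Mb/s link). Residence time here is counted from arrival until the packet has been fully sent.

```python
def residence_times_ns(arrivals):
    """Residence time of each packet on one FIFO output port.

    `arrivals` is a list of (arrival_time_ns, transmit_time_ns) tuples,
    sorted by arrival time. This is a toy model, not a real switch.
    """
    port_free_at = 0
    result = []
    for arrival_ns, transmit_ns in arrivals:
        # A packet starts transmitting when it has arrived AND the port is idle.
        start = max(arrival_ns, port_free_at)
        port_free_at = start + transmit_ns
        result.append(port_free_at - arrival_ns)
    return result


# Idle port: the timing packet only waits for its own transmission.
idle = residence_times_ns([(0, 7200)])

# Busy port: a data frame arrives just ahead of the timing packet.
busy = residence_times_ns([(0, 120000), (1000, 7200)])

print(idle)   # [7200]
print(busy)   # [120000, 126200]
```

In this example the same timing packet spends about 7 microseconds in the idle switch and about 126 microseconds in the busy one, purely because of what happened to be queued ahead of it.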

Figure 2. Packet queuing inside the switch where data and timing packets must be forwarded to the same port introduces Packet Delay Variation (PDV) leading to timing errors.

Due to the variable queuing delays inside the switch, the timing packets exchanged between the master and slave are delayed and the length of the delay is variable. This variable delay is called Packet Delay Variation (PDV).

PDV refers to the arrival time jitter for timing packets that are sent from the master at precise intervals but have variable arrival times at the slaves. In the world of sub microsecond time synchronization over packet networks, PDV makes it very difficult to accurately compute precise offsets and adjust the slave clock.
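One simple way to put a number on PDV is to express each packet's delay relative to the fastest packet seen. The sketch below does that for a hypothetical list of one-way delays; the values are invented for illustration and the function name is an assumption, not a term from the standard.

```python
def delay_variation_ns(one_way_delays_ns):
    """Delay of each packet in excess of the minimum observed delay."""
    floor = min(one_way_delays_ns)
    return [d - floor for d in one_way_delays_ns]


# Hypothetical one-way delays in nanoseconds for five timing packets.
delays = [5200, 5000, 48000, 5100, 131000]
pdv = delay_variation_ns(delays)

print(pdv)        # [200, 0, 43000, 100, 126000]
print(max(pdv))   # 126000
```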

Overcoming Switch Queuing Delay
Since clocks are separated by distance on a network, however large or small, the laws of physics require a minimum delay to transfer a timing packet. In a perfect world, any delay would be bidirectionally constant between master and slave clocks, and computing wonderfully accurate time offsets would be possible.

Alas, PDV is an ever present spoiler of precise time transfer over packet networks and a great deal of mathematics and technology must be applied to minimize its effects.

PDV is a one sided metric. Packets arrive in the minimum time or are delayed. (Never do packets arrive ahead of time). An obvious solution to the PDV problem is to exchange more timing packets and filter out and use the “lucky packets” that arrive with the minimum amount of delay.
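Because delay only ever adds to the minimum, a basic "lucky packet" filter can be as simple as keeping the smallest measurement in each block of packets. The sketch below is one minimal way to do this, assuming fixed-size, non-overlapping windows and made-up measurement values; real slaves use considerably more elaborate filtering.

```python
def lucky_packet_filter(measurements_ns, window):
    """Keep the minimum of each consecutive block of `window` measurements.

    A trailing partial block is discarded. Names are illustrative.
    """
    return [
        min(measurements_ns[i:i + window])
        for i in range(0, len(measurements_ns) - window + 1, window)
    ]


# Hypothetical master-to-slave measurements in nanoseconds.
measurements = [6000, 52000, 6100, 6050, 130000, 90000, 6020, 75000]
lucky = lucky_packet_filter(measurements, window=4)

print(lucky)   # [6000, 6020]
```

The cost is visible in the example: eight packets yield only two usable measurements, and a window with no lucky packet at all would still pass a delayed one through.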

For microsecond and sub-microsecond time-of-day synchronization, this approach requires substantially increased packet exchange rates, precise hardware-based time stamping and very intelligent algorithms to filter the packets, compute the offsets and adjust the slave clock.

Not all of these techniques are widely available yet in commercial off the shelf products and there can be diminishing returns where more is not always better. When using IEEE-1588 there are two alternatives to the brute force method of sending more timing packets and developing smarter algorithms to sort it all out.

One is to use intermediate switches that themselves act as IEEE-1588 clocks and transfer time (aka Boundary Clocks). The second is to measure individual timing packet queuing delays inside the switch so they can be eliminated from the clock offset calculations, permitting more accurate time synchronization between the master clock and the slave clock.

This ability to measure timing packet delays inside the switch and account for them makes the switch induced delay transparent to the slave timing offset calculations. Switches with this timing packet delay measurement capability are called Transparent Clocks.

Time Stamping Packets
In the most basic sense, the master transfers time by sending a time stamped packet to a slave. This works pretty well if all that is required is, say 1-second accuracy, because packets can usually transit from master to slave in less than a second. The next level of synchronization occurs when the master and slave exchange time stamps, and do so frequently.

The IEEE-1588 2008 standard can accommodate very fast message exchange rates. While this is not the single most essential aspect of better synchronization, it is part of the overall solution. Another part is time stamping packets in field-programmable gate array (FPGA) hardware, a common hardware time stamping technique that eliminates operating system stack delays and significantly reduces overall PDV.

The standard defines two techniques for time stamping what are known as the Sync and Delay_Request packets (called event messages). The Sync and Delay_Request packets are timing packets used to make the measurements between the master and slave clocks. Sync messages travel from master-to-slave, and Delay_Request messages travel from slave-to-master.

One-Step and Two-Step Clocks
There is plenty of literature (and the IEEE-1588 standard itself) available to explain how timing messages are exchanged. The basic principle is that the slave can compute its offset from the master using four (4) precise time stamps.

These time stamps are related to the exchange of the Sync and Delay_Request event messages. Since the offset calculations are performed at the slave, the master sends additional packets to the slave that contain the measurement values made at the master.
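The four-time-stamp calculation can be sketched as follows. The labels t1 through t4 are the conventional ones for this exchange; the class and function names and the example values are assumptions made for illustration. The calculation relies on the symmetric path delay assumption discussed earlier.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    """One Sync / Delay_Request exchange, all values in nanoseconds."""
    t1: int  # Sync departure, read on the master clock
    t2: int  # Sync arrival, read on the slave clock
    t3: int  # Delay_Request departure, read on the slave clock
    t4: int  # Delay_Request arrival, read on the master clock


def offset_and_delay(x):
    """Return (slave offset from master, mean path delay)."""
    master_to_slave = x.t2 - x.t1   # offset + delay
    slave_to_master = x.t4 - x.t3   # delay - offset
    offset = (master_to_slave - slave_to_master) / 2
    mean_path_delay = (master_to_slave + slave_to_master) / 2
    return offset, mean_path_delay


# Example: slave runs 1500 ns ahead of the master, 5000 ns delay each way.
x = Exchange(t1=1_000_000, t2=1_006_500, t3=1_020_000, t4=1_023_500)
offset, delay = offset_and_delay(x)

print(offset)   # 1500.0
print(delay)    # 5000.0
```

The slave only has t2 and t3 locally, which is why the master must send t1 and t4 to it in additional packets.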

There are two time stamping message exchange techniques defined in the IEEE-1588 2008 standard. These are called one-step clocks and two-step clocks and are generally implemented in FPGA hardware. The goal is to measure/know exactly when the timing packet left the clock onto the network. This eliminates all operating system stack delays of the clock itself.

An analogy of a one-step clock would be if a baseball pitcher (master clock) wrote the exact time the ball will leave his hand on the ball itself and the catcher's glove (slave clock) measured exactly when the ball arrived. Through this type of exchange between the pitcher and the catcher, the catcher could then compute the time offset directly.

An analogy of a two-step clock would be if the baseball pitcher had a sensor in his hand that measured the exact time the ball left his hand and the catcher's glove measured exactly when it arrived.

Unlike the one-step clock where the exact time was written on the ball, in the two-step process the pitcher measured when the ball left. Since the ball is now gone the pitcher needs to communicate to the catcher the time the ball left his hand before the catcher can compute the time offset.

Two-step clocks operate more like Sync and Delay_Request event message measurement devices rather than direct time stamping devices (Figure 3 below). When an event message is sent from a CPU to the network, the FPGA measures precisely when it left the clock onto the network. That is why there are “two steps.” The first step measures when the event packet left, and the second step sends the actual measurement (aka time stamp) on the heels of the first packet.

The time stamps of event messages at the master are forwarded to the slave in Follow_Up or Delay_Response messages. In the case of the Follow_Up message, when the Sync packet is sent (master to slave) the master clock measures its exact departure time. This departure time is placed in the Follow_Up packet and sent to the slave. Slave clocks measure the arrival time of Sync packets.

Figure 3. Transparent Clocks (aka IEEE-1588 enabled switches) measure Sync and Delay_Request event message delays and place the measured delays in intercepted Follow_Up and Delay_Response messages.

The event messages sent from the slave to the master are called Delay_Request packets (analogous to the Sync packet). The arrival time of the Delay_Request event message at the master is returned to the slave in the Delay_Response message (analogous to the Follow_Up packet).

Switch Transparency
Packet queues delay event messages (Sync and Delay_Request) in both regular switches and transparent clocks (aka switches that support IEEE-1588 transparency).

The difference is that transparent clocks measure the time the event packet is delayed in the switch from the time it entered to when it left. The transparent clock does not need to know the exact time, only measure the duration the event message was delayed by the switch.

By using FPGA hardware, transparent clocks measure when event messages arrive on a port and when they leave any other ports.

The Sync and Delay_Request event residence time is stored by the transparent clock. When the subsequent and related Follow_Up and/or Delay_Response message arrives for switching back to the slave, the transparent clock modifies the information in the packet to account for the delay of the related original Sync or Delay_Request message. It is really quite a clever way to provide the slave with queuing delay information.
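The bookkeeping can be sketched as below for the Sync direction. This is a simplified model, not an implementation of the standard: each transparent clock adds its measured residence time to a running correction carried in the Follow_Up message (IEEE-1588 2008 carries this in a correction field), and the slave subtracts the total. Function names and numbers are hypothetical.

```python
def add_residence_time(correction_ns, ingress_ns, egress_ns):
    """Transparent clock step: accumulate this switch's residence time.

    Ingress and egress are read from the switch's own local counter.
    Only their difference matters, so the switch need not know the time.
    """
    return correction_ns + (egress_ns - ingress_ns)


def corrected_master_to_slave_ns(t1, t2, correction_ns):
    """Slave step: remove the accumulated switch delay from t2 - t1."""
    return (t2 - t1) - correction_ns


# A Sync packet crosses two transparent clocks on its way to the slave.
correction = 0
correction = add_residence_time(correction, ingress_ns=700, egress_ns=42_700)
correction = add_residence_time(correction, ingress_ns=9_000, egress_ns=97_000)

# True offset 1500 ns, cable delay 5000 ns, plus 130000 ns spent in queues.
t1, t2 = 0, 136_500
corrected = corrected_master_to_slave_ns(t1, t2, correction)

print(correction)   # 130000
print(corrected)    # 6500, i.e. offset plus cable delay only
```

With the queuing time removed, the remaining 6500 ns is what the slave would have measured over a direct cable.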

The idea of “transparency” comes in once the slave computes the time offset from the master. The time synchronization accuracy is so good it is as though the switch delays were transparent and had little to no effect.

Queuing Delay Reduces Time Synchronization Accuracy
Time accuracy errors introduced by switch queuing delays are relatively easy to demonstrate. First you sync the slave directly to the master with no intermediate network other than a crossover cable to get a best case baseline measurement.

The most convenient way to compare the time on the slave to the time on the master is by way of using the 1 pulse per second (1PPS) outputs of each clock. A counter, oscilloscope or in this case a time interval measurement built into the master can be used to measure the difference between the on-time master 1PPS and the slave 1PPS.

In the world of time keeping, many measurements are made over long periods of time to determine clock accuracy relative to a more accurate clock. The results are typically presented using the mean clock error relative to the master as well as the dispersion, RMS, standard deviation, peak-to-peak, etc.
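As a sketch of how such a summary might be produced from logged 1PPS offset readings, the following uses only the Python standard library. The function name and the sample readings are placeholders, not data from this test.

```python
import math
import statistics


def summarize_offsets_ns(offsets_ns):
    """Summary statistics for slave-minus-master 1PPS offsets (ns)."""
    values = list(offsets_ns)
    return {
        "mean": statistics.fmean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "rms": math.sqrt(statistics.fmean([v * v for v in values])),
        "peak_to_peak": max(values) - min(values),
    }


# Placeholder readings in nanoseconds.
summary = summarize_offsets_ns([10, 20, 30, 40])

print(summary["mean"])           # 25.0
print(summary["peak_to_peak"])   # 30
```

Note that RMS includes the mean offset while standard deviation does not, so the two diverge whenever the slave has a constant bias relative to the master.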

Creating Queues Using Two Switches
Since queues are formed when at least two paths are converged to one, the introduction of two switches connected by a single cable between them will create the queues and subsequent delays. In this example, two identical, standard enterprise class switches are used so that a queue will be formed in each direction of network traffic.

Initially, the only traffic is timing packets between the master and slave. This forms the baseline for timing through the switches.

Next, real world queuing delays are created when the traffic generators are activated and the slave accuracy is measured again (Figure 4 below). At this point, there are a number of variables that can ultimately contribute to slave synchronization accuracy to the master. They include, but are not limited to, queuing behavior in the switch, the nature of the traffic flow from the generators and slave sensitivity to PDV.

Figure 4. Test setup to measure master to slave synchronization accuracy through switches with queuing delays created each direction by using two switches. Measurements made with traffic generators on and off.

The standard enterprise class switches were then replaced with transparent clocks (aka IEEE-1588 enabled switches) and the tests repeated with and without data traffic. The slave accuracy test results from the five test setups are presented in Table 1, below.

Table 1. IEEE-1588 slave accuracy using transparent clocks is virtually unaffected by any queuing delays.

In general, with no data traffic interfering with the timing traffic, there was very little degradation in slave accuracy using standard switches, transparent clocks, or a crossover cable. However, once the queues were created by turning on the data traffic, everything changed.

With the transparent clocks and nearly 100% network utilization (and no doubt lots of queuing) the slave was able to synchronize with the same accuracy as if there were no traffic at all! The slave accuracy measurements were nicely grouped below the 100 nanosecond level and exhibited only a 15 nanosecond mean offset from the baseline crossover cable measurements.

In contrast, using the standard switches, slave accuracy degraded so badly under the queuing delay and resulting PDV that the test was terminated at 4% network utilization. The point was made very clear: queuing degraded slave sync accuracy, and with the transparent clocks the queuing delays were very accurately accounted for.

Demonstration Summary
Real world timing traffic switched with data traffic on the network usually results in time synchronization degradation. The problem of course is the residence time delay resulting from packet queues inside switches.

IEEE-1588 transparent clocks work very well to reduce or eliminate those time synchronization errors due to the packet queuing inside switches. In this demonstration, the statistical performance of slave synchronization accuracy in the network with transparent clocks was nearly identical to a crossover cable except for a 15 nanosecond shift. Several key points to remember include:

1) Traffic on networks causes queuing delays in switches.

2) Queuing delay causes packet delay variation (PDV) in IEEE-1588 timing packets arrival times at slaves.

3) PDV can significantly degrade slave time synchronization accuracy.

4) Transparent clocks compensate for PDV by measuring packet delay inside switches.

5) Transparent clocks can enable slave synchronization accuracy to master similar to that of a crossover cable between the master and slave.

Conclusion
When deploying IEEE-1588 as a time synchronization solution, give serious consideration to the timing accuracy you require, and how well the slave you are considering performs in the presence of other network traffic.

IEEE-1588 slave manufacturer data sheets are likely to provide a best case accuracy specification to a master via crossover cable or through a single switch with no traffic. But that does not account for how network packet queuing delays inside Ethernet switches degrade IEEE-1588 time transfer accuracy.

Only you can determine if an IEEE-1588 slave performs as you expect by testing it in your network and if you need IEEE-1588 transparent clocks to achieve the desired accuracy throughout your network.

Paul Skoog is a product marketing manager at Symmetricom, Inc. He holds a BSME degree from California Polytechnic State University and an MBA from Santa Clara University Graduate School of Business. He can be reached at pskoog@symmetricom.com.
