Editor’s Note: In this Product-How-To article, IDT’s Fred Hirning describes the problems faced in dealing with clock jitter in FPGA-based high-speed communications interfaces such as SerDes, and how external phase-locked loops (PLLs) such as the company’s VersaClock 5 and FemtoClock NG clock generators can be used to resolve them.
Over a few short years, FPGA technology has advanced significantly, and the devices have become extremely complex. Their on-chip phase-locked loops (PLLs) remain capable of generating clocks for synchronous logic, memory, board peripherals, complex PLDs, microprocessors (µPs), and other uses that typically demand time-domain jitter specs such as cycle-to-cycle and period jitter.
However, it’s a different story with high-speed interfaces such as serializer/deserializer (SerDes) links, Gigabit Ethernet (GbE), 10 GbE, synchronous optical network/synchronous digital hierarchy (SONET/SDH), and Fibre Channel, all of which have tight frequency-domain jitter requirements.
To run properly, these high-speed interfaces rely on the low-frequency jitter component to be within spec. Existing PLLs in even the most advanced FPGAs don’t have the quality to meet the jitter requirements for the most common transmitter SerDes eye specification.
Reasons for this shortcoming vary. Digital technology embedded in high-speed FPGAs doesn’t provide the necessary performance for building a low noise PLL.
Considering that device geometries are approaching 20 nanometers (nm) with extremely small but highly advanced transistors, a key factor is the quality of a PLL’s inductor, known as the ‘Q factor’. An ideal inductor would have no resistance or energy losses. The quality factor (Q) of an inductor is a measure of its efficiency. The higher the Q factor of the inductor, the closer it approaches the behavior of an ideal lossless inductor.
From a PLL design perspective, it’s essential to achieve good phase noise (PN) to meet demanding PN requirements of the transmitter SerDes for high-speed protocols. Achieving a high Q factor in a PLL design usually means some changes in metal layers, either a thicker metal or using another type of metal, for example, copper.
This is a different, and more expensive, process than most typical FPGA IP blocks need, especially at the smaller geometries. Designing an ideal PLL calls for special process steps, such as thicker metals that improve the quality of the inductor, while most other intellectual property (IP) blocks within the FPGA don’t require them. In the end, increasing the PLL’s quality factor makes the FPGA’s overall process more expensive.
Further, transistor leakage becomes an issue with smaller geometries. It’s difficult enough dealing with PLL analog circuits. But when different metals and transistor leakage are factored in, the combination isn’t ideal for an effective PLL design for an FPGA.
On the other hand, if FPGA vendors decided to overcome these issues and spend more dollars on the extra process, PLLs demanding low noise are still subjected to noisy environments within the FPGA that adversely affect performance. Moreover, internal PLL outputs have to be routed out to reach various SerDes blocks around the outside footprint, which is more difficult. As more and more IP finds its way into these large FPGAs, the routing becomes a major concern. In short, these represent the issues when providing low noise PLLs as IP blocks within an FPGA.
Getting around low noise issues
The answer to these clocking issues is to move the low-noise PLL outside the FPGA. Figure 1 shows the total, random, and deterministic jitter breakout for common application protocols: Gigabit Ethernet, 10 Gigabit Ethernet, Serial RapidIO (SRIO), and Fibre Channel. These are just a small sampling of the more common high-speed interfaces.
Common communication application standards as defined in the chart typically specify peak-to-peak (pk-pk) total jitter as a fraction of one unit interval (UI). This is a SerDes eye closure spec that must be met to achieve an acceptable bit error rate (BER), which is typically 10^-12 for most standards. The spec is bounded by an integration range of interest (integration mask) that the standard usually defines.
Each high-speed protocol has a defined transmitter eye specification. The transmitter protocol defines a total jitter budget; this includes both deterministic and random jitter. In general, however, random jitter is the primary measure of a PLL’s quality. A highly effective PLL has extremely low random jitter.
A common application like Ethernet provides the transmitter eye specification, which is a total jitter spec. Total jitter spec includes both deterministic and random jitter as indicated above. For the most part, the bulk of the jitter coming from a well-designed PLL is random jitter, although the PLL design can also contribute to some deterministic jitter, which shows up in the form of spurs on a typical phase noise plot. In general deterministic jitter comes from a readily identifiable source on the board.
For example, it can show up as spurs resulting from crosstalk, power supply noise, electromagnetic interference (EMI), and other sources. Each source generally produces a single spur tone, but each counts against the total jitter budget. PLL design can also limit some of the deterministic jitter: spurs caused by power supply noise, for example, can be suppressed if the supply voltage is regulated internally. Good PLL designers take these steps to improve deterministic jitter in their designs.
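How deterministic and random jitter combine into a total jitter figure can be sketched as follows. This is a minimal illustration that assumes the widely used dual-Dirac approximation (total pk-pk jitter = deterministic pk-pk + a BER-dependent multiple of the RMS random jitter), which the article does not name explicitly. The function names are hypothetical, and the 30 ps and 4 ps inputs are illustrative placeholders rather than values from any standard.

```python
from statistics import NormalDist


def pk_pk_to_rms_factor(ber: float) -> float:
    """Multiplier that converts Gaussian RMS jitter to pk-pk jitter at a target BER."""
    # Q is the one-sided Gaussian tail distance whose tail probability equals the BER.
    q = -NormalDist().inv_cdf(ber)
    # Random jitter closes the eye from both sides, hence the factor of 2.
    return 2.0 * q


def total_jitter_pk_pk(dj_pk_pk: float, rj_rms: float, ber: float = 1e-12) -> float:
    """Dual-Dirac estimate: deterministic pk-pk plus scaled random RMS jitter."""
    return dj_pk_pk + pk_pk_to_rms_factor(ber) * rj_rms


factor = pk_pk_to_rms_factor(1e-12)          # roughly 14.07 at a 10^-12 BER
example_tj = total_jitter_pk_pk(30.0, 4.0)   # hypothetical 30 ps DJ and 4 ps RMS RJ
print(f"pk-pk/RMS factor: {factor:.3f}, total jitter: {example_tj:.1f} ps pk-pk")
```

Because the multiplier is large, even a small change in RMS random jitter moves the total noticeably, which is why random jitter is treated as the main indicator of PLL quality.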
Since the focus here is on PLLs, special attention is given to random jitter. When specifications are defined for these high-speed protocols, a transmitter eye budget of so many picoseconds is provided. The total jitter budget is intended to meet a particular protocol. The random portion is the result of PLL technology. However, the total jitter budget covers more than the external PLL. The high-speed SerDes transmitters in the endpoints (FPGA, ASIC, PHY) have their own clock data recovery (CDR) circuitry, and that CDR is yet another PLL.
Hence, the total jitter budget the protocol provides is a function of deterministic and random jitter. But essentially it’s a function of a printed circuit board (PCB) design and two PLLs. An external PLL is clocking the inputs to the high speed interfaces on the PHY/FPGA/ASIC, and there’s also the CDR, also a PLL, that’s recovering that clock inside the PHY/FPGA/ASIC.
With these protocol-specific transmitter jitter specifications, the endpoint (PHY, ASIC, FPGA, etc.) itself defines the jitter, both random and deterministic (as spurs), that the external PLL must meet to maintain protocol jitter requirements and achieve low bit error rates. Again, total jitter budget comprises two PLLs. FPGA, ASIC, and PHY manufacturers have the CDR and know the quality of the PLL internal to their devices. They set the jitter budget for the input clock based on that quality.
Therefore, the external clock needs to be of the highest quality. It must have the lowest phase jitter because the embedded designer has no control over PLL quality in the CDRs.
Figure 1 shows what the random jitter component needs to be for the different applications, as well as an example of an endpoint requirement. Here, the total transmit specification for the protocol is broken down. Random jitter and deterministic jitter are shown in the last two columns.
The endpoint, such as an FPGA, ASIC or PHY, is the one defining what the RMS (root mean square) phase noise jitter of the external clock needs to be. Again, it is the external clock that has the lowest noise requirements. In general, because of the process limitations described at the beginning of this article, the PLL internal to that CDR is going to be of lower quality than the PLL used to clock it.
Therefore, embedded designers using FPGAs should carefully investigate their timing and jitter requirements and the best approach for resolving them. As stated earlier, endpoints like FPGAs, ASICs and PHYs dictate jitter requirements for the reference clock coming in. Most PHY device manufacturers have a spec for the external reference clock jitter that is less than one-fourth of the line jitter budget, and some even tighter than that.
What this shows is how difficult it is to have good internal PLLs, even in a focused, custom silicon design like an external PHY. Imagine how much worse this is when the PHYs are embedded in an FPGA that could contain any mix of switching transients. As a result, the endpoint requirements for the external reference clock are always much tighter than what the actual protocol calls out. Again, two PLLs make up the total transmitter spec: one provided externally by a clock generator and one inside the FPGA, ASIC or PHY CDR block.
As illustrated in Figure 1, take Gigabit Ethernet (Fiber) as an example. The transmitter has a total peak-to-peak jitter spec of 0.21 UI, with the UI fraction broken out into deterministic and random portions. The embedded designer can convert the random portion of the budget to RMS jitter by dividing it by the peak-to-peak-to-RMS conversion factor for a 10^-12 BER shown in Figure 2 and multiplying by 1 over the data rate.
Since the attention is on PLL quality, the embedded designer is mostly interested in the total random jitter requirements when selecting the appropriate solution, and these can be calculated as follows:
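A minimal sketch of that conversion follows. It assumes a random-jitter allocation of 0.11 UI pk-pk (an assumed value, chosen because it reproduces the figure quoted in the text; the actual allocation is the one listed in Figure 1), the 1.25 Gbaud line rate of Gigabit Ethernet over fiber, and the commonly quoted pk-pk-to-RMS factor of 14.069 for a 10^-12 BER. The function and constant names are hypothetical.

```python
PK_PK_TO_RMS_AT_BER_1E12 = 14.069  # commonly quoted factor for a 10^-12 BER


def rj_budget_rms_ps(rj_ui_pk_pk: float, line_rate_baud: float,
                     factor: float = PK_PK_TO_RMS_AT_BER_1E12) -> float:
    """Convert a random-jitter budget in UI (pk-pk) to picoseconds RMS."""
    ui_ps = 1e12 / line_rate_baud   # one unit interval in picoseconds
    rj_pk_pk_ps = rj_ui_pk_pk * ui_ps
    return rj_pk_pk_ps / factor


# Assumed inputs: 0.11 UI random-jitter allocation, 1.25 Gbaud line rate.
gbe_rj_rms_ps = rj_budget_rms_ps(0.11, 1.25e9)
print(f"Random jitter budget: {gbe_rj_rms_ps:.2f} ps RMS")
```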
So in this case, the random jitter budget defined by the standard 1 Gigabit Ethernet allows for 6.25ps RMS over a 1.875 to 20 megahertz (MHz) integration mask defined by the standard. Interestingly, this number by itself doesn’t tell us what the requirements are for the external PLL. However, it defines the total requirement for the external PLL and the CDR circuitry, which is another PLL inside the device being clocked, in this case a 1 Gigabit Ethernet PHY.
In this instance, the 1 Gigabit Ethernet PHY dictates the quality of PLL required to feed the device so that the total random jitter budget of 6.25 ps is met. In general, PLL quality in these CDRs isn’t going to be as good as the quality of the PLLs supplying the clocks, for the reasons discussed earlier. Therefore, the more of the random jitter budget the CDR PLL consumes, the better the external clock device needs to be.
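The trade-off between the two PLLs can be illustrated with a short sketch. It assumes the random jitter of the external clock and of the CDR PLL is uncorrelated and Gaussian, so the two contributions combine as a root sum of squares; that model is an assumption, not something the article states. The CDR contributions tried below are hypothetical, and in practice the endpoint manufacturer’s reference clock specification is what governs.

```python
import math


def external_clock_rj_limit(total_rj_rms: float, cdr_rj_rms: float) -> float:
    """Largest external-clock RMS jitter that still fits the total budget.

    Assumes uncorrelated Gaussian sources: total^2 = external^2 + cdr^2.
    """
    if cdr_rj_rms >= total_rj_rms:
        raise ValueError("CDR jitter alone already uses up the total budget")
    return math.sqrt(total_rj_rms ** 2 - cdr_rj_rms ** 2)


TOTAL_RJ_PS = 6.25  # total random jitter budget quoted in the text

# Hypothetical CDR contributions, in ps RMS.
for cdr_ps in (3.0, 5.0, 6.0):
    limit_ps = external_clock_rj_limit(TOTAL_RJ_PS, cdr_ps)
    print(f"CDR {cdr_ps:.2f} ps RMS -> external clock limit {limit_ps:.2f} ps RMS")
```

As the CDR share grows, the allowance left for the external clock shrinks quickly, which matches the point made above.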
High-end, low-end clocking
For example, let’s take a 10 GbE PHY at the high end of the clocking requirements. There are countless PHYs on the market with extremely low jitter requirements. As stated earlier, the external PLL in this instance has to have the lowest noise to meet this endpoint’s requirement.
Many PHY manufacturers specify an extremely low phase jitter requirement of 400 to 500 femtoseconds (fs) over a 1.875 MHz to 20 MHz mask, a typical 10 Gbps Ethernet mask. On the other hand, another PHY manufacturer specifies 400 to 500 fs over a 12 kHz to 20 MHz mask. This mask is wider and extends closer to the carrier, so it is a more difficult requirement to meet.
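Why the wider mask is harder to meet can be seen by integrating a phase-noise curve over each mask. The sketch below uses the standard conversion from single-sideband phase noise L(f) in dBc/Hz to RMS phase jitter (integrate the linear noise power over the mask, double it for both sidebands, take the square root, and divide by 2π times the carrier frequency). The phase-noise profile is entirely hypothetical, not measured data from any device, and spurs are ignored.

```python
import math


def rms_jitter_s(phase_noise_dbc_hz, carrier_hz, f_lo_hz, f_hi_hz, steps=4000):
    """RMS phase jitter in seconds from SSB phase noise L(f) over an offset mask."""
    # Log-spaced offset frequencies between the mask edges.
    ratio = (f_hi_hz / f_lo_hz) ** (1.0 / steps)
    freqs = [f_lo_hz * ratio ** i for i in range(steps + 1)]
    # Convert dBc/Hz to a linear power ratio per Hz.
    power = [10.0 ** (phase_noise_dbc_hz(f) / 10.0) for f in freqs]
    # Trapezoidal integration of the noise power across the mask.
    area = sum(0.5 * (power[i] + power[i + 1]) * (freqs[i + 1] - freqs[i])
               for i in range(steps))
    phase_rms_rad = math.sqrt(2.0 * area)  # factor of 2 covers both sidebands
    return phase_rms_rad / (2.0 * math.pi * carrier_hz)


def example_profile(f_hz):
    """Hypothetical shape: 1/f^2 close-in noise plus a flat -150 dBc/Hz floor."""
    return 10.0 * math.log10(1e-3 / f_hz ** 2 + 1e-15)


CARRIER_HZ = 156.25e6
narrow = rms_jitter_s(example_profile, CARRIER_HZ, 1.875e6, 20e6)
wide = rms_jitter_s(example_profile, CARRIER_HZ, 12e3, 20e6)
print(f"1.875 MHz to 20 MHz mask: {narrow * 1e15:.0f} fs RMS")
print(f"12 kHz to 20 MHz mask:    {wide * 1e15:.0f} fs RMS")
```

The same clock reports a larger jitter number over the 12 kHz to 20 MHz mask because the close-in noise below 1.875 MHz is now included, so a specification that quotes the same femtosecond limit over the wider mask is the stricter one.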
Therefore, it’s up to the clock solution manufacturer, like IDT, to enter the specsmanship fray to meet those requirements. In the case where the embedded designer is clocking a 10 GbE PHY with extremely tight specifications, the approach to take is designing in, for example, a device featuring IDT’s extremely low phase noise FemtoClock NG PLL technology, like a Universal Frequency Translator (UFT) or a FemtoClock NG clock generator with built-in fan-out. Depending on the application requirements, if it’s a simple clock generator that can make use of a low frequency external crystal (XTAL) or crystal oscillator (XO) input and just needs multiple high speed copies, the FemtoClock NG with a built-in fan-out buffer is the way to go.
If more functionality is required, such as the ability to phase lock, frequency translate, and jitter attenuate an existing on-board clock source, then the Universal Frequency Translator family products are the way to go; these parts offer additional features such as redundancy, holdover, etc. Any device featuring IDT’s FemtoClock NG PLL technology produces results to meet the jitter requirements of these most demanding 10 GbE PHY manufacturers, as indicated in Figure 3.
The PN plot shows that this PLL technology meets even the tightest 10G endpoint specifications with enough margin for the embedded designer to feel confident the system will be robust. In this example, the typical 156.25 MHz clock frequency defined for 10 GbE comes in at 269 fs over a 12 kHz to 20 MHz mask including spurs. This is typical performance of the FemtoClock NG PLL family.
In the case of looser clock jitter requirements, let’s take Serial RapidIO (SRIO) Gen 1 or 1 GbE, for example. Here, the endpoints and PHYs designed to support these protocols have slightly more relaxed jitter specifications. These can easily be met by a clock generator that delivers sub-1 ps performance. In this case, the embedded designer can use a lower power clock solution like a VersaClock 5, for example, that specifies sub-700 fs jitter as shown in Figure 4.
This particular PN plot shows that this PLL technology meets even the tightest 1G endpoint specifications with enough margin for the embedded designer to feel confident the system will be robust, with the benefit of even lower power. In this example, a 100 MHz clock frequency commonly used for 1G and above applications comes in at 622 fs over a 12 kHz to 20 MHz mask including spurs, which is typical performance of the PLL technology used in the VersaClock 5.
Products like IDT’s VersaClock 5 offer embedded designers versatility and much lower power. There are quite a few trade-offs in PLL design; it is very difficult to design a PLL that has both the highest possible performance and the lowest possible power. The FemtoClock NG PLL technology supports best-in-class performance, but is slightly higher power than the VersaClock 5. VersaClock 5 was designed to provide enough performance to meet all 1G and above common protocols up to 10G (endpoint depending) and still offer best-in-class power consumption and versatility.
Routing the clocks
Typical applications involving FPGAs and ASICs can have multiple CDRs and SerDes blocks performing, for example, Gigabit Ethernet, and these blocks aren’t always in the same place. CDRs are placed in different areas within the FPGA/ASIC in order to keep them isolated from noise generated by other IP. In many cases when designing around an FPGA/ASIC that requires Gigabit Ethernet or 10 Gigabit Ethernet, multiple copies of that clock may be required, one for each high-speed CDR. In general that requires the generation and distribution of 156.25 MHz, for example, for 10 GbE.
In the case where multiple copies of the clock are required, the embedded designer has a choice to use a clock generator device like the FemtoClock NG, the Universal Frequency Translator, or even VersaClock 5 and, depending on how many copies of the same output frequency are needed, a low noise fan-out buffer may also be required. In the case where the ASIC or FPGA has multiple PHYs, the clock doesn’t go to just one place on that FPGA/ASIC. It may go to four different places, often on opposite ends of the chip.
Therefore, the designer needs four copies of that low noise clock. In this case, when an additional clock distribution buffer is added between the clock generator and the endpoint (FPGA or ASIC), a bit more jitter is added, and it needs to be taken into account. Any logic (non-PLL) device like a fan-out buffer used to distribute clocks will add some additional jitter to the clock.
Careful consideration must be given to make sure the overall jitter budget defined by the FPGA, ASIC, or PHY is met at the input to that device. The fact that a clock distribution device could be used puts even more emphasis on the quality of the PLL within the external clock generator, and even more margin must be budgeted for the clock source itself.
There are a number of very low noise buffers available from IDT that limit the amount of additive jitter through these parts, like the new 1.8V 8P34S1xxx family of low power LVDS buffers boasting lowest-in-class additive phase jitter, typically 40 fs or less. In the end, the endpoint jitter requirements must be satisfied regardless of the number of buffers in the path between the PLL and the endpoint clock input.
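The effect of a buffer on the jitter seen at the endpoint can be estimated with a short sketch. It assumes the clock generator’s jitter and the buffer’s additive jitter are uncorrelated and combine as a root sum of squares, a common rule of thumb that the article does not state. The 269 fs and 40 fs figures are the typical values quoted in the text; the 400 fs endpoint limit is a hypothetical value taken from the lower end of the 400 to 500 fs range mentioned earlier.

```python
import math


def combined_jitter_fs(*stage_jitter_fs: float) -> float:
    """Root-sum-square of RMS jitter contributions from each stage in the clock path."""
    return math.sqrt(sum(j * j for j in stage_jitter_fs))


clock_fs = 269.0            # FemtoClock NG typical figure quoted in the text
buffer_fs = 40.0            # typical additive jitter quoted for the LVDS buffer
endpoint_limit_fs = 400.0   # hypothetical endpoint reference clock limit

total_fs = combined_jitter_fs(clock_fs, buffer_fs)
margin_fs = endpoint_limit_fs - total_fs
print(f"Jitter at endpoint input: {total_fs:.1f} fs RMS, margin {margin_fs:.1f} fs")
```

Under this assumption a low additive jitter buffer barely moves the total, while each further stage passed to `combined_jitter_fs` eats into the margin left for the clock source.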
Fred Hirning is Senior Field Applications Engineer at Integrated Device Technology (IDT). Previously, he served as digital design engineer at Quadic Systems and later as a Senior Applications Engineer for Tundra Semiconductor. Fred received his BSEE from the University of Hartford.