Andre Yew describes the history of trace debug and the evolution of High-Speed Serial Trace (HSST), and discusses how it replaces conventional parallel trace, especially as CPU speeds and System-on-Chip integration complexity increase.
What HSST brings to the game
Just as demand for increased bandwidth has driven other technologies
to move their transmission channels to high-speed serial links, trace
is on the verge of replacing fast, wide parallel channels with fewer,
significantly faster serial channels.
Hard drives have switched to Serial ATA, and consumer
high-definition video basically requires HDMI,
both of which use transmission protocols similar to those in the
various high-speed serial trace proposals.
Increasing bandwidth makes high-speed parallel protocols more
expensive and difficult to implement. For example, interchannel skew is
difficult to control across 20 fast channels, and requires expensive
cabling to guarantee performance.
Most trace collection probes today use a micro-coaxial ribbon cable
from Precision Interconnect, which we buy for well over $100 for modest
quantities of very short lengths.
As speeds increase, crosstalk between channels of a parallel
interface also increases. Again, we solve this with heroic
$100-per-foot cable, and by adding even more conductors as ground
lines between the signal lines.
The transient current drawn when many high-speed lines switch is
enormous, and causes ground bounce due to the finite resistance of
conductors. These transients cause glitches that corrupt data. We
inadvertently encountered this phenomenon during the development of the
SuperTrace probe, a high-speed 1 GB trace collection probe.
We discovered that during certain operations, very infrequently, we
would get corrupted data. After spending a few days trying to figure
out what was going on, we finally realized that our highest-speed logic
had been placed into a corner of the FPGA that had the fewest ground
pins.
After re-routing the design for a ground-rich corner of the FPGA, we
no longer had data corruption. As CPU speeds and parallel trace port
speeds increase, problems like this will only become more common, and
more difficult to solve.
More important for ASIC designers is the large number of pins
required by parallel trace. While 20 pins may give the best performance
from an ARM trace module, designers can often afford fewer than half
that number of pins, which can significantly hamstring the performance
of the trace port. With an abridged trace port, you may be lucky to get
uninterrupted trace of the program counter, and data trace may be
impossible.
Developers are forced into an impossible dilemma: do we give up
enough pins so the chip will fit and meet its budget, or do we give the
software developers (who are often the bottleneck of any electronic
product) good enough trace facilities, so the product isn't held back
from production for months by obscure bugs?
HSST solves the bandwidth problem by using fewer channels, but running
them far faster. Fewer channels means fewer pins and lower power
requirements. Because the data is carried on serial channels, each with
its own embedded clock, interchannel skew is no longer a problem, and
noise susceptibility and emissions, both important for complying with
EMI standards, are greatly reduced.
If more than one high-speed serial channel is used, skew still isn't
a problem because multiple serial channels can be bonded to guarantee
certain skew specifications.
Serial channels also use some kind of encoding scheme to balance DC
and to provide enough transitions for clock recovery. The current
front-runner for HSST is the so-called 8b10b encoding used in Gigabit
Ethernet, for example, where 8 bits are encoded to 10 bits in order to
equalize the time the wires spend at 1 and 0. However, 8b10b encoding
incurs a 20 percent bandwidth overhead, so a 4 Gbit/sec channel has
3.2 Gbit/sec of useful bandwidth.
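To make the two goals of line encoding concrete (DC balance and frequent transitions), here is a minimal Python sketch that measures both properties on an unencoded byte stream. It is not an 8b10b encoder, and the function names are hypothetical ones chosen for this illustration. It only shows why raw data is a poor fit for a serial wire: nothing limits how lopsided the mix of 1s and 0s becomes, or how long the line sits at one level.

```python
def bits_msb_first(data: bytes):
    """Yield the bits of each byte, most significant bit first."""
    for byte in data:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1


def line_stats(data: bytes):
    """Return (disparity, longest_run) for the raw bit stream.

    disparity   = number of 1 bits minus number of 0 bits (DC imbalance)
    longest_run = longest stretch of identical consecutive bits, i.e. the
                  longest time the receiver sees no transition at all
    """
    disparity = 0
    longest_run = 0
    current_run = 0
    previous = None
    for bit in bits_msb_first(data):
        disparity += 1 if bit else -1
        # Extend the current run if the bit repeats, otherwise start a new one.
        current_run = current_run + 1 if bit == previous else 1
        longest_run = max(longest_run, current_run)
        previous = bit
    return disparity, longest_run
```

For example, four zero bytes sent unencoded give a disparity of -32 and a single run of 32 identical bits, which leaves a clock-recovery circuit with nothing to lock onto. An encoding such as 8b10b is designed to keep both numbers small regardless of the data being sent.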
Serial channels under consideration include Xilinx's RocketIO, which
can go as fast as 6.25 Gbit/sec. Current discussions with various
customers, vendors and standards committees include proposals for using
four of these channels for an aggregate bandwidth of 25 Gbit/sec, which
we believe will cover almost all trace needs for at least a few years.
For comparison, the highest-bandwidth parallel trace ports currently in
use are less than 8 Gbit/sec.
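The bandwidth figures above can be checked with a few lines of Python. This is a back-of-the-envelope sketch, not part of any HSST proposal. It assumes the quoted per-channel speeds are raw line rates and that 8b10b encoding is applied to every channel. The function names are hypothetical.

```python
def aggregate_line_rate(gbps_per_lane: float, lanes: int) -> float:
    """Raw aggregate line rate in Gbit/sec, before any encoding overhead."""
    return gbps_per_lane * lanes


def payload_rate(line_rate_gbps: float, data_bits: int = 8,
                 coded_bits: int = 10) -> float:
    """Useful bandwidth after line encoding (default: 8b10b, 8 data bits
    carried in every 10 bits on the wire)."""
    return line_rate_gbps * data_bits / coded_bits


# One 4 Gbit/sec channel with 8b10b encoding.
single = payload_rate(4.0)

# Four channels at 6.25 Gbit/sec each, as in the proposals discussed above.
raw = aggregate_line_rate(6.25, 4)
useful = payload_rate(raw)
```

Under these assumptions, `single` works out to 3.2 Gbit/sec, `raw` to 25 Gbit/sec, and `useful` to 20 Gbit/sec. Even after the encoding overhead, that is still more than twice the sub-8 Gbit/sec of today's fastest parallel trace ports.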