This “Product How-To” article focuses on how to use a certain product in an embedded system and is written by a company representative.
Creating a hardware emulation system is no easy task. At a minimum, each generation of emulation system has to accommodate a growing number of logic gates, memory and DSP blocks to allow ASIC and ASSP SoC designers to debug their extremely complex devices before sending them off to the foundry for production. Emulation systems must also be easy to program, reliable and affordable.
Today's SoCs are exceedingly complex pieces of silicon. They contain one or more processors that will execute software. The software code they run is every bit as important a part of the final system as the silicon itself.
The software and the silicon have to act as a seamless solution; if there's a problem, it might be the software, or it might be the silicon. Designers can only do so much software testing on a development host. No reasonable host development system can reflect the true parallelism of the target SoC.
You can really only test out such issues as synchronization, data integrity and resource contention in situ, and that's far too late to identify problems. Simulation isn't a viable solution; it's simply too slow to allow the execution of any realistic code.
As a result, engineers have been using emulation systems for well over two decades to verify the most advanced ICs the semiconductor industry can build.
Most of these earlier-generation emulation systems were powered by custom ICs that the emulator vendors designed themselves. The vendors then passed the cost of the custom IC development on to their customers, putting emulation out of reach for companies struggling with tighter IC development budgets.
In 2001, EVE broke with tradition by basing its emulation system on Xilinx FPGAs. The goal was to provide the lowest hardware-assisted verification cost of ownership in the industry, as achieved through a combination of high execution speed, high capacity (which today means up to a billion gates), quick design revision, flexible and powerful debugging capabilities, lowest cost per gate and most cycles per dollar.
In addition, we wanted to make the system easy to use for ASIC designers who might not be familiar with FPGA design. The result was the ZeBu (for “zero bug”) emulation system (Figure 1 below). We've now developed six generations of emulators, the most recent of which is ZeBu Server.
|Figure 1: EVE bases its ZeBu emulation system around Xilinx FPGAs.|
Separation of powers
Our approach is to split the device under test (DUT) from an interface to the test environment that we call the Reconfigurable TestBench (RTB). The RTB allows for test configuration and control.
The DUT will change with each rev of the design, but the RTB never changes unless the test environment does. Having a single mass of FPGAs containing a mix of the RTB and DUT designs would have been messy and required unnecessary recompilation of the RTB design, so we separated them out.
As a result, we have one set of FPGAs for the DUT and another set for the RTB (Figure 2 below). The number of DUT FPGAs varies by system size; bigger and smaller systems are available for bigger and smaller designs.
|Figure 2: ZeBu uses one set of FPGAs for the device under test and another for the RTB.|
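The benefit of the split can be illustrated with a small Python sketch. It is not part of the ZeBu tool flow; the function names, file names and group labels are hypothetical, and the only assumption is that each FPGA group can be fingerprinted from its input files so that an unchanged group is not recompiled.

```python
import hashlib


def fingerprint(sources):
    """Return a stable digest of a mapping {file name: file contents}."""
    digest = hashlib.sha256()
    for name in sorted(sources):
        digest.update(name.encode())
        digest.update(b"\0")
        digest.update(sources[name].encode())
        digest.update(b"\0")
    return digest.hexdigest()


def images_to_rebuild(previous, current):
    """List the FPGA groups whose inputs differ from the stored fingerprints."""
    return [group for group in sorted(current)
            if fingerprint(current[group]) != previous.get(group)]


# Hypothetical project: the DUT RTL changed, the test environment did not.
old = {"dut": {"core.v": "assign y = a & b;"}, "rtb": {"env.cfg": "clock=1"}}
new = {"dut": {"core.v": "assign y = a | b;"}, "rtb": {"env.cfg": "clock=1"}}
previous = {group: fingerprint(files) for group, files in old.items()}

print(images_to_rebuild(previous, new))  # only the DUT group is listed
```

With the two designs mixed in one mass of FPGAs, every DUT revision would invalidate the single fingerprint and force the RTB logic through compilation again.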
The RTB FPGAs provide communication and control. We can optimize them much more aggressively for performance and capacity, since the design will remain constant. While no one reconfigures the contents of this set of FPGAs, the design allows a rich set of configurations of the test setup under user control.
For the DUT FPGAs, the overwhelming priority in terms of required features was density. We needed to fit very large designs into these FPGAs for the system to provide value. Key among the necessary hardware features were multipliers, which later gave way to DSP blocks. In addition, big designs needed big memory, so the largest possible memory blocks were a requirement.
Close behind density on our “must have” list were bandwidth and latency. The data transmission between the FPGAs had to be fast enough to keep from becoming a serious bottleneck. We didn't plan to use clock-data-recovery (CDR) circuits—in fact, they weren't even available when we started out—but we did use high-speed LVDS I/Os, layering our own proprietary low-latency protocol over them.
To provide robust debugging capabilities, we needed readback of internal states. We wanted a way of interrogating what was going on inside so that designers could understand the inner workings of their technology—especially when it didn't seem to be working. We might implement readback using JTAG or some other means, but it had to be there.
Finally, to keep the challenge of implementing a single design across multiple FPGAs tractable, we wanted to use only a single technology on the board and across different members of the emulator family. So the FPGA family we chose had to have a broad range of densities and pin counts.
Our considerations for the RTB FPGAs were different. Again, one of the primary requirements was keeping to one FPGA technology per system. This would ensure that all of the FPGAs were in sync with respect to version and availability. Of course, we still needed to be able to meet our performance requirements, no easy task given that the RTB FPGAs run at twice the speed of the DUT FPGAs.
Given the sum of requirements, the Virtex-II family was our choice for the first ZeBu generation. Our first system used two Virtex-II 8000s for the DUT and one Virtex-II 6000 for the RTB.
Going with a flow
We design the RTB FPGAs with a standard FPGA flow, using XST for synthesis and the standard Xilinx ISE tools for place and route.
The effort we make here is typical of what anyone designing with FPGAs might do to create a complex design. The differentiation isn't in the flow; it's in the hard work and our system knowledge.
When it comes to the DUT flow, however, our challenge is far greater than the one the typical FPGA designer faces. By definition, we deliver an unfinished system in the same way that Xilinx delivers an unfinished chip. Our customers have to fill in the DUT design details. So we must provide them with a way to implement their designs—while assuming that they may not know or care about how to do FPGA design. To help facilitate this, we created the ZeBu Compilation User Interface (zCUI).
A key concern is the fact that the DUT design, by definition, will change numerous times throughout the design cycle. Minimizing the design turn time has always been a high priority. Because the first systems were relatively small, we didn't focus on the compiler performance; we simply worked through the standard Xilinx flow.
But as we moved up in density, compile time, and synthesis time in particular, became more critical. A design that will occupy 400 of Xilinx's largest FPGAs is not trivial to synthesize.
We were concerned that standard synthesis tools were spending too much time doing too good a job. Our priority was not optimal synthesis, where optimal means highest performance and lowest gate count. Those are good things, but for us, compile time was a higher priority, and we could live with 20 percent worse performance or density, especially since we're executing at less than 30 MHz.
For this reason, we developed our own zFAST synthesis tool, which could take SystemVerilog, VHDL or Verilog designs and synthesize them as much as 10 times more quickly than standard synthesis products.
We now make zFAST available alongside the more traditional synthesis tools so that our customers can invoke it for large designs as needed. Once the design is synthesized, our own partitioning tool partitions it at the gate level.
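To give a feel for what partitioning involves, here is a toy Python sketch. It is not EVE's partitioning algorithm: it assumes a first-fit-decreasing packing on gate count alone, with hypothetical block names and capacities, and it ignores the connections between FPGAs that a real partitioner also has to account for.

```python
def partition(blocks, capacity):
    """Pack {block name: gate count} into FPGAs, largest blocks first.

    Returns a list of FPGAs, each a list of block names. Connectivity
    between blocks is ignored in this simplified model.
    """
    fpgas = []  # each entry: [remaining capacity, [block names]]
    ordered = sorted(blocks.items(), key=lambda item: (-item[1], item[0]))
    for name, gates in ordered:
        if gates > capacity:
            raise ValueError(f"{name} does not fit in a single FPGA")
        for fpga in fpgas:
            if fpga[0] >= gates:
                fpga[0] -= gates
                fpga[1].append(name)
                break
        else:
            # No existing FPGA has room, so open a new one.
            fpgas.append([capacity - gates, [name]])
    return [names for _, names in fpgas]


# Hypothetical gate counts (in thousands) and FPGA capacity.
blocks = {"cpu": 600, "dsp": 500, "dma": 300, "uart": 100}
print(partition(blocks, capacity=1000))
```

Even in this simplified form, the result determines how many FPGAs have to be compiled, which is why partition quality and compile time are closely linked.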
It should be noted that some of our users are accomplished FPGA designers, and we willingly hand over greater control if they want it. The customer is free to use the standard FPGA flow to try to get more performance out of the DUT, constraining and using any tricks they know as necessary. So the system makes it easy for a non-FPGA designer, but does not restrain an expert.
The continuing performance improvements of the ISE tools from Xilinx have also helped our overall compile times. EVE was one of the first users of ISE 11, allowing us to keep our FPGA compile times typically under 2 hours per FPGA. Parallel compilation means that we can turn around large designs with multiple FPGAs in a reasonable amount of time. Our overall flow is summarized in Figure 3 below.
|Figure 3: The ZeBu design flow makes use of Xilinx's ISE tools.|
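A back-of-the-envelope Python sketch shows why parallel compilation matters. The numbers are illustrative assumptions, not measurements: 16 FPGAs, a 2-hour compile per FPGA (the upper figure quoted above), and jobs that run in equal-length batches.

```python
import math


def wall_clock_hours(num_fpgas, hours_per_fpga, parallel_jobs):
    """Upper-bound wall-clock time when FPGA compiles run in batches."""
    batches = math.ceil(num_fpgas / parallel_jobs)
    return batches * hours_per_fpga


# Hypothetical 16-FPGA design at 2 hours per FPGA.
print(wall_clock_hours(16, 2.0, parallel_jobs=1))   # one compile at a time
print(wall_clock_hours(16, 2.0, parallel_jobs=16))  # one machine per FPGA
```

Under these assumptions the serial run takes 32 hours, while the fully parallel run finishes in the time of a single FPGA compile.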
Trust, but verify
Verification is a bit tough for us, specifically because we don't ship a finished design. We have to be confident that, once our customer compiles a design onto the ZeBu platform, everything will work as promised.
While the verification of our first set of RTB FPGAs was a challenge, since then we've been able to use our own existing ZeBu systems to verify the RTB for the next-generation ZeBu system. We run the new RTB for hours and days, executing billions of cycles, to confirm that it's solid.
For the DUT FPGAs, in conjunction with our own tools we have a large vault of designs accumulated over years of experience. Each night, we run thousands of designs and test the results using application-specific test bench suites associated with each individual design, as well as through cycle-by-cycle simulation comparisons that include internal state as well as inputs and outputs.
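The cycle-by-cycle comparison can be sketched in Python as follows. This is a simplified illustration, not our regression harness: it assumes each run is available as a list with one dictionary of signal values per cycle, and the signal names are hypothetical.

```python
def first_mismatch(reference, observed):
    """Return (cycle, signal, expected, actual) for the first difference.

    Both traces are lists of {signal name: value}, one entry per cycle.
    Returns None when the traces agree on every reference signal.
    """
    for cycle, (ref, obs) in enumerate(zip(reference, observed)):
        for signal in sorted(ref):
            if obs.get(signal) != ref[signal]:
                return (cycle, signal, ref[signal], obs.get(signal))
    if len(reference) != len(observed):
        shorter = min(len(reference), len(observed))
        return (shorter, "<trace length>", len(reference), len(observed))
    return None


# Hypothetical traces: "out" is a primary output, "state" an internal register.
simulation = [{"out": 0, "state": 1}, {"out": 1, "state": 2}, {"out": 1, "state": 3}]
emulation = [{"out": 0, "state": 1}, {"out": 1, "state": 2}, {"out": 1, "state": 0}]
print(first_mismatch(simulation, emulation))
```

In this example the outputs match on every cycle, and only the internal state exposes the divergence at cycle 2, which is why the comparison covers internal state as well as inputs and outputs.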
We also check to make sure that our results remain deterministic as our tools and those from Xilinx evolve. Whenever there's a revision change, we run through the suite of tests to ensure that the results we get with the new version are the same as those we got on the older version.
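A version-to-version check of this kind might look like the following Python sketch. The storage format is an assumption made for illustration: results are held as JSON-serializable traces keyed by design name, and the design names are hypothetical.

```python
import hashlib
import json


def result_digest(trace):
    """Digest a JSON-serializable result so that runs can be compared cheaply."""
    encoded = json.dumps(trace, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def changed_designs(baseline, candidate):
    """Names of designs whose results differ between two tool versions."""
    return sorted(
        name for name in baseline
        if result_digest(baseline[name]) != result_digest(candidate.get(name))
    )


# Hypothetical results from the older and newer tool versions.
old_version = {"design_a": [0, 1, 1], "design_b": [1, 0, 1]}
new_version = {"design_a": [0, 1, 1], "design_b": [1, 1, 1]}
print(changed_designs(old_version, new_version))
```

Any design reported here would need investigation before the new tool version is accepted.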
Ludovic Larzul is VP of Engineering at EVE. He can be contacted at email@example.com.