Software Debug Options on ASIC Cores

Thinking about using ASIC technology to integrate your system's electronics onto a single chip? Not so fast. Now that electronics are reaching the level of integration that can be termed “systems on a chip,” how will you debug software on CPUs that are cores embedded into chips?

Do you remember the popular parlor game of old, where objects were placed in a box, and the participants were then asked to grope around blindly in the box and try to identify the objects? What I remember most about this game is that an imaginative host could make the identification process difficult by selecting common objects that could easily be mistaken for items that were disturbingly embarrassing.

This game is an apt analogy for the debugging process we have been accustomed to performing with embedded systems software. We have an imperfect window into the operation of our systems, and every time we come up with a new way of gaining visibility, the CPU or system designers come up with a new feature that buries the information we need more deeply into the system.

Carrying the analogy into systems based on Application-Specific Integrated Circuit (ASIC) technology requires us to imagine playing the same game with the box covered in wrapping paper, sealed with duct tape, and maybe even buried in the back yard. An exaggeration? Maybe just a slight one. Read on, and I will show you some truly scary aspects of working with this technology.

What is an ASIC? ASICs are quite simply the next level of integration in system design. The chips contain an array of hardware logic devices that are programmed by the system designer with specific behavior. ASICs have been popular for years as a way of assimilating “glue” logic into a single device on a board, but lately they have become the means of finally achieving the long-sought level of complexity termed “system on a chip,” in which the logic for an entire board design is collapsed into a single chip. A relatively new idea along these lines is to include a CPU in these designs. Many popular standard CPU architectures such as ARM and MIPS are available in hardware description language library form (see sidebar, “What is an HDL?” by Lindsey Vereen on page 38), meaning the architectures can be integrated with memory and I/O devices to create a custom implementation for a particular application.

The advantages of this approach all point to lower overall cost for systems that are produced in relatively high volumes. Manufacturing quality is better because there are fewer interconnections to fail on the board. Speed and reliability are also much higher because the signal paths within a chip are much shorter than those between chips. All in all, this approach is a good way to lower the cost and increase the reliability of a system design. The concept is illustrated in Figure 1, albeit in a somewhat abstract way.

Figure 1: ASIC Block Diagram

The bad news But what are the ramifications of this technology for software engineers? The simple fact is that the address and data signals on which many of our tools depend may not be available outside of the chip itself. For example, the typical in-circuit emulator connection, shown in Figure 2, assumes that the emulator will have access to the address and data lines for the CPU being emulated. If these signals are not available, an entire class of debugging devices becomes equally unavailable. The same holds true for logic analyzers or any other tool that relies on physical access to address or data signals from the CPU.

In fact, even if the particular ASIC designs in your system do have the information available, you may find yourself without many debug options. ICE probes are usually built to conform to the specific pinout of a microprocessor, and if the pinout or timing of an ASIC-based CPU core doesn't match that specification exactly, then the ICE will probably prove to be incompatible. But is this really a big problem for anyone except companies that sell these debug tools? Are there alternatives that can be used, ways for the business of debugging to go on as it has before? We will be looking at some of these alternatives and at some possibilities for alternatives in the near future.

Figure 2: Traditional ICE connection

Option #1: Pure software The easiest (and therefore probably the most suspect) way around these access problems is the use of a debug kernel, a piece of software that allows debug operations to be performed on a target system in conjunction with a debug host system. The kernel itself is usually a relatively simple piece of code that allows the debug host access to memory and handles debug breakpoints.
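To make the idea of a debug kernel concrete, here is a minimal sketch in Python of the two services described above: host access to target memory, and breakpoint handling. It is a toy model, not any vendor's monitor. The class name, method names, and the one-byte trap opcode `0xFF` are all assumptions made for illustration. Note that planting a breakpoint this way means overwriting an instruction, so it only works when the code lives in writable memory.

```python
# Toy model of a debug kernel. Everything here is hypothetical:
# a real kernel would be a small piece of target-resident code
# talking to the debug host over a serial link.

BREAK_OPCODE = 0xFF  # assumed one-byte trap instruction encoding


class DebugKernel:
    def __init__(self, memory: bytearray):
        self.memory = memory   # target RAM shared with the application
        self.saved = {}        # address -> original byte under a breakpoint

    def read(self, addr: int, length: int) -> bytes:
        """Return `length` bytes of target memory starting at `addr`."""
        return bytes(self.memory[addr:addr + length])

    def write(self, addr: int, data: bytes) -> None:
        """Copy `data` into target memory starting at `addr`."""
        self.memory[addr:addr + len(data)] = data

    def set_breakpoint(self, addr: int) -> None:
        """Save the byte at `addr` and replace it with the trap opcode."""
        if addr not in self.saved:
            self.saved[addr] = self.memory[addr]
            self.memory[addr] = BREAK_OPCODE

    def clear_breakpoint(self, addr: int) -> None:
        """Restore the original byte saved by set_breakpoint."""
        if addr in self.saved:
            self.memory[addr] = self.saved.pop(addr)
```

A host-side debugger would wrap these calls in a simple command protocol. The `saved` table is the part that costs extra memory on the target, on top of the kernel code itself.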

This approach has always been popular with the budget-challenged end of the embedded software world. If you have a relatively simple application and a tightly constrained budget, a software debug solution will probably fit your needs. The only real fly in this ointment is the extra memory required for the debug code. On a board-based system, it is relatively simple to slap a little more memory than the application requires onto a test board to satisfy software debug requirements, but it's a little more complicated when your system design lives within an ASIC.

A couple of ways around this problem exist. The most obvious one is to avoid putting the RAM and ROM for your system into the ASIC design. This kind of partitioning certainly helps in terms of debugging the system, but it may easily drive up the overall cost of your system. In other words, this approach may make it easier to build a system that won't sell because it costs too much, which is generally not a satisfactory tradeoff in a competitive environment.

The other alternative may be less painful. We've seen a movement lately towards building the equivalent of this debug kernel into the design of the CPU itself in the form of a set of microcoded functions. This alternative has the advantage of always being available to the system, so problems encountered after the system has been deployed can be debugged in the field. Motorola has pioneered this approach with background debug mode (BDM) additions to the 683XX CPU line, and the company has recently moved it into the 68HC11 line as well. This method has real potential to make programmers' lives easier.

Now, BDM is fine if you happen to be working with a Motorola CPU, but the fact is, most ASIC designs that use licensed CPU cores are probably working with an ARM or MIPS architecture. Both of these have been marketed much more aggressively than Intel, Motorola, or other CPU designs that enjoy high volume sales at the chip level. And the design may very well be based around a custom microcontroller. Developing debug capability just to make your programmers' lives a little easier during debug and system integration may very well be too high a price to pay in such an effort.

What is needed in this case is a more general-purpose capability. If a debug addition can be made to serve a dual purpose, then it can justify a larger part of the attention of the hardware designer and more of the resources of the ASIC itself. This is the approach taken in adding JTAG debug capability.

The JTAG option JTAG is an intriguing technology. Named for its developer, the Joint Test Action Group, it started out strictly as a specification for performing boundary-scan hardware testing on chips. JTAG is a full-fledged standard (IEEE 1149.1), which specifies the details of access via a 4-pin interface (TCK, TMS, TDI, and TDO) that implements control signals and a serial data path into and out of the chip. It allows access to registers defined within a chip, which is very handy for performing powerup or extensive debug diagnostics on a system.
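For readers who have not worked with the interface, the control side of JTAG is a 16-state machine called the TAP controller, stepped by the TMS pin on each rising edge of the TCK clock. The sketch below models only that state machine in Python, using the state transitions defined by IEEE 1149.1. The identifier names are my own, and the vendor-specific debug extensions discussed in this article are not modeled.

```python
# Model of the IEEE 1149.1 TAP controller state machine.
# Each entry maps a state to (next state if TMS=0, next state if TMS=1).
TAP_NEXT = {
    "TEST_LOGIC_RESET": ("RUN_TEST_IDLE", "TEST_LOGIC_RESET"),
    "RUN_TEST_IDLE":    ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
    "SELECT_DR_SCAN":   ("CAPTURE_DR", "SELECT_IR_SCAN"),
    "CAPTURE_DR":       ("SHIFT_DR", "EXIT1_DR"),
    "SHIFT_DR":         ("SHIFT_DR", "EXIT1_DR"),
    "EXIT1_DR":         ("PAUSE_DR", "UPDATE_DR"),
    "PAUSE_DR":         ("PAUSE_DR", "EXIT2_DR"),
    "EXIT2_DR":         ("SHIFT_DR", "UPDATE_DR"),
    "UPDATE_DR":        ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
    "SELECT_IR_SCAN":   ("CAPTURE_IR", "TEST_LOGIC_RESET"),
    "CAPTURE_IR":       ("SHIFT_IR", "EXIT1_IR"),
    "SHIFT_IR":         ("SHIFT_IR", "EXIT1_IR"),
    "EXIT1_IR":         ("PAUSE_IR", "UPDATE_IR"),
    "PAUSE_IR":         ("PAUSE_IR", "EXIT2_IR"),
    "EXIT2_IR":         ("SHIFT_IR", "UPDATE_IR"),
    "UPDATE_IR":        ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
}


def walk(state: str, tms_bits) -> str:
    """Clock the TAP once per TMS bit and return the final state."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


# Five clocks with TMS high reach Test-Logic-Reset from any state,
# which is how a host resynchronizes with a target in an unknown state.
for start in TAP_NEXT:
    assert walk(start, [1] * 5) == "TEST_LOGIC_RESET"
```

From reset, the TMS sequence 0, 1, 0, 0 lands in the Shift-DR state, where each further clock moves one bit in on TDI and one bit out on TDO. Every register read or write a debugger performs over JTAG is built from walks like this one, which gives a feel for why the port's bandwidth is limited.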

The interesting part for us comes in because some test equipment and ASIC cell companies have put in extensions that use the JTAG capability to implement software debug functions. With the proper support built into a target CPU, this interface can be used to download code, execute it, and examine register and memory values. These functions cover the vast majority of the low-level functionality of a typical debugger. Attach it to a workstation or PC running a debugger that displays source code, interprets raw addresses symbolically, and generally looks pretty, and you have a debug solution that will go a long way towards helping programmers do their job.

This option has been picked up by a lot of manufacturers. As with most good ideas, a bit of a minefield of intellectual property rights is involved. Texas Instruments patented its version, developed for the TMS320 DSP line, as U.S. Patent #5,329,471, titled “Emulation Devices, Systems and Methods Utilizing State Machines.” Macrochip Research has a cloned technology that it claims doesn't infringe on TI's patent but does the same thing as TI's port except where it is better. Check out one side of the details at .

Why is this controversy important to engineers? For starters, if the lawyers are being called in to fight over an idea, it's either an indication that it's a good idea or that things are slow at the local hospital emergency rooms. I suspect the former in this case. Macrochip claims to have an implementation that gives impressive debug capabilities with as few as 1,500 logic gates of the ASIC dedicated to the function.

So is JTAG the answer to all our debugging prayers? If you think so, allow me to dash your hopes. This technology will effectively handle about 90% of the problems that will turn up in an embedded system software project debug phase, but the other 10% are the problems that can stop a project cold in its tracks. The problem is that the window into the CPU is not wide enough to access detailed information on software execution without affecting that operation.

What do I mean by this? Let me give you an example. Suppose you are working on a multiple-processor system. This system has to process data quickly in a parallel format, with parts of the data being directed by a master CPU to multiple slave CPUs through fast FIFO hardware. Suppose that the system is working fine except for the fact that about once a day or so it just locks up, requiring a complete reboot to get back up and running.

This situation is not hypothetical. This problem actually happened on a project I was associated with, delaying shipment of an expensive system by about six months until the answer was found. This design was based on non-ASIC CPU chips with complete access to the external address and data lines. Unfortunately, the problem would only show up when the system was running out of the on-chip cache on these CPUs. The point is that timing-related problems can be difficult to debug, especially on high-performance systems. The only sure way to solve these problems is to painstakingly examine execution traces from occurrences of the problem for direct evidence of exactly what is happening. If this information is not available because it is deeply embedded in an ASIC, programmers are left with little to supplement any intuitive insight they have into the system operation.

The basic weakness of the approaches discussed so far is that they lack real-time trace access to the system operation. Two methods for gaining access to this information exist, but these methods are much more difficult to implement than the techniques discussed so far.

Debug information port The first of these choices is to add hardware to collect the information and a path for this information to get out to the programmer. The balancing act is difficult, because the choice is between adding a lot of buffer memory to the ASIC itself and adding a high-speed path to the outside. The tradeoff quickly slides to the latter when you start adding up how much of the ASIC would have to be dedicated to otherwise-unproductive debug information storage. But this is not an easy path either.

One of the major advantages to ASIC-based design is the ability to integrate I/O devices into the chip. These devices may quickly eat up the pins on the ASIC, leaving no room for a parallel debug data path. The lack of pins is a familiar problem in chip design, but some current design options may exacerbate it. For example, it is possible to create multi-chip modules, IC substrates containing more than one silicon die. The Pentium Pro is an example. Some of the pins for each die must interconnect to other dies, leaving fewer connections available to pads leading to the outside world. A debug port is probably only possible as a serial interface, using as few pins as possible.

Well, we already have a serial port specified with JTAG. In fact, this port can be useful for export of tracepoint hits or other relatively low-speed data. But if there is a burst of data that exceeds the data rate of the JTAG port, it must either be buffered or lost. The possibilities for relieving this bottleneck each have their costs. Data compression could be used quite effectively, but the hardware to implement the algorithm must be added to the ASIC. A wider port could be used on the data path, but the pad limits quickly get reached. The truth is that there are probably more questions than answers in this area as of this writing.

While researching this article, I spoke to engineers at one company who claimed to have an approach under study, but they were cautious about disclosing details until they were further along. They estimated that the earliest they could produce something for public discussion would be the second quarter of 1997. As of right now, the bottom line on a direct hardware approach is that you are pretty much on your own. If the JTAG approach gives adequate bandwidth for your needs, it is probably the most mature. If your ASIC designer has ideas, listen to them. Those ideas may be just as good as anything otherwise available.
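The “buffered or lost” tradeoff for a serial debug port is easy to put into rough numbers. The following Python sketch models an on-chip trace FIFO that fills at whatever rate the core produces trace records and drains at a fixed rate through the port. The function name, its parameters, and the figures in the example are all hypothetical, chosen only to show how buffer depth and port rate interact during a burst.

```python
def simulate_trace_port(produced_per_tick, drain_per_tick, buffer_depth):
    """Model a bounded trace FIFO drained by a fixed-rate debug port.

    produced_per_tick: trace records generated by the core in each tick
    drain_per_tick:    records the port can export per tick
    buffer_depth:      FIFO capacity in records
    Returns (sent, lost, still_queued).
    """
    queued = sent = lost = 0
    for produced in produced_per_tick:
        # Records that do not fit in the FIFO are dropped.
        accepted = min(produced, buffer_depth - queued)
        lost += produced - accepted
        queued += accepted
        # The port then exports up to its fixed rate.
        drained = min(drain_per_tick, queued)
        queued -= drained
        sent += drained
    return sent, lost, queued


# An 8-record burst followed by three quiet ticks, one record per tick out:
print(simulate_trace_port([8, 0, 0, 0], drain_per_tick=1, buffer_depth=4))
# prints (4, 4, 0): half the burst is lost
print(simulate_trace_port([8, 0, 0, 0], drain_per_tick=1, buffer_depth=8))
# prints (4, 0, 4): nothing lost, but the buffer is twice as large
```

Doubling the buffer saves the burst at the cost of ASIC area, while raising the drain rate costs pins. Those are the same two levers discussed above.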

The simulation approach I said there was a second approach to detailed real-time debugging, and indeed there is. But first, we need to cover one little detail. Until now I have been assuming that the debug takes place on code that is operating out of RAM built into the ASIC—a very dangerous assumption for many systems. The typical mode of operation for most small and medium-sized jobs is to run code out of masked ROM. Needless to say, this is a major problem for an ASIC-based system without the capability to add an external EEPROM.

If the last paragraph was a shock to you, welcome to the world of ASICs. If you were wondering when I was going to get around to mentioning that, you have probably already gone through at least part of a project. The fact that an internal ROM mask may be part of the actual chip design is one of those details that hardware engineers sometimes forget to mention to their software partners, and it can be an ugly one. I haven't mentioned it until now because none of the approaches discussed until this point adequately addresses this problem. In a system where the hardware and software get shoved into the chip at the same time, the only real option is to provide an environment that can be changed at will and very quickly by both software and hardware engineers. You must either do initial testing on something like a reprogrammable FPGA or provide a simulated test platform.

The field-programmable gate array approach is becoming more practical every day, but its basic problem is the difference in timing between FPGA parts and their ASIC counterparts. It is quite possible to debug a design as an FPGA and have to redo the effort when the equivalent design is burned into an ASIC. This situation is similar to debugging C code under one environment and then cross-compiling the code into another.

The best answer is to have an accurate model of the system in a simulated environment. This model allows full visibility into operation at a gate-by-gate level, if necessary, and avoids expensive hardware respins. This approach is taken by several companies in an effort to build hardware/software codesign tools (see “Trends in Hardware/Software Codesign,” ESP, January 1996, pp. 36–45). These tools are slowly becoming a reasonable alternative to the “build it and see if it smokes” school of debugging, but I would have to see a lot more before I could recommend them for general use. Unfortunately, these tools represent the only reasonable approach for some environments. Here we find ourselves on the bleeding edge, where the demands are great and the alternatives are few.

Effort vs. Benefits In a sense, this is a very depressing state of affairs. Debugging ASIC-embedded code entails some very real problems, and, in most cases, we are the first to be facing these problems. The effort involved may outstrip the benefit to be gained by going to an ASIC-based design. On the other hand, real benefits can be gained. Increased integration has well-known benefits in reliability and decreased manufacturing costs, and I suspect that the engineers felt the same way when they first confronted a design based on integrated circuits. Here was a part of the system that they had very little visibility into that had to be trusted to perform a large part of the operation of their system.

Programmers have faced similar challenges also. I can remember my first embedded project, a data recorder built around an M6800 CPU. We had an EPROM burner and a $20,000 development system that was a piece of junk. But the system we built was far more advanced than the paper-output recorders that were available at the time.

ASIC designs are at that stage today. They will allow levels of integration that have been impossible until now, but that means the designs will be more complex and harder to debug than any in the past. When it comes right down to it, the only real debugging tool is an engineer who understands what the system is supposed to be doing. The tools and techniques discussed here simply provide that engineer with a means of measuring what the system is really doing as compared to what it should be doing. This measure will allow the best engineers to debug your system but will not take the place of those engineers. The only real debug tool is between an engineer's ears.

Larry Mittag, contributing editor for ESP, can be reached at .
