Net-centricity is having a profound impact on the way programmers work, particularly in the embedded space, changing both the challenges they face and the debug and test tools they use.
How to debug and test remote systems effectively in the networking and data communications environment is an important emerging issue as the popularity of the Internet spurs growth in the number of servers, routers, and switches. There simply will not be enough engineers and technicians to travel to troubled sites, much less deal with problems in real time. Moving debug and test tools out of the development lab and into the field may be the only answer.
The issue got personal for me when I became an early beta user of a cable modem Internet access system. Initially there were problems making the analog cable video signals coexist with the digital signals coming from the Internet. Once those were solved, reliability problems remained: no access to the servers, variable web download times, and so on.
Sometimes, the delay just had to do with the time of day and how busy a particular server was: I had traded waiting on a freeway in an automobile traffic jam for waiting by my desktop for the data traffic jams to clear on the information superhighway. But just as often the problems were in the routers and switches between the web site and me.
The trace program I was using listed information about each node in my path. In my conversations with the engineers at these sites I found there were only three ways they “fixed” a router or switch. All of them, to my mind, are inadequate for the demands of today's Internet.
The first was to send an engineer to the location of the failure (if they could find it) and simply pull the board and replace it with another known “good” board, hoping that the problem would not repeat itself. The second, deployed early on by many major networking and telecom companies, was to create remote management systems based on SNMP or HTTP and use Java servlets to gather information about the performance of a particular node. Using such tools, operators could evaluate performance and anticipate when and where a system would go down, allowing them to send a repair person and pull the board or boards before the system failed, or to be on site gathering information if it did. The third, an expensive one, was to take advantage of the ability of some operating systems to be instrumented with tracing tools, available from any number of embedded tool companies, that collect data and store it for analysis if the system went down.
None of these approaches, I feel, allows an operator to determine for sure what the problem is when it occurs, in real time, at any location on the network. That ability will be absolutely necessary as the Internet truly becomes the worldwide “information superhighway,” where much of the planet's commerce and transactions will be performed and where the average consumer will cruise every day for information. The current approach of instrumenting the code and collecting information locally still requires that someone go to the location and download the data. With millions of routers, switches, and servers in the Internet's near future, such manual approaches are clearly outmoded.
Solutions are beginning to emerge. For example, many network and telecom providers are instrumenting the code on selected boards to collect information. They store it locally and then deliver it over the Internet to the system engineers for analysis. But more and more, this kind of information will have to be supplied from all systems in the network in real time at the moment problems start to occur. There also should be a way to debug the code remotely, instrument the code with subroutines that collect data and remotely upload fixes as well.
Network and telecom vendors who have implemented Java-based management and monitoring systems can take advantage of the Java Debug Wire Protocol (JDWP), which connects the run-time environment to a debugger user interface running on another computer. At present this capability allows various JDWP-enabled integrated development environments, such as Metrowerks' CodeWarrior and Object Technology International's VisualAge Micro Edition, to debug Java applications running on any Java virtual machine that supports the protocol, anywhere on the Internet.
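The mechanics are visible even without an IDE. Assuming a hypothetical management application called `RouterMgmt` on the target, a JVM can be told to listen for JDWP connections and then be attached to across the network with the JDK's own `jdb` debugger (the `-agentlib:jdwp` spelling shown is the modern one; earlier JDKs used `-Xdebug -Xrunjdwp=...` with the same options):

```shell
# On the target device: run the application with the JDWP agent
# listening on TCP port 8000 (* permits non-local attach on Java 9+).
java -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:8000 RouterMgmt

# On the engineer's workstation: attach the command-line debugger.
# "router42.example.net" is a placeholder hostname.
jdb -attach router42.example.net:8000
```

With `suspend=n` the application keeps running until a debugger attaches and sets breakpoints, which is what a field deployment needs.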
In the more general C/C++ development space, a number of developments promise even broader remote debug and instrumentation capabilities. RTView Ltd., for example, has developed a remote debug management capability it calls SurroundSupport. Built around the company's core product, SurroundView, SurroundSupport is a diagnostic and e-support infrastructure for connected embedded devices. It facilitates remote monitoring, logging, and debugging not only in the lab, but during beta testing and after full deployment in the field. It works by letting the engineer insert agents and test points into an application's source code to support monitoring, fault diagnostics, preemptive warnings, and debugging.
Other tool vendors with debug and code instrumentation capabilities are also beginning to cooperate with RTOS vendors whose kernels support instrumentation. QNX Software Systems Ltd. and Applied Microsystems Corp. are examples of such companies.
How far can we go, or do we want to go? Are there other methodologies that would enhance system developers' ability to remotely test and debug at even deeper levels, and are they necessary? One possibility would be to adapt the on-chip debug capabilities that several companies, such as AMD, ARM, Intel, MIPS, and Motorola, offer. Such on-chip interfaces were originally created to give developers more visibility into the new generation of processors and system-on-chip designs. But is there any reason they could not be used for remote debugging? Another, even better possibility would be to adapt the proposed IEEE-ISTO 5001 debug interface specification, which originated as the Nexus Global Embedded Processor Debug Interface Standard (GEPDIS).
I know that these debug port alternatives were not designed for remote test and debugging. But when has that stopped embedded developers? Of all the various engineering types, embedded engineers are the most pragmatic. If the ideal tool is not at hand, they use the ones available, even if they were not designed for the job originally. Nor would this be the first time the industry has gone the pragmatic route: most of these advanced debug ports, developed to give software developers more visibility into the chip, are extensions of the IEEE 1149.1 JTAG standard, the scan test bus interface for analyzing the silicon innards of complex CPUs.
What do you think? If you are a developer, what tools are you using to achieve remote test and debug? If you are an engineer employed by a tool vendor, what is your company doing to offer this capability? Are there other existing tools and methodologies that already deal with these kinds of situations? Are there other issues and problems related to debugging and testing in a connected environment that I did not cover? I'd like to hear from you.