CMP EMBEDDED.COM

Real-Time Debugging for Highly Integrated Embedded Wireless Devices



TechOnline

The good news is that we can now cram a lot of functionality into our systems-on-chip, including multiple processors. The bad news is that we still have to debug our hardware/software designs and make them work. The really good news is that we have the test hardware and software to do the job, even for multiprocessor (MP) systems. This debuggability is especially critical for wireless portable devices, which may have multiple processors, generally a DSP and a microcontroller.

SoC design methodologies for programmable cores now include static debug blocks, which may be used during the early stages of product development. By including additional debug-related capability on-chip, suppliers are offering designers the ability to fully understand the behavior of a given system, including validation of both hardware and software architectures and their interdependence. This is essential for evaluating real-time power consumption in Internet-ready handheld devices.

Wireless SoCs also bring two further pressures: the need to develop and debug power-efficient handheld devices, and the market's demand for rapid product development. Debugging highly integrated multiple-core systems on a single chip can be done if the necessary hardware and software resources are available.

A good example of SoC debuggability is the Motorola 3G Baseband Transceiver, an SoC that implements a handset design by integrating an M•CORE M341 micro-RISC core and a StarCore DSP core onto a single chip. This SoC also implements a real-time debug port based on the IEEE Industry Standards and Technology Organization (ISTO) Nexus 5001 Forum specification.

The IEEE-ISTO Nexus 5001 Forum is defining a common set of microcontroller on-chip debug features, protocols, pins, and interfaces to external tools and tool sets that address the static debug requirements of embedded real-time architectures. The Nexus standard defines a scalable set of features that enable existing debug blocks to be accessed via an extensible auxiliary port. The features associated with this new auxiliary port focus on the required real-time transfer of information to and from the embedded microcontroller.


The Design—Power Profiles
Designing a low power/high performance system, such as a cellular handset, is no mean feat. These are complex designs that must live within critical power and performance envelopes.

Figure 1: A digital cellular handset can be partitioned into three main sections. The RF section receives and transmits analog and/or digital information; the Analog Baseband and Control section handles intermediate frequency conversion, user interaction and power control; and the Power Management section distributes and manages power to all elements of the handset.

In first- and second-generation digital cellular solutions, overall baseband power consumption is derived from the combination of:

  1. Standby leakage power
  2. Active power for time-based protocol software stacks and data (voice) transmission
  3. System event power induced by an active page or call, or other user-induced event as illustrated in Figure 2.

Figure 2: Characteristic Power Consumption for Cellular Handset

The relative periods of standby and active power can be calculated with decent accuracy based on knowledge of the wireless protocol. Standby power consumption can hence be estimated via leakage current information for a given technology, and knowledge of the amount of time the chip stays in this inactive mode.
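As a rough illustration of this kind of estimate, the sketch below computes a time-weighted average current for a chip that alternates between standby and active states. The standby current, active current, and active fraction are hypothetical values chosen for the example; they are not figures from this article.

```python
def average_current_ma(standby_ma, active_ma, active_fraction):
    """Time-weighted average current for a two-state (standby/active) profile."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return active_ma * active_fraction + standby_ma * (1.0 - active_fraction)


# Hypothetical numbers: 0.5 mA of leakage in standby, 40 mA while the
# protocol stack runs, and the chip awake 2% of the time.
avg = average_current_ma(standby_ma=0.5, active_ma=40.0, active_fraction=0.02)
print(f"average current: {avg:.2f} mA")
```

In practice the active fraction would come from the paging and slot timing of the wireless protocol, and the standby figure from leakage data for the process technology.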

Active power consumption is a bit more difficult to estimate, but for repetitive software stacks performing known protocol functions, this too can be determined with reasonable accuracy. Consider then the problem of estimating and optimizing on-chip power consumption during user-induced events such as Wireless Application Protocol (WAP) browsing, high-speed downlink/uplink transactions, and Moving Picture Experts Group (MPEG-4) structured audio activity.

The embedded system contains all the necessary capability to perform these functions, even in parallel with other events, but their behavior is much less deterministic. Software written to handle this multitude of system activity must be carefully optimized to improve overall battery life for a particular application.

Task             Digital Power   Analog Power   RF Power
Network Access   40 mA           20 mA          40 mA
Call Service     20/30 mA        20 mA          50 mA
3G Playback      35 mA           15 mA          50 mA

Table 1: Prior studies show that the three main blocks of the cellular handset each consume from 15 to 50 mA of current, depending on their state of activity.
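To put the figures in Table 1 in perspective, the following sketch totals the three blocks for each task and converts the total into hours of continuous operation. The battery capacity is a hypothetical value, and the calculation assumes the handset stays in a single task for the whole discharge, which a real usage profile would not.

```python
# Currents in mA for the (digital, analog, RF) blocks, copied from Table 1.
# Call Service lists two digital values (20/30), kept as a low and high case.
TABLE_1 = {
    "Network Access": (40, 20, 40),
    "Call Service (low)": (20, 20, 50),
    "Call Service (high)": (30, 20, 50),
    "3G Playback": (35, 15, 50),
}
BATTERY_MAH = 800  # hypothetical battery capacity, not from the article


def total_ma(task):
    """Sum of digital, analog, and RF current for one task."""
    return sum(TABLE_1[task])


def hours_of_operation(task, battery_mah=BATTERY_MAH):
    """Idealized run time if the handset stayed in this task continuously."""
    return battery_mah / total_ma(task)


for task in TABLE_1:
    print(f"{task:20s} {total_ma(task):4d} mA  {hours_of_operation(task):.1f} h")
```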


Real-Time Performance Analysis
Lab bench analysis of prototype systems permits conventional methods of evaluation such as circuit boards with logic analyzer interfaces. Typically these boards provide a means for initial power up and integration of software and hardware modules.

Each core in the baseband processor chip is evaluated individually in a static debug form. Each is put in a special mode of operation, and its programmer's model registers and memory are checked while single-stepping a test program that was downloaded from a host computer. Once the system passes the "smoke test," where each processor exits reset and performs initialization functions correctly, the task of debugging real-time kernel and interrupt structures begins.

This type of debugging for a real-time wireless device has traditionally required a logic analyzer to monitor the external bus interface at selected clock cycles and to sample the bus activity for recording and later analysis. Unfortunately, this approach is becoming extremely expensive, and even physically impossible, as microcontroller and DSP speeds move above 100 MHz. It is even worse for SoCs with on-chip microcontrollers, DSPs, and memory: there is no place where the logic analyzer can be attached to monitor the on-chip processor activity.

Traditionally, when the bus interfaces are not available, developers have embedded "printf" statements in their code at strategic points so that the data needed is sent to a peripheral port and retrieved by a host processor. With this approach, the information captured is minimal and the technique inserts intrusive delays in the application. It is also unacceptably time consuming, especially as system software layers become more complex. An on-chip hardware/software approach is needed that enables the on-chip processors to be monitored and the results to be transferred to the host for analysis.


Cellular Performance Considerations
But debugging cellular phones effectively requires more than just simple software debugging. The ability to track the performance and power usage of the design is also needed. Moreover, these are not simple one-processor designs. Typically, cellular handsets have multiple processors, usually a microcontroller for control functions and a DSP for signal processing. The debugging resources need to be able to track both processors and their coordination, as well as the on-chip peripherals. Thus cellular handsets require specialized debugging resources tailored to the specifics of handset design.

The heart of the cellular handset is the baseband transceiver, which performs all computations relative to call service, Internet Web interaction, and handset control.

Figure 3: A block diagram of a Motorola wireless baseband processor, including separate MCU and DSP core complexes, interfaced to separate on-chip RAM and ROM memories and core-specific peripheral and I/O functions.

One way to fully track each processor core's operation separately, as well as its interaction with the other, would be to pin out the internal core buses to external pads, thus achieving good visibility of both cores' bus cycles. However, this approach is simply too costly, as cellular SoCs need to minimize I/O and package costs. Nonetheless, system hardware and software architects still need a method of understanding standalone and integrated core processor behavior.


WAP Debug Hardware Requirements
There is a solution, a standard solution at that. The IEEE-ISTO Nexus 5001 Specification defines the mechanisms needed for debugging an Internet-ready handset. The WAP architectural specification focuses on optimizing for efficient use of device resources. But the task of providing a communications protocol as well as an Internet protocol layer dictates that the RAM required will be 1- to 4-MB and the Flash ROM, which will hold the kernel, will be on the order of 256- to 512-KB.

Since the number of external accesses to RAM directly affects power consumption, the microcontroller engine must have an efficient instruction set and resident cache and Memory Management Unit (MMU) to reduce external bus transactions as illustrated in Figure 4. A key power consumption goal is to write the handset code so that it efficiently uses the cache.

Figure 4: M•CORE M341 Processor with Nexus 5001 Debug Port

Once the cache and MMU are enabled, the interaction of the core with the cache is no longer visible to the user unless there is a cacheable instruction or data miss resulting in an external access to fill the cache. This problem is aggravated further when developers must debug code that exhibits abnormal behavior in real-time or there is a need to capture power measurements when running specific code.

A good example of a handset SoC design that implements a Nexus 5001 port for debugging is the M•CORE M341 microcontroller core, the microcontroller part of the handset design. The same techniques are used for the DSP processor as well. The M•CORE's Nexus 5001 port supports accessing user resources using a high speed output port to transmit real-time program and data information. The feature set of the Nexus 5001 port is of class 3, providing static debug capability and real-time process identification, program trace, data trace, and read/write access to M•CORE Local Bus (MLB) resources. Of course, using a 2- to 8-bit output port for reporting real-time 32-bit address and data values requires an efficient data transmission method.


Using Public Messages
A set of data packets commonly referred to as Public Messages in the Nexus 5001 specification has been defined for the efficient transfer of debug information between the embedded processor and a development system. Public Messages consist of a transfer code or TCODE, source processor identification number, and the data associated with the particular feature being accomplished. A key requirement in the definition of the Public Messages is efficiency; thus packets may be variable in length depending on the TCODE.
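The sketch below illustrates why variable-length packets matter on a narrow port: it packs a TCODE, a source identifier, and a payload into one bit string and counts how many transfers a 2-bit or 8-bit port needs to send it. The field widths (6-bit TCODE, 2-bit source ID), the TCODE value, the payload, and the low-bits-first ordering are all assumptions made for illustration; the actual encodings are defined by the Nexus 5001 specification and the specific implementation.

```python
def pack_fields(fields):
    """Concatenate (value, width_in_bits) fields, first field in the low bits."""
    word, total_bits = 0, 0
    for value, width in fields:
        if value < 0 or value >= (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word |= value << total_bits
        total_bits += width
    return word, total_bits


def port_transfers(word, total_bits, port_width):
    """Split a packed message into port-width chunks, low bits first."""
    mask = (1 << port_width) - 1
    count = -(-total_bits // port_width)  # ceiling division
    return [(word >> (i * port_width)) & mask for i in range(count)]


# Hypothetical message: TCODE 3, source processor 1, 8-bit payload 0x5A.
word, bits = pack_fields([(3, 6), (1, 2), (0x5A, 8)])
print(bits, len(port_transfers(word, bits, 2)), len(port_transfers(word, bits, 8)))
```

A shorter payload means fewer transfers, which is the motivation for letting the packet length depend on the TCODE.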

Messaging capability is controlled by a JTAG serial interface. The JTAG interface couples to a OnCE static debug block and provides access to all Nexus 5001 registers on the M•CORE M341 processor. Messaging capability is enabled prior to deassertion of the reset pin so exit from reset may be monitored.


Monitoring Program Flow
Following program behavior can be reduced to tracking the changes in the program counter due to branching, jumping to subroutines, and servicing interrupts and exceptions. Analysis shows that on average 12-13% of instructions executed in a program are change-of-flow instructions. Therefore, it is not necessary to report every instruction's address, but only the changes of flow. What is needed to follow the source listing is where you are relative to a reference start address, and where you are going when you change program flow.

Three types of public messages report program flow behavior. Debugging a real-time operating system (OS) requires a means of reporting a process ownership identifier. The objective of the Ownership Trace Message is to give the most current value of the data bus when a process writes to a special address. This address is called the User Base Address, where comparators in the Nexus 5001 snoop logic trigger a capture of the data bus. Thus, whenever a context switch of the OS occurs, a process identifier may be transmitted using an Ownership Trace Message. This may be key for correlating virtual to physical address maps of the MMU when sending messages to the source-level debugger.

Branch Trace Messages report when direct branch or indirect branch instructions are executed. A Sync Message normally transmits a reference address to establish where the program counter currently is; after that, all references are made relative to that address until an indirect change of flow occurs, which reduces the number of bits transmitted in a message. The two message types differ in content. For a direct branch, the only information needed is the number of instructions executed since the last change of flow. Indirect branch messages report the number of instructions executed since the last change of flow and the address the program counter is jumping to, thus establishing a new reference address.
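A host-side tool can rebuild the executed path from these messages plus the program image. The following is a minimal sketch of that reconstruction under several assumptions that are not taken from the Nexus specification: instructions are a fixed 2 bytes, the instruction count includes the branch itself, messages are plain tuples, and direct branch targets are looked up in a table (`direct_targets`, a hypothetical name) derived from the program image.

```python
INSTR_BYTES = 2  # assumption: fixed-size instructions


def reconstruct(messages, direct_targets):
    """Return (executed addresses, next pc) from sync/direct/indirect messages."""
    pc, executed = None, []
    for msg in messages:
        kind = msg[0]
        if kind == "sync":
            pc = msg[1]                      # new reference address
            continue
        if pc is None:
            raise ValueError("branch message received before any sync message")
        count = msg[1]
        if count < 1:
            raise ValueError("count must include the branch instruction")
        for _ in range(count):               # sequential run ending at the branch
            executed.append(pc)
            pc += INSTR_BYTES
        branch_address = executed[-1]
        if kind == "direct":
            pc = direct_targets[branch_address]  # target known from the image
        elif kind == "indirect":
            pc = msg[2]                          # target carried in the message
        else:
            raise ValueError(f"unknown message kind: {kind}")
    return executed, pc


path, next_pc = reconstruct(
    [("sync", 0x1000), ("direct", 3), ("indirect", 2, 0x2000)],
    direct_targets={0x1004: 0x1100},
)
print([hex(a) for a in path], hex(next_pc))
```

Only the indirect message carries an address, because the target of a direct branch can be read from the program image.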

If you need to report specific memory accesses, the Watchpoint Message does the job. This message triggers off hardware comparators and complex access qualifiers, which monitor the M•CORE virtual bus.

The idea is to set a watchpoint trigger where a signal as well as a message may be transmitted. The message tells which of the watchpoint triggers occurred. This is especially valuable for debugging variable writes. For example, if you have a global variable that is being modified by a number of processes and you want to pinpoint which of those processes is accessing that variable, the watchpoint message is the tool to use.
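The global-variable scenario can be sketched as a small simulation: ownership messages record which process is running, and watchpoint messages mark each write to the variable, so pairing them attributes every write to a process. The addresses, process IDs, and tuple-based message format below are hypothetical, and the snoop logic is reduced to simple equality comparisons.

```python
USER_BASE_ADDRESS = 0x1000_0000  # hypothetical address snooped for ownership writes
WATCHPOINTS = {0: 0x2000_0040}   # hypothetical: watchpoint 0 guards one global


def trace_messages(bus_writes):
    """Turn a list of (address, data) bus writes into trace messages."""
    messages = []
    for address, data in bus_writes:
        if address == USER_BASE_ADDRESS:
            messages.append(("ownership", data))      # data bus value = process ID
        for number, watched in WATCHPOINTS.items():
            if address == watched:
                messages.append(("watchpoint", number))
    return messages


def writers_of_watchpoint(messages, number):
    """Attribute each watchpoint hit to the most recently reported process ID."""
    current, writers = None, []
    for kind, value in messages:
        if kind == "ownership":
            current = value
        elif kind == "watchpoint" and value == number:
            writers.append(current)
    return writers


writes = [
    (USER_BASE_ADDRESS, 7), (0x2000_0040, 1),   # process 7 writes the global
    (USER_BASE_ADDRESS, 9), (0x2000_0100, 5),   # process 9 writes elsewhere
    (USER_BASE_ADDRESS, 7), (0x2000_0040, 2),   # process 7 writes it again
]
print(writers_of_watchpoint(trace_messages(writes), 0))
```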

This feature also asserts an event out pin, which may trigger a logic analyzer to capture specific public messages and/or peripheral signals. For power analysis, the trigger may be used to capture current measurements at specific points in code or data accesses, which may be useful for pinpointing power consuming hot spots.


Monitoring Data Variables
Data Trace Messages provide a means for reporting real-time data accesses to memory locations. Data loads and stores occur much more frequently than program flow changes: analysis has shown that as much as 25% of instructions executed in a program are data accesses. Data Messages may be used to report stack contents, global and local variables, and peripheral port accesses.

To control the number of Data Messages transmitted, data trace qualifiers include the access type (read, write, or either) as well as a start and stop address range. If the data address and access type qualifiers are met, Data Messages are generated and sent to the debug port. This narrows the window of memory locations that can generate a Data Message.
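The qualifier logic amounts to an address-range check combined with an access-type check. The sketch below models it in software; the class name, the field names, and the choice to treat both ends of the range as inclusive are assumptions for illustration, not details of the M341 registers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataTraceQualifier:
    start: int    # first address in the traced window (assumed inclusive)
    stop: int     # last address in the traced window (assumed inclusive)
    access: str   # "read", "write", or "either"

    def matches(self, address, access):
        """True if this access should generate a Data Message."""
        in_window = self.start <= address <= self.stop
        type_ok = self.access == "either" or self.access == access
        return in_window and type_ok


qualifier = DataTraceQualifier(start=0x2000, stop=0x20FF, access="write")
accesses = [(0x2004, "write"), (0x2004, "read"), (0x3000, "write")]
traced = [a for a in accesses if qualifier.matches(*a)]
print(traced)
```

Only the write inside the window passes; the read to the same address and the write outside the window generate no message.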

Sending only the unique portion of the data address, instead of the complete address, reduces output bandwidth requirements for the debug port. Consequently, each data trace address is reconstructed relative to the prior message, starting from the reference address supplied by a synchronization message.
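One way to isolate the "unique portion" of an address is to XOR it with the previous address and send only the low-order bits up to the highest bit that changed. The sketch below assumes that scheme; the article does not state how the unique portion is computed, and the addresses are hypothetical.

```python
def unique_portion(address, previous):
    """Return (diff, width): the XOR difference and how many low bits carry it."""
    diff = address ^ previous
    return diff, max(diff.bit_length(), 1)


def rebuild(previous, diff):
    """Recover the full address on the host from the prior address and the diff."""
    return previous ^ diff


reference = 0x2000_1000                       # from a synchronization message
diff, width = unique_portion(0x2000_1004, reference)
print(f"send {width} bits instead of 32; rebuilt {rebuild(reference, diff):#x}")
```

Because consecutive data accesses tend to be close together, the difference is usually small, so most messages need far fewer than 32 address bits.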


Real-Time Data Access Capability
The M•CORE M341's Nexus port provides access to the MLB mapped resources via the JTAG port. A Ready for Transfer pin (RDY) was added to increase the transfer rate. Calculations show that accesses to the Read/Write Data Register allow for a throughput of 1-Mbps on an M•CORE M341 microcontroller operating at 50-MHz system clock. Block transfers are possible with a single setup of Read/Write control and address registers. This permits 32-bit transfers in 38 JTAG clocks where each JTAG clock is one-half of the system clock.
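The block-transfer figures can be turned into a back-of-the-envelope ceiling on the raw transfer rate. The sketch below reads "one-half of the system clock" as a JTAG clock frequency of half the system clock frequency, which is an interpretation, and it ignores register setup and host-side overhead. The result is therefore an idealized upper bound for back-to-back block transfers and sits well above the 1-Mbps figure quoted above, which the text does not break down.

```python
SYSTEM_CLOCK_HZ = 50_000_000           # system clock from the text
JTAG_CLOCK_HZ = SYSTEM_CLOCK_HZ / 2    # assumed reading of "one-half of the system clock"
CLOCKS_PER_WORD = 38                   # JTAG clocks per 32-bit transfer, from the text
BITS_PER_WORD = 32

words_per_second = JTAG_CLOCK_HZ / CLOCKS_PER_WORD
peak_bits_per_second = words_per_second * BITS_PER_WORD
efficiency = BITS_PER_WORD / CLOCKS_PER_WORD   # payload bits per JTAG clock

print(f"{words_per_second:,.0f} words/s, "
      f"{peak_bits_per_second / 1e6:.1f} Mbit/s ceiling, "
      f"{efficiency:.0%} of JTAG clocks carry data")
```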

This capability significantly reduces program and data load times as well as enables the developer to examine arrays of memory without stopping the application. Data Trace Messages only report data movement within a well-defined data window while Read/Write Access permits accessing values asynchronously. This feature is very useful for downloading new filter coefficients or encryption keys when testing communication protocols. Another important use is the retrieval of power data values, which may be built into the power management unit of the handset.


Real-Time Debug Tool Support
Two key ingredients to successfully reducing development time are on-chip circuits such as the Nexus 5001 port and the associated development tools, which support the Nexus interface. The Nexus 5001 specification defines the pins, connectors, and the protocol for transferring messages to and from the host computer.

However, it is very difficult to define stringent rules for Nexus register sizes, bit positions, and other implementation-specific details, which may not suit particular semiconductor vendors' architectures. An application programming interface (API) that abstracts implementation details is the ideal solution for tool vendors.

Figure 5: A lab debug environment that uses an integrated software tool set coupled to a logic analyzer and power source/monitor for complete handset development

The emulation controller provides the abstraction layer so that an API may be defined that handles the details of the Nexus interface at the emulation controller without burdening the tool vendor. An FPGA was added to the emulation controller to reconstruct the full message from the two Nexus port output pins into a 40-bit-wide message with a message trigger signal. This improves utilization of the logic analyzer's trace buffer.
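The reassembly step performed by the FPGA can be modeled in a few lines: twenty 2-bit samples from the port are combined into a single 40-bit word, so the logic analyzer stores one wide sample per message instead of twenty narrow ones. The low-bits-first ordering and the function name are assumptions; the real bit order depends on the port implementation.

```python
PORT_WIDTH = 2       # two output pins, as described in the text
MESSAGE_BITS = 40    # width of the word presented to the logic analyzer


def reassemble(chunks):
    """Combine port-width samples (assumed low bits first) into one wide word."""
    expected = MESSAGE_BITS // PORT_WIDTH
    if len(chunks) != expected:
        raise ValueError(f"expected {expected} samples, got {len(chunks)}")
    word = 0
    for index, chunk in enumerate(chunks):
        if not 0 <= chunk < (1 << PORT_WIDTH):
            raise ValueError(f"sample {chunk} does not fit in {PORT_WIDTH} bits")
        word |= chunk << (index * PORT_WIDTH)
    return word


# Hypothetical 40-bit message, split into 2-bit samples and put back together.
message = 0xAB_CDEF_0123
samples = [(message >> (PORT_WIDTH * i)) & 0b11 for i in range(20)]
print(hex(reassemble(samples)))
```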

Classical debuggers use a "load, arm, go" scenario: once the debugger has sent the target processor(s) to work, the debug environment is frozen until the target processor re-enters a debug or interrogation mode. To fully exploit the real-time debug capabilities of the M341 Nexus port, the debugger must permit interrogation of target resources while the target executes code in real time.

During initial development of the M341 Nexus port, a Hi-Ware (Hiwave) debugger was interfaced to a Tektronix TLA-714 logic analyzer. The Hiwave debugger directly interacts with the Tektronix logic analyzer to arm its trace buffer for message captures and later displays the trace buffer contents within the Hiwave environment.

Since the M341 processor has a different instruction set and programmer's model than the DSP56600 architecture, a dual integrated environment with split windows, one for each processor, is used for debugging the baseband transceiver. A single emulation controller communicates with each processor using the JTAG protocol. A semaphore in the dual debugger's control module regulates traffic to the emulation controller so there are no message collisions when communicating with either processor.
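The arbitration idea can be sketched with two threads sharing one controller object. This is a simplified model, not the dual debugger's actual control module: the `EmulationController` class, its `transaction` method, and the three-step transaction are hypothetical stand-ins, and a lock plays the role of the semaphore.

```python
import threading


class EmulationController:
    """Stand-in for the shared emulation controller (hypothetical interface)."""

    def __init__(self):
        self._lock = threading.Lock()   # binary semaphore guarding the JTAG link
        self.log = []

    def transaction(self, target, steps):
        """Run one multi-step exchange with a target without interleaving."""
        with self._lock:
            for step in range(steps):
                self.log.append((target, step))


def debugger(controller, target, count):
    for _ in range(count):
        controller.transaction(target, steps=3)


controller = EmulationController()
threads = [threading.Thread(target=debugger, args=(controller, name, 50))
           for name in ("MCU", "DSP")]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# Every transaction should appear as an unbroken run of steps 0, 1, 2
# belonging to a single target.
runs = [controller.log[i:i + 3] for i in range(0, len(controller.log), 3)]
intact = all(len({t for t, _ in run}) == 1 and [s for _, s in run] == [0, 1, 2]
             for run in runs)
print(intact)
```

Without the lock, steps from the MCU and DSP debuggers could interleave on the shared link, which is the collision the semaphore is there to prevent.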


Minimizing Debugging Penalties
The additional feature set of the Nexus port doesn't come without some die area and power penalty. Therefore, during its implementation all sub-module clocks were gated off for inactive circuitry and the message decode state machine and logic were enabled/disabled via Nexus control. Special consideration was given to the message queues to reduce power and the output port was made variable width to accommodate a 2- or 8-bit width.

This is quite important from a development perspective. During lab analysis the 8-bit port would be used since there was room to add larger connectors on the evaluation cards. But once the handset ergonomics had been finalized and the high-density double-sided surface mount board was used, it was decided that a reduced bandwidth over the output port could be feasible at that point in system development.

Overall the Nexus Class 3 implementation was 7.5% of the M341 processor area. But considering the size of the complete baseband transceiver, it becomes quite small relative to the addition of on-chip memories and the DSP.

For more information on the IEEE-ISTO Nexus 5001 Forum, read IEEE-ISTO Nexus 5001 Forum Targets Real-Time Debugging.
