In System on Chip (SoC) architectures, the ability to effectively analyze problems and optimize operations using real-time in-system instrumentation is recognized as one of the most effective methods for completing product development.
Perhaps nowhere is this need more prevalent than in multi-core architectures, since this is an area where traditional processor simulation and system analysis methodologies tend to break down when faced with issues such as concurrent programming and multiple processor flows, asynchronous operations, and core-to-core integration.
Real (either prototype or production) hardware provides the best source of information for this type of analysis, but often suffers from limited visibility into the functionality of the system, such as processor instruction flow and data movement on embedded buses, which are not typically made available at the pin I/O. Increasing the visibility of these embedded operations typically requires an instrumentation subsystem.
Debug solutions vary greatly among IP vendors, which can make integrating debug and instrumentation systems for heterogeneous multi-core architectures difficult. In this dynamic area, the most mature standardized solutions addressing system instrumentation and debug are those based on IEEE 5001, also widely referred to as Nexus.
Some background on Nexus 5001
The initial IEEE 5001 Nexus specification, developed in 1999, is an IEEE standard for debug interfaces of embedded systems and processors, and has since been implemented in numerous devices.
Unlike other debug solutions, which are typically proprietary to a processor or tools vendor, Nexus is openly defined and consistently supported by a variety of tools vendors and across a range of processor cores. Nexus standards are commercially supported by the Nexus 5001 Forum, which includes silicon suppliers, tools developers, IP companies, and end users.
Nexus was developed to provide extended debug capability, addressing the limitations of JTAG (IEEE 1149.1) when used for data-intensive analysis operations such as trace and calibration. While JTAG remains a widely used interface for on-chip debug, its serial architecture has inherent limitations in supporting the speed and bandwidth needed in analysis of modern devices.
Nexus is compatible with JTAG and, in basic run control modes, can be implemented using only a JTAG serial port; more significantly, Nexus supports the use of high-bandwidth interfaces to efficiently transport data between silicon targets and debug tools. Wider bandwidth and higher speed data interfaces allow for simpler and more powerful debug interfaces than JTAG-based debug architectures.
This has significant advantages for multi-core systems, where larger amounts of data can be needed for analysis. Multi-core debug is a significantly more complex and data-intensive analysis problem than single-core debug, since in addition to having to debug the application software for each processor core, there are new considerations related to the communications between processors, multiple processors sharing common resources (memory, peripherals, etc.), synchronization for systems where processors or buses may not be running at the same speed, etc.
The basics of the Nexus 5001 debug spec
To understand how Nexus inherently provides several features that are important in this complex multi-core systems analysis context, it is useful to provide a system-level overview of the Nexus architecture.
Figure 1 below shows the basic elements of the Nexus architecture. Key to the Nexus communications concept, information is transferred as packet messages. The target source or destination of this information is typically a set of Nexus registers, some of which are fully defined and others of which are left for vendor-specific applications.
Each message is essentially self-contained, and includes packet fields of information for or from tools about the on-chip target (core or subsystem) source or destination, type of information provided, register locations, timing relative to other information, etc. The structure of the debug transaction is defined through a TCODE header, which defines the type and fields of the message.
The Nexus specification (currently) defines 34 standardized TCODEs, which range from general-purpose register accesses to application-specific trace operations. Additional space is reserved for up to 30 user/vendor defined TCODE message types. At a physical layer, the information transfer between a silicon target and the outside world is either via JTAG (as a vendor defined “Nexus-Access” operation) or through parallel (AUX In and AUX Out) port interfaces, which provide significantly higher bandwidth than JTAG.
One or both of the AUX ports are optional, relative to use of JTAG, so as an example, data-intensive Nexus trace messages could be exported over an AUX Out port, while less data-intensive Nexus input messages to the target could use the JTAG port. AUX and serial JTAG interfaces and control are implemented via separate FSMs.
This also allows AUX In and Out ports to run autonomously (allowing, in principle, concurrent import and export of messages to a target). The width of the AUX interfaces is configurable, and size can vary independently for each port as a power of 2 (i.e., 2, 4, 8, 16).
Figure 1: Basic Nexus Architecture
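To give a rough feel for why port width matters, the following sketch estimates how many clock cycles it takes to shift one message across a port of a given width. It is a back-of-the-envelope model only: the 40-bit message size is a hypothetical value, a width of 1 stands in for a serial (JTAG-like) path, and protocol overhead such as framing or state-machine transitions is ignored.

```python
def transfer_cycles(message_bits: int, port_width: int) -> int:
    """Clock cycles needed to shift `message_bits` over `port_width` data pins.

    Assumes one transfer of `port_width` bits per clock and no protocol overhead.
    """
    if port_width < 1 or port_width & (port_width - 1):
        raise ValueError("port width must be a power of 2")
    # Integer ceiling division: a partially filled last transfer still costs a cycle.
    return (message_bits + port_width - 1) // port_width


MESSAGE_BITS = 40  # hypothetical message size, chosen only for illustration

# Width 1 approximates a serial path; 2..16 are the AUX widths named in the text.
cycles_by_width = {w: transfer_cycles(MESSAGE_BITS, w) for w in (1, 2, 4, 8, 16)}

for width, cycles in cycles_by_width.items():
    print(f"{width:2d}-bit port: {cycles} cycles")
```

Under these assumptions the cycle count falls roughly in proportion to the port width, which is the bandwidth argument made above for using AUX ports on trace-heavy traffic.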
Classifying the Nexus TCODES
Nexus TCODES can be classified into 6 different types, which are described in detail in the Nexus specification. These include:
1) Status – status information messages from the target. This group includes register reads and core-specific or watchpoint/breakpoint status, error messages, etc. (TCODES 0-2, 8, 15)
2) General register read/write – a group of commands that allow memory-mapped reads and writes between tools and Nexus Recommended Registers (NRR) or other registers in a Nexus-defined memory map. Among other general applications, these messages can be used for run control and configuring watchpoint/breakpoint operations. (TCODES 16-19)
3) Program Trace – a range of trace options that rely on Nexus-defined branch trace schemas, which limit instruction trace to discontinuities (branches, conditional jumps, interrupts, etc.) and their relative distance from the last trace. By mapping these values to an assembled program, debuggers can interpolate branch locations in the program flow and reconstruct (inter-branch) instruction flow. Nexus also defines periodic sync fields and trace messages to identify inconsistencies and align trace, which is useful in correlating execution over multiple cores. (TCODES 3, 4, 9-12, 27-33)
4) Data Trace – trace of data values, associated with a defined address range for efficiency. Nexus also supports data acquisition instructions for streaming export of larger amounts of system information, such as data from on-chip buffers or FIFOs. (TCODES 5, 6, 7, 13, 14)
5) Memory Access – non-intrusive peek and poke operations on internal memory blocks; these can also be used for directly driving from a Nexus memory or location. (TCODES 22-26)
6) Port Replacement – allows Nexus pins to emulate other I/O functions of comparable speed. (TCODES 20, 21)
All Nexus TCODES follow a common message format, where packet information is either stored in or accessed from Nexus-defined registers.
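The grouping above can be captured as a small lookup table, which is the kind of structure a trace decoder might use to dispatch on the TCODE header. The sketch below simply transcribes the TCODE numbers listed in this article; the class names and the `classify_tcode` helper are illustrative choices, not identifiers from the Nexus specification, and codes outside the listing are reported as unclassified rather than guessed at.

```python
# TCODE numbers per class, transcribed from the list in this article.
TCODE_CLASSES = {
    "status": [0, 1, 2, 8, 15],
    "register read/write": [16, 17, 18, 19],
    "program trace": [3, 4, 9, 10, 11, 12, 27, 28, 29, 30, 31, 32, 33],
    "data trace": [5, 6, 7, 13, 14],
    "memory access": [22, 23, 24, 25, 26],
    "port replacement": [20, 21],
}

# Invert the table so a decoder can look up the class from a TCODE value.
TCODE_TO_CLASS = {
    code: name for name, codes in TCODE_CLASSES.items() for code in codes
}


def classify_tcode(tcode: int) -> str:
    """Return the message class for a TCODE, or 'unclassified' if not listed here."""
    return TCODE_TO_CLASS.get(tcode, "unclassified")


# Sanity check: the six classes cover the 34 standardized TCODEs exactly once.
assert sorted(TCODE_TO_CLASS) == list(range(34))

print(classify_tcode(3))   # program trace
print(classify_tcode(21))  # port replacement
```

The assertion confirms that the six classes partition TCODEs 0 through 33 with no gaps or overlaps, matching the count of 34 standardized TCODEs given earlier.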
The Nexus specification defines and assigns register maps for 63 recommended registers, which are accessed by TCODE operations.
Different instances of the same register can be associated with different cores by a source field value that can be transmitted as part of each output message. NRRs may contain recommended fields, specifying control or status information, and include the following:
1) Device identification register (DID) – IDs for discrimination and selection of different subsystems (at the SoC level) or of different chips (for multi-chip debug scenarios)
2) Client Select Register (CSC) – which contains the originating source of information for trace and other exported messages
3) Control Register (DC) – which contains debug parameter and configuration information
4) Status Register (DS) – which contains debug status information
5) User Base Address Register (UBA) – which defines the base address for relative or truncated addressing modes
6) Watchpoint Trigger Registers (WT) – which provide watchpoint or breakpoint status
7) Data Trace Attribute Registers – which contain information on recent trace operations and program information needed to reconstruct the trace
8) Breakpoint/Watchpoint Control Registers – which contain watchpoint and breakpoint configuration information
9) Breakpoint/Watchpoint Address/Data Registers – which define address and/or data for assigning watchpoint and breakpoint locations
Nexus in Multi-core Debug
One of the differentiating factors in multi-core debug is the need for new types of instrumentation analysis and control. While single processor systems can be largely analyzed by their instruction flow, multi-core analysis typically also requires information on inter-processor communication and multi-core control.
Processor-specific debug features in Nexus are discussed in several application notes for different processors; however, Nexus can also be used to provide debug interfaces for non-processor-specific applications such as bus analysis and multi-core cross-triggering, using approaches such as the generic register read/write class of TCODES.
Figure 2: Nexus Multi-core Example – AUX and JTAG Ports
The power of the Nexus architecture for multi-core applications (Figure 2, above) is that it allows a variety of implementations of debug architectures over a standardized interface protocol that is currently supported by leading debug tools vendors.
While some processors may have been designed with Nexus-enabled debug in mind, most processors have some port where debug information can be reformatted and wrapped into a Nexus messaging interface.
Since the transfer of information is message based, a variety of scheduling and transfer methods are supported for moving simply parsed and disassembled messages between the Nexus interfaces and different cores, allowing delayed and prioritized transfer of information between several cores and the Nexus interfaces.
Since their characteristics differ, let's consider the cases of AUX In (from tool to target) and AUX Out (from target to tool) messages separately.
Input – Tool to target – messages
Managing Nexus input messages in a multi-core system is straightforward, since there is typically a single host generating messages over the debug interface and only one message will be queued up for transfer to the on-chip target at any given time.
The TCODE operations available for input are limited to register and memory access types and port replacement definition. Each input message contains fields with either a register opcode defined via the Nexus register map or a memory address for memory operations.
Output – Target to tool – messages
Output messages from the target to the tool are potentially complex to manage, since (trace) operations, especially if occurring in bursts, may be more data intensive than input operations and can potentially exceed the Nexus AUX port bandwidth (Figure 3, below).
This problem is compounded in the multi-core case, where different cores, each with their own trace messages to export, compete for access to the Nexus interface. Nexus trace messages can include synchronization and timestamp fields that simplify the reconstruction of trace information that may be delayed in being sent to tools for a given target.
If trace may be delayed prior to export, one of the design factors in the Nexus blocks should be a level of buffering sufficient to avoid dropping or loss of messages while waiting on access to the AUX Out port.
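As an illustration of how timestamp fields help a tool rebuild a cross-core timeline from delayed trace, the sketch below merges per-core message streams into one time-ordered sequence. The `(timestamp, core_id, payload)` tuple layout is a hypothetical stand-in for decoded messages, not a Nexus packet format, and the sketch assumes all cores stamp against a common time base and that each core's own stream is already in time order.

```python
import heapq


def merge_by_timestamp(*per_core_streams):
    """Merge per-core message lists into one timeline ordered by timestamp.

    Each stream is a list of (timestamp, core_id, payload) tuples that is
    already sorted by timestamp (hypothetical layout, for illustration).
    """
    return list(heapq.merge(*per_core_streams, key=lambda msg: msg[0]))


# Messages as a tool might hold them after export, grouped by originating core.
core0 = [(10, 0, "branch"), (42, 0, "branch")]
core1 = [(7, 1, "data write"), (30, 1, "branch")]

timeline = merge_by_timestamp(core0, core1)
for timestamp, core_id, payload in timeline:
    print(timestamp, core_id, payload)
```

Because the ordering comes from the timestamps rather than from arrival order at the tool, messages that waited in an on-chip buffer still land in the right place in the reconstructed timeline.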
There are a variety of ways to manage output data from multiple sources. A simple approach is to configure a static output multiplexor to choose between different Nexus message streams and disable Nexus traffic from the subsystems not chosen for the duration of the selection.
If this duration is significant, or is competing with other data-intensive messages (memory access, as an example), this can result in the need for larger on-chip buffers to avoid losing trace messages.
Figure 3: Multiplexor Based Aux Out Port
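The buffering cost of a static multiplexor can be illustrated with a toy cycle-by-cycle model of one source's message FIFO. Everything here is an assumption made for illustration: the source is taken to keep producing messages while it is not selected, the port drains at most one message per cycle when the source is selected, and the arrival pattern is invented.

```python
def peak_buffer_occupancy(arrivals, selected):
    """Largest number of messages queued for one source under a static mux.

    arrivals[i] -- messages the source produces in cycle i
    selected[i] -- True when the mux routes this source to the AUX Out port
                   (assumed to drain one message per selected cycle)
    """
    depth = peak = 0
    for produced, is_selected in zip(arrivals, selected):
        depth += produced
        peak = max(peak, depth)
        if is_selected and depth > 0:
            depth -= 1
    return peak


# One message every other cycle, for eight cycles (illustrative pattern).
arrivals = [1, 0, 1, 0, 1, 0, 1, 0]

always_selected = peak_buffer_occupancy(arrivals, [True] * 8)
deselected_first = peak_buffer_occupancy(arrivals, [False] * 4 + [True] * 4)

print(always_selected)   # 1
print(deselected_first)  # 3
```

In this model the required buffer depth grows with the length of the deselected window, which is the trade-off that motivates the more sophisticated message control described next.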
The nature of multi-core systems analysis, however, is that for many problems, debug requires access to concurrent information from several cores in order to sufficiently understand the issues involved. To avoid the need for large on-chip buffers, more sophisticated message control can be implemented to provide scheduling, prioritization, and arbitration of Nexus messages.
Nexus messages can be merged at a Nexus port control level, to allow packets from many debug sources to share a common Nexus port. Since each debug block can be assigned an independent identification (DID) value, debug information can be redirected once off chip at the probe interface or as a software operation.
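The off-chip redirection step can be sketched as a simple demultiplexer that splits a merged message stream back into per-source streams. The `(source_id, payload)` pairs are a hypothetical representation of already-decoded messages; in a real tool the identifier would come from whatever source or identification field the implementation exports.

```python
from collections import defaultdict


def demultiplex(merged_stream):
    """Split a merged stream of (source_id, payload) pairs into per-source lists.

    Message order within each source is preserved.
    """
    per_source = defaultdict(list)
    for source_id, payload in merged_stream:
        per_source[source_id].append(payload)
    return dict(per_source)


# Interleaved messages as they might leave a shared port (illustrative data).
merged = [(0, "c0-msg1"), (1, "c1-msg1"), (0, "c0-msg2"), (2, "bus-msg1")]

streams = demultiplex(merged)
print(streams[0])  # ['c0-msg1', 'c0-msg2']
```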
The packet nature of Nexus messages allows a variety of network queuing techniques to interleave messages from multiple sources into a common AUX Out port. The intelligence for this may be implemented in on-chip controller hardware (Figure 4 below) or may be implemented in off-chip software, with priorities transferred to a simpler AUX Out control block as Nexus input messages.
Figure 4: Nexus Multi-core Message AUX Out Processing
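One of the queuing techniques alluded to above, strict priority arbitration with first-come-first-served ordering inside each priority level, can be sketched in a few lines. This is a software model under stated assumptions, not a description of any particular Nexus controller: the `AuxOutArbiter` class, its method names, and the convention that a lower number means higher priority are all hypothetical. The `set_priority` method loosely mirrors the idea of a tool adjusting priorities at run time.

```python
import heapq
from itertools import count


class AuxOutArbiter:
    """Toy model of a shared output port serving several message sources."""

    def __init__(self, priorities):
        self._priorities = dict(priorities)  # source_id -> priority (lower wins)
        self._pending = []                   # heap of queued messages
        self._arrival = count()              # tie-breaker: arrival order

    def set_priority(self, source_id, priority):
        """Change a source's priority; applies to messages submitted afterwards."""
        self._priorities[source_id] = priority

    def submit(self, source_id, payload):
        """Queue a message from a source for export."""
        entry = (self._priorities[source_id], next(self._arrival), source_id, payload)
        heapq.heappush(self._pending, entry)

    def next_message(self):
        """Return the next (source_id, payload) to send, or None if idle."""
        if not self._pending:
            return None
        _, _, source_id, payload = heapq.heappop(self._pending)
        return source_id, payload


arbiter = AuxOutArbiter({0: 1, 1: 0})  # source 1 outranks source 0
arbiter.submit(0, "core0 trace A")
arbiter.submit(1, "core1 trace B")
arbiter.submit(0, "core0 trace C")

sent = [arbiter.next_message() for _ in range(3)]
print(sent)
```

Strict priority is only one option; a round-robin or weighted scheme would avoid starving low-priority sources, at the cost of delaying urgent messages.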
Implementing Nexus 5001 systems in hardware is well documented and manageable for virtually any SoC. Additional issues that should be addressed on a case-by-case basis for every design include the support offered by different software tools vendors for Nexus, and for Nexus debug of multi-core systems in particular.
The issues of on-chip debug are dynamically changing and evolving to meet the increasing needs of SoC products. The Nexus 5001 Forum is an open industry organization and welcomes participation from both industry and research groups interested in advancing SoC debug capabilities.
The Multicore Association has ongoing collaboration with industry debug-related efforts, including OCP-IP, and is in the process of extending the IEEE 5001 specification for a 2008 specification release to support emerging debug interfaces, such as Gigabit SERDES and 2-wire JTAG (1149.7) ports, that address the diverse requirements of different applications.
 IEEE 5001 Nexus specification
Neal Stollon is Principal Engineer at HDL Dynamics, an instrumentation consulting and IP development group, focused on SoC on-chip instrumentation and analysis solutions. He has over 25 years of technical and business experience in digital design, processor architectures, and tools development at MIPS Technologies, Texas Instruments, LSI Logic, Alcatel, and others. Dr. Stollon earned a Ph.D. in EE from Southern Methodist University and is a Texas Professional Engineer. Neal is currently the Vice-Chair for the IEEE-ISTO Nexus 5001 Consortium and serves on a variety of Design Automation committees.