Putting Multicore Processing in Context: Part 3 - Embedded.com


After attending many interesting presentations at the recent First Annual Multicore Expo in Santa Clara, Calif., it is apparent to us that the trend toward multicore deployment in the embedded space is going strong, and the case can be made that it will continue to grow.

The presentation topics ranged from software development for multicore to introductions of new multicore processors to tools and where the industry is heading. Heady stuff. However, one of the most obvious things was the concern for the current state of tools for multicore development.

It’s clear that hardware guys have it good relative to multicore software designers, something that may not have been apparent 20 years ago. With the advent of multicore, if hardware designers are tasked with doubling the raw performance of a processor, there is a very real chance that they will take existing IP, tweak it, replicate it and provide some interconnect logic and call it a day. If they need to make it four times as fast, they do the above three times.

This is not meant to trivialize hardware design; the point is that one of the reasons for going multicore is the ability to drive simpler, lower-frequency, lower-power cores. The increase in processing power comes from the number of cores, not from their frequency or from the use of superscalar pipelines and the like to make them faster. To take a nod from Adam Smith, author of “Wealth of Nations,” division of labor and specialization of labor are the key to multicore processing.

A recent study suggested that engineers are only capable of acquiring two to three skills at a time. If that is true, then embedded multicore designers need to make two of those skills the ability to understand concurrency (potentially massive concurrency) and the ability to debug their software in a multicore environment. Returning to earlier comments, chip designers as well as software developers need better tools in order to design and debug real-world, real-time embedded applications.

One manufacturer presenting at the expo has successfully deployed its 300+ core processor to the embedded market. A software developer who wants to use this processor has the option of using the manufacturer’s own compiler and debugger, or no tools at all.

Unfortunately, this seems to be the current state of the industry: everyone has their own way of doing things. There is no standard approach to capturing concurrency in a design. There is no standard for debugging a multicore target, or even a standard for connecting to a multicore target.

Mentor Graphics is working with hardware, software, and firmware vendors, including participants of the Multicore Association, to establish industry-wide standards that provide an easy way to mix and match operating systems, have them share resources, and communicate with each other. The company is working to establish debugging and connection standards as well as inter-core communication mechanisms.

Creating multicore-aware debuggers
Debugging embedded targets has come a long way in the last 20 years. No longer do embedded developers have to rely on “printf debugging,” which really doesn’t belong in the embedded world anyway (first, you have to decide where the printf output is going to be directed, and it can only work after the hardware and drivers have been debugged).

Today, developers enjoy robust debugging suites that use hardware-assisted connections (such as JTAG) to download applications to and control the target. Good commercial packages not only let the developer start and stop the processor, but provide intuitive ways to monitor registers, memory, and stacks. One of the hardest parts of embedded development is understanding the behavior of the system, and debuggers are the tools that provide visibility into the inner workings of an application.

However, debugging a multicore target throws a whole new wrench into the works. How does one control each core: with one debugger connecting to all the cores, or with a separate debugger for each core? How is the data for multiple cores best organized to make sense to a developer, and does that change between a two-core system and a 300-core system? How many cores can an engineer observe at one time and still understand what’s going on in each?

These are the questions that the industry needs to answer before the power of multicore designs can really start to meet its potential.

Is JTAG the answer to multicore debug?
One of the main differences between desktop and embedded debugging is that embedded targets are external to the desktop system and have to be connected to the debug console or integrated development environment (IDE) in some fashion. Strangely enough, this device is called a “connection device” or “connection” for short.

Connections range anywhere from two-wire connections to complex and definitely more expensive Joint Test Action Group (JTAG) devices, which contain huge amounts of random access memory (RAM), or even hard drives, which they use to queue up data. The host uses the connection to communicate with the target device. For simple two-wire connections, the interaction between the host-based IDE and the target is limited.

JTAG-based connection devices allow “on-chip debugging.” They allow the IDE to interact with the target and provide services such as remotely starting, stopping or suspending program execution (setting a breakpoint) and viewing memory and register contents as well as IO and peripheral devices. The IDE uses a sequence of these functions so that one can establish breakpoints or step through code.

So what makes JTAG so special? Back in the old days, printed circuit boards were tested on what is called a “bed of nails.” Basically, when the board was created, it also had test points (solder pads) placed at strategic places on the bottom of the board. After a board was populated with chips, one of the final manufacturing steps was to put the board on the bed of nails to be tested. The bed of nails has spikes that stick up to make contact with the test points.

However, as technology evolved, and more and more of the board functionality was moved into microprocessors and ASICs, the accessibility of test points became a problem. In the mid to late 1980s, several companies banded together to form the JTAG. The results of the JTAG were accepted by the IEEE in 1990, and the IEEE 1149.1 standard, known as the Standard Test Access Port and Boundary Scan Architecture, was born. The name JTAG (pronounced “J” “Tag”) was kept since it is easier to say than “STAPBSA.” The boundary scan method enables in-circuit testing and eliminates the need for bed-of-nails testing.

Making the right JTAG connections
The use of the industry-standard JTAG scanning interface, initially developed for boundary scan testing of complex devices and boards over a low pin-count interface, has also become a standard method for accessing and debugging processor cores. This is because it requires a small number of pins, and has already been widely adopted for its original purpose.

Using JTAG for processor debugging required adding a debug support unit, or “debug logic,” into the CPU core design and adding an additional JTAG scan path to access that logic. A brief overview of a common JTAG TAP (Test Access Port) with its multiple JTAG scan paths is shown in Figure 1 below.

Separate scan register paths are provided for boundary scan, reading the device ID code, initiating built-in self-test functions and obtaining their results, and accessing the debug support unit. The TAP Instruction Register (TAPIR) is used to select the desired path; during normal operation, the TAP is left in the Bypass state so the other functions are disabled.

Figure 1. A single JTAG Test Access Port

Multi-core (multi-TAP) Configuration
The most cost-effective configuration (lowest pin count) for multicore devices is to string the JTAG TAPs within each core along a single daisy chain as shown in Figure 2, below.

In this way, the instruction registers for each TAP are concatenated into one long instruction register. So a specific core at a known position in the scan chain can be set to select the debug support unit registers and all other cores can be set in bypass mode, thereby allowing one core to be individually addressed by one debugger control packet.

Figure 2. Multiple cores on a single JTAG scan chain
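To make the concatenated instruction register concrete, here is a minimal Python sketch of how a debugger might assemble the bits for one instruction-register scan across the whole chain. The `Tap` class, the instruction register widths, and the debug opcode value are hypothetical and chosen only for illustration; real values come from each device's documentation. The all-ones BYPASS encoding and the one-bit bypass data register are taken from IEEE 1149.1. The shift order (the TAP nearest TDO first, least significant bit first) is an assumption about how the chain is wired.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tap:
    """One TAP on the daisy chain (hypothetical model, not a vendor API)."""
    name: str
    ir_length: int  # width of this TAP's instruction register, in bits


def bypass_opcode(tap: Tap) -> int:
    """IEEE 1149.1 defines BYPASS as the all-ones instruction."""
    return (1 << tap.ir_length) - 1


def build_ir_scan(chain: list[Tap], target_index: int, debug_opcode: int) -> list[int]:
    """Return the bits to shift into TDI, in shift order.

    chain[0] is assumed to be the TAP nearest TDI, so the bits for the
    last TAP (nearest TDO) must be shifted in first.
    """
    if not 0 <= target_index < len(chain):
        raise IndexError("target_index is not on the chain")
    bits: list[int] = []
    for index in range(len(chain) - 1, -1, -1):
        tap = chain[index]
        opcode = debug_opcode if index == target_index else bypass_opcode(tap)
        if opcode >> tap.ir_length:
            raise ValueError(f"opcode does not fit the IR of {tap.name}")
        # Each instruction is shifted least significant bit first.
        bits.extend((opcode >> bit) & 1 for bit in range(tap.ir_length))
    return bits


def dr_padding_bits(chain: list[Tap]) -> int:
    """Each bypassed TAP adds a one-bit bypass register to every data scan."""
    return len(chain) - 1


# Three cores; select the (hypothetical) debug path on the middle one only.
HYPOTHETICAL_DEBUG_OPCODE = 0b00100
chain = [Tap("core0", 4), Tap("core1", 5), Tap("core2", 4)]
ir_scan = build_ir_scan(chain, target_index=1, debug_opcode=HYPOTHETICAL_DEBUG_OPCODE)
```

With this chain the scan is 13 bits long (4 + 5 + 4), and every later data-register scan aimed at core1 carries two extra bits, one for each bypassed neighbor. That overhead grows with the number of cores on the chain.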

Extending this concept, multiple debuggers can each be assigned to individual cores and can send debug service control packets to their assigned core without impacting (or creating awareness of) the other cores (ignoring shared memory considerations for the moment), since Ethernet debug service packets are queued and executed in the order of arrival.

Synchronous Stopping and JTAG Skid
Individual commands issued to a CPU core over JTAG require hundreds of JTAG operations. While these appear to execute very quickly, at least to the human viewer (the JTAG scan chain may typically be doing serial scans at 10 MHz to 40 MHz), this is actually a very slow process in comparison to a CPU core running at, say, 400 MHz to 1.2 GHz.

Since JTAG debug operations and processors running at hundreds of MHz are inherently asynchronous functions, without hardware support on the chip, it is not possible to stop one processor at a breakpoint, and have that event cause another core to stop precisely at that location using only JTAG operations. The time lapse between issuing a JTAG command and the processor responding thousands of CPU cycles later is commonly known as “skid.”
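A rough back-of-the-envelope estimate shows why skid is measured in thousands of cycles. The Python sketch below converts the time spent shifting a command over JTAG into CPU cycles. The function name and the bits-per-operation figure are assumptions made for illustration, and the model ignores host, probe, and TAP state-machine overhead, so real skid is likely to be larger.

```python
def estimate_skid_cycles(jtag_operations: int, bits_per_operation: int,
                         tck_hz: int, cpu_hz: int) -> int:
    """Estimate how many CPU cycles elapse while a JTAG command is shifted.

    One bit is shifted per TCK cycle, so the shift time is
    (operations * bits) / tck_hz seconds; multiplying by cpu_hz
    gives the number of CPU cycles that pass in the meantime.
    """
    if min(jtag_operations, bits_per_operation, tck_hz, cpu_hz) <= 0:
        raise ValueError("all arguments must be positive")
    total_bits = jtag_operations * bits_per_operation
    return total_bits * cpu_hz // tck_hz


# Assumed figures: 100 operations of 32 bits each, 40 MHz TCK, 400 MHz core.
best_case = estimate_skid_cycles(100, 32, 40_000_000, 400_000_000)

# Assumed figures: same command, 10 MHz TCK, 1.2 GHz core.
worst_case = estimate_skid_cycles(100, 32, 10_000_000, 1_200_000_000)
```

Under these assumed figures the faster scan clock and slower core give 32,000 cycles, while the slower scan clock and faster core give 384,000 cycles. Either way, a second core has run far past the point of interest before a stop request can reach it.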

What this looks like from a debug experience standpoint is that you are debugging the cores completely independently; there is no real interaction between them. So connecting to multiple cores simultaneously really doesn't mean much, because even when you do that, you still have the situation that you cannot do anything to both cores at the same time. This is a limitation of JTAG, and also of the fact that there is no formalized hardware interconnect standard for multicore debugging.

To address this problem, built into the core of the Mentor Graphics EDGE debugger is the ability to have “synchronization groups.” That is, designers can define a group of threads that are to be stopped when a given thread hits a breakpoint. This is backed up by a capability that the back end transport provides to the debug engine that says “I can stop this set of cores synchronously.”

If this capability is not there, then the debug engine does its best to emulate the capability by turning around and stopping the other cores when the one hits the breakpoint. Obviously, there will be thousands of instructions of skid, but without hardware standards, this is better than nothing.
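The fallback logic described above can be sketched in a few lines of Python. This is not the EDGE debugger's API; every class, method, and attribute name here is hypothetical. It only illustrates the decision: use a synchronous stop when the transport advertises one, and otherwise stop the remaining group members one at a time, accepting the skid.

```python
class Transport:
    """Hypothetical back-end transport that records the requests it receives."""

    def __init__(self, supports_sync_stop: bool):
        self.supports_sync_stop = supports_sync_stop
        self.log: list[tuple] = []

    def stop_cores_synchronously(self, cores: list[int]) -> None:
        # Stands in for a hardware-assisted stop of several cores at once.
        self.log.append(("sync_stop", tuple(cores)))

    def stop_core(self, core: int) -> None:
        # Stands in for an ordinary, per-core JTAG halt request.
        self.log.append(("stop", core))


class SyncGroup:
    """A set of cores that should stop together when one hits a breakpoint."""

    def __init__(self, transport: Transport, cores: set[int]):
        self.transport = transport
        self.cores = set(cores)

    def on_breakpoint(self, hit_core: int) -> str:
        """Stop the rest of the group; report which mechanism was used."""
        others = sorted(self.cores - {hit_core})
        if not others:
            return "none"
        if self.transport.supports_sync_stop:
            self.transport.stop_cores_synchronously(others)
            return "hardware"
        # Emulation: halt the others one by one, each request adding skid.
        for core in others:
            self.transport.stop_core(core)
        return "emulated"


# Without hardware support, core 2 hitting a breakpoint halts 0, 1, and 3 in turn.
emulating = SyncGroup(Transport(supports_sync_stop=False), {0, 1, 2, 3})
mode = emulating.on_breakpoint(hit_core=2)
```

Keeping the capability check inside the group means the rest of the debugger can treat both cases the same way, which is the point of the "best effort" emulation.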

Can Nexus extend to multicore debug?
As mentioned earlier, JTAG is a communication mechanism used to control an embedded processor. It does not directly have anything to do with debugging. On the cores themselves there must be debug logic that controls the core.

The “Nexus 5001 Forum” is an industry group that has advanced a new IEEE standard (IEEE-ISTO 5001) that defines just such a debug logic block to support embedded development. It does contain some compelling features such as the ability to read/write memory on the core while the core continues to run.

While this is cool, it doesn’t directly have anything to do with multicore debugging, except for the fact that it does define a high-speed auxiliary communications mechanism that can be shared by multiple cores for transmission of real-time trace data, among other things. Unfortunately, the adoption of Nexus has been very slow, and it does not have nearly the installed base that other technologies have. Also, it does not appear to have much traction outside of the automotive industry. Perhaps it will gain momentum in the future with the growth of multicore.

What It All Means
From the silicon vendor’s perspective, it is pretty clear what the vendors would get out of having industry-wide standards for connecting to and debugging an embedded target. Silicon vendors spend large amounts of time and money trying to create an “ecosystem” that is beneficial to their product.

As a result, they spend enormous amounts of time putting RTOS, tool and connection support together so that developers can use their product when it hits the street. They may have to pay non-recurring engineering (NRE) fees to tool vendors that are reluctant to support their proprietary hardware, just to get the support work done. That time and money would be better utilized plowed back into either their shareholders’ wallets or into research and development.

The ones that appear less likely to benefit, aside from the developer, are the tool and connection vendors. Why would they be likely to benefit from having all of their competitors considered for every target? One reason is that successful tool vendors have distinctive competencies that their customers value.

Also, the tool vendor knows that their profits would increase if they had a wider audience of targets that their tools could be used on. Furthermore, they spend huge amounts of time and money “porting” their product to different silicon platforms. That time and money would be better utilized by focusing on the value that they can bring to the end customer rather than chasing a moving target.

From the developer’s perspective, what does all this talk about standards mean? To start, it means freedom of choice. It means that they can choose from a plethora of differently priced, differently featured tools and connections. It means that their tools can be used across targets. It means that one connection device will connect to ARM-based multicore products as well as to MIPS, MicroBlaze and Intel-based multicore targets.

It means eliminating the requirement to purchase new tools because the debugger being used does not work on the new target. It means that the developer can spend time gaining field expertise rather than learning how to use a new tool.

The role of Eclipse in multicore debug
The current status of debuggers and connections for multicore developers is, respectively, good and bad. The Eclipse Foundation and various sub-projects are making headway into the embedded space. Eclipse provides a “debug platform” which debugger vendors can implement to debug any arbitrary system. The result is a common look and feel regardless of whether a designer is debugging Java, a Perl script, or an embedded C/C++ application.

From the ground up, Eclipse was designed to be able to debug multiple applications simultaneously, and has a number of features that help facilitate this. In Eclipse, all views in a frame typically reflect the currently selected context.

So if a designer has a thread in application “Foo” selected as the current context, the variables view, expressions view, and registers view all update to reflect “Foo.” If the designer then selects a thread in application “Bar,” these windows update to reflect “Bar.” Combine this with the fact that the designer can open multiple frame instances, and you have the beginning of a nice multicore development environment (a good reason to request a nice dual monitor system).

Eclipse has other nice features for multi-context debugging as well, such as “Working Sets” of breakpoints (e.g., set the breakpoint in file theDriver.c in Foo, but not in Bar). The DSDP Project (Device Software Development Platform), for example, is driving the creation of a flexible debug hierarchy, which will be a better fit for supporting debugging in a typical embedded multicore scenario: connection device -> core(s) -> process(es) -> thread(s).
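The hierarchy described above can be pictured as a simple tree. The Python sketch below is not the Eclipse or DSDP API; the `DebugNode` class, the `thread_paths` helper, and the node names are hypothetical and only illustrate how a debugger front end might walk from a connection device down to individual threads.

```python
from dataclasses import dataclass, field


@dataclass
class DebugNode:
    """One level of the hierarchy: connection, core, process, or thread."""
    kind: str
    name: str
    children: list["DebugNode"] = field(default_factory=list)

    def add(self, kind: str, name: str) -> "DebugNode":
        """Attach a child node and return it so calls can be chained."""
        child = DebugNode(kind, name)
        self.children.append(child)
        return child


def thread_paths(node: DebugNode, prefix: tuple = ()) -> list[str]:
    """List every thread as a full path from the connection device down."""
    path = prefix + (f"{node.kind}:{node.name}",)
    if node.kind == "thread":
        return [" -> ".join(path)]
    paths: list[str] = []
    for child in node.children:
        paths.extend(thread_paths(child, path))
    return paths


# One probe, two cores, one application per core (names borrowed from the text).
probe = DebugNode("connection", "jtag-probe")
probe.add("core", "0").add("process", "Foo").add("thread", "main")
probe.add("core", "1").add("process", "Bar").add("thread", "worker")
all_threads = thread_paths(probe)
```

Selecting any path in such a tree gives the views a single, unambiguous context to display, whether the target has two cores or several hundred.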

In addition, the DSDP project is creating a common infrastructure for connecting to remote targets, and then using services on them (e.g., debugging, profiling, exploring target file systems, opening a shell).

More and more tool vendors are migrating their offerings to Eclipse, creating a very interesting new ecosystem. The result for the tools users is that it will be possible for them to increasingly focus on building an efficient development process on top of the tools, instead of spending so much time and energy on the tools themselves.

The status of connection and debug hardware on the board is not as positive at the moment as is the state of debugger development. The upside to all this is that the Multicore Association is working toward addressing this exact deficiency in connection and standards for hardware.

Conclusion
The Multicore Association is in its infancy. It is recommended that all interested parties, including software vendors, hardware vendors and developers, invest some time, energy and money in it, as it is the singular entity out there trying to bring together all the players on the multicore scene.

Hopefully, the Multicore Association and its debug working groups will get some traction and put some stuff out there quickly to help gain a following not to be ignored.

To read Part 1 in this series, go to Adam Smith’s answer to multicore design.

To read Part 2 in this series, go to Dealing with hardware and OS issues.

Todd Brian is product marketing manager for Nucleus kernels products, Lyle Pittroff is product marketing manager for EDGE Connections products, Aaron Spear is Debug Tools architect, and Jeff Womble is product marketing manager for EDGE Tools products at Mentor Graphics.

For more information about multicore and multiprocessor architectures, tools and methodologies, go to More About Multicore and Multiprocessing.
