Using open-source GNU, Eclipse & Linux to develop multicore Cell apps: Part 4

Cell software development on the IBM Full-System Simulator described in Part 3 of this series generally takes six steps, some of which are optional:

1. Configure simulation parameters (PPU mode, SPU mode) and identify data collection procedures (triggers, emitters) using the command window.

2. Build a Cell executable on the host system using the SDK's cross-development tools.

3. Transfer the executable and support files from the host system to the simulated Cell using the callthru source command in the console window.

4. Run files on the simulated Cell by entering commands in the console window.

5. Transfer output files from the simulated Cell to the host system using the callthru sink command in the console window.

6. Use the command window to access data generated during the simulation.

Data collection methods, including triggers and emitters, will be presented shortly. For now, let's simulate a simple application: the Sieve of Eratosthenes executable from the previous section.
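The sieve algorithm itself is simple. As a reminder of what spu_sieve computes, here is a minimal C sketch that can be compiled on the host. It is not the book's spu_sieve source, which isn't reproduced here. The helper name sieve_primes and its parameters are hypothetical. Composite entries in the flag array are set to 1, which matches how the checkpointed version of the program is described later.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIEVE_LIMIT 250  /* primes strictly less than this value */

/* Hypothetical helper: fills primes[] with every prime below `limit`
 * and returns how many were stored. Composite entries are set to 1. */
size_t sieve_primes(unsigned limit, unsigned *primes, size_t max_primes)
{
    unsigned char composite[SIEVE_LIMIT];
    size_t count = 0;

    assert(limit >= 2 && limit <= SIEVE_LIMIT);
    memset(composite, 0, sizeof composite);

    /* Mark the multiples of each prime, starting at its square. */
    for (unsigned i = 2; i * i < limit; i++) {
        if (!composite[i]) {
            for (unsigned j = i * i; j < limit; j += i)
                composite[j] = 1;
        }
    }

    /* Every number from 2 upward that is still unmarked is prime. */
    for (unsigned i = 2; i < limit && count < max_primes; i++) {
        if (!composite[i])
            primes[count++] = i;
    }
    return count;
}
```

Called as sieve_primes(250, buf, 64), the function returns 53 and fills buf with the primes from 2 through 241.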

Building the Example Application
If the host processor isn't a 64-bit PowerPC, the Cell SDK installer installs packages for cross-development. The tools in these packages run on the host processor but compile code for the Cell. They have the same names (ppu-gcc, spu-gcc) as those described in Part 2 of this series, but the compiled executables can't run directly on the host processor.

(Note: At the time of this writing, the directory containing the cross-development tools isn't added to the PATH environment variable. To ensure that the makefiles work properly, add /opt/cell/toolchain/bin to the PATH variable.)

The code for this example can be found in the Chapter4/sieve directory. To build the application, change to this directory and enter make. This invokes the build commands in Makefile, which use the host's cross-development tools to generate the spu_sieve executable.

Transferring the Application to the Simulated Cell
The host processor can't run spu_sieve because it was compiled for the Cell. To transfer the executable to the simulated Cell, you need the callthru command. This command, entered in the console window, transfers files between the host and the simulated device. It can be used in two ways:

1) callthru source $(HOST_DIR)/file_name > $(CELL_DIR)/file_name
2) callthru sink $(HOST_DIR)/file_name < $(CELL_DIR)/file_name

where HOST_DIR is a directory on the host system and CELL_DIR is a directory on the simulated Cell. The first command, callthru source, transfers file_name from the host to the simulated device, and the second, callthru sink, transfers file_name from the simulated device to the host.

To keep pathnames on the host short, it's common to move files to the host's /tmp directory before transferring them to the Cell. Change to the Chapter4/sieve directory and enter

cp spu_sieve /tmp

Then transfer the executable to the simulated device by entering the following command in the console window:

callthru source /tmp/spu_sieve > spu_sieve

The spu_sieve file will be placed in the current directory on the simulated Cell. It's a good idea to run ls to make sure the transfer completed properly.

Running the Application
If you attempt to run the executable immediately, you'll receive a permission error. Make the file executable with

chmod +x spu_sieve

and run it by calling ./spu_sieve in the console window. The console will present a list of the prime numbers less than 250. Optionally, you can transfer spu_sieve back to /tmp on the host by entering the following:

callthru sink /tmp/spu_sieve < spu_sieve

callthru is a vital command in the console window, so it's important to have a clear understanding of how it's used.

SPU Statistics and Checkpoints
The Full-System Simulator can do much more than just run Cell applications. To appreciate SystemSim, you need to see the performance data it collects and the different ways you can control how and when the data is collected. This control is an important benefit of using the simulator, and there are two simple ways to take advantage of it:

1. Enter Tcl profiling commands in the command window.

2. Insert profiling commands into SPU code. These commands are called checkpoints.

The first method collects performance data throughout the execution of an application. The second method lets you control when the data collection begins and ends.

SPU Statistics and Profiling Commands
SPU performance data, collectively called statistics, include the number of instructions executed, the number of cycles run, and the number of pipeline stalls that occurred during the simulation. Appreciating these measurements requires a thorough understanding of the Cell's architecture, so this discussion just walks through a brief example.

Like the PPU, each SPU has a simulation mode that determines how precisely SystemSim performs its analysis. By default, all SPUs are set to Instruction mode, which means SystemSim doesn't acquire performance data.

To gather statistics, the SPUs need to be in Pipeline mode. Click the SPU Mode button in the graphical panel and select Pipeline for each SPU. Figure 4.5 shows what the SPU mode selection dialog looks like.

Figure 4.5 The SPU mode-selection dialog

If the executable isn't there already, use callthru to transfer spu_sieve to the simulated Cell. Then complete the following steps:

1. In the graphical panel, click Stop to pause the simulation.

2. In the command window, enter mysim spu 7 stats reset. This clears the simulator's stored statistics for SPU 7.

3. In the graphical panel, click Go to start the simulation once more.

4. In the console window, run ./spu_sieve.

5. In the command window, enter mysim spu 7 stats print. This displays the statistics generated during the execution of spu_sieve.

Note: This discussion assumes that the operating system chooses SPU 7 to run executables. If this isn't the case, you'll need to reenter the commands for the chosen SPU.

Instead of using mysim spu 7 stats print, you can access SPU statistics by opening the SPE7 folder on the left side of the graphical panel and double-clicking SPUStats. Figure 4.6 below shows what the statistics window looks like.

Figure 4.6 SystemSim statistics window

You can also view the contents of SPU memory and the operation of the SPU's Memory Flow Controller by selecting the corresponding options (SPUMemory and MFC) in the SPE7 directory.

SystemSim makes SPU statistics available to Tcl scripts as array structures. The command needed to access them is

array set myArray [mysim spu n stats export]

where myArray is the name of the array to store the data and n is the SPU being examined. This command is commonly used by SystemSim triggers, which are discussed shortly.

You may only want performance data for a specific portion of an application. In this case, running commands in the command window won't be sufficient: you need to modify the SPU code in a way that tells the simulator when to start and stop collecting measurements.

SPU Checkpoints
Checkpoints are statements in C/C++ code that control when SystemSim collects data. Each checkpoint function is named prof_cpN(), where N is a value between 0 and 31. These functions are treated as no-ops during execution, but the simulator recognizes them for control purposes. Of the 32 checkpoints, 3 have specific purposes:

prof_cp0(): Resets the SPU statistics for the SPU
prof_cp30(): Enables collection of statistics for the SPU
prof_cp31(): Ends collection of statistics for the SPU

These functions are declared in the header file profile.h, located in


The code in the Chapter4/sieve_check directory, part of the downloadable sample files, implements the same Sieve of Eratosthenes algorithm as Chapter4/sieve but inserts checkpoints that reset, start, and stop the simulator's data collection. Listing 4.2 below shows the full code, with the new checkpoint statements in boldface.

Listing 4.2 Inserting SystemSim Checkpoints: spu_sieve_check.c

These checkpoints tell the simulator to collect statistics while composite numbers are set to 1, but not before or after. To simulate this application, copy the spu_sieve_check executable to /tmp and enter the following commands in the console window:

callthru source /tmp/spu_sieve_check > spu_sieve_check
chmod +x spu_sieve_check
./spu_sieve_check

To see the statistics generated by the profiled section of code, open the SPE7 directory in the graphical panel and select SPUStats. Alternatively, you can enter the following in the command window:

mysim spu 7 stats print

This displays the same type of metrics as before, but the data represents only the processing that took place between the checkpoints.

Trigger Events and Trigger Actions
Event handling is an important aspect of simulation. If a stack overflow occurs, you need a way to halt the simulation and find out what happened. SystemSim provides triggers for this purpose, and the process of using them consists of two parts:

1. Identify the conditions SystemSim should monitor.
2. Tell the simulator what actions it should take when those conditions occur.

The conditions of interest are called trigger events, and the actions are called trigger actions.

Trigger Events
SystemSim responds to many different types of trigger events, each represented by a string constant. Table 4.5 below lists a number of them, but not all. For the full list of trigger events, enter

mysim trigger display assoc avail

in the SystemSim command window.

Table 4.5 SystemSim Trigger Events

The first six events are straightforward. SIM_START/SIM_STOP respond to the simulator's operation, SPU_START/SPU_STOP respond to the SPU's operation, and SPE_DMA_START/SPE_DMA_STOP respond to DMA operations.

The seventh event, SPU_PROF, is particularly important because it responds to checkpoints in SPU code. That is, an SPU_PROF event occurs whenever the simulated device calls one of the prof_cpN functions.

By creating triggers for SPU_PROF events, you can send commands to SystemSim whenever a particular line of code is reached. For example, the trigger may halt the simulator at a checkpoint. In this case, the checkpoint acts like a regular breakpoint for debugging.

Trigger Actions
Trigger actions tell SystemSim what to do when a trigger event occurs. They are implemented as Tcl procedures, described in Appendix E. When SystemSim invokes a trigger action, it passes information about the simulation state into the procedure.

This information takes the form of a Tcl list containing a series of name/value pairs. For example, if the triggering event is SPU_PROF, the list contains six elements:

spu spu_num cycle cycle_num rt trig_num

where spu_num is the number of the simulated SPU, cycle_num is the current cycle count, and trig_num is the number of the checkpoint that caused the event. That is, trig_num contains N in prof_cpN.

(Note: The list elements provided for SPU_PROF and other events may change in future releases of SystemSim.)

These values can be accessed easily by converting the input list to a Tcl array. This is done with the array set command. For example, if the list values are placed in an array called args and the triggering event is SPU_PROF, the spu_num and trig_num elements can be accessed using $args(spu) and $args(rt), respectively.
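Tcl's array set treats the list as alternating names and values. For readers more at home in C, the sketch below performs the same pairing on an SPU_PROF-style argument string and looks up one field by name. It is only an illustrative analogue. The function name lookup_field, the whitespace-separated string format, and the numeric values in the usage are assumptions. In a real trigger action you would simply use array set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical analogue of `array set args $list`: scan a list of
 * alternating names and values (e.g. "spu 7 cycle 123456 rt 31") and
 * store the value paired with `name`. Returns false if it is absent. */
bool lookup_field(const char *list, const char *name, long *value)
{
    char key[32];
    long val;
    int used;

    assert(list != NULL && name != NULL && value != NULL);

    /* Read one name/value pair at a time; %n records how far we got. */
    while (sscanf(list, "%31s %ld%n", key, &val, &used) == 2) {
        if (strcmp(key, name) == 0) {
            *value = val;
            return true;
        }
        list += used;  /* advance past the pair just read */
    }
    return false;
}
```

With the list "spu 7 cycle 123456 rt 31", looking up "rt" yields 31, the checkpoint number that $args(rt) would return in Tcl.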

Besides simulation information, you can send data to a trigger action from the command window. If a Tcl variable is defined on the command line, trigger actions will be able to access it as a global variable. For example, if a command defines stat_count with

set stat_count 0

a trigger action will be able to access the variable with the declaration

global stat_count

If the trigger action changes the value of stat_count, the updated result will be available from the command line.

A common purpose of trigger actions is to halt the simulated Cell so that the user can examine its state. This is done with the simstop command. Another purpose is to collect performance data, and this is performed with the command

array set myArray [mysim spu n stats export]

where myArray is the Tcl array that will store the SPU metrics, and n is the number of the SPU being examined.

Associating Trigger Actions with Trigger Events
For each trigger action you create, you need to load the Tcl script into memory and associate the action with an event. There are two steps involved:

1. Run source trig_script, where trig_script is the path to the trigger action script. This tells SystemSim to read your trigger script into memory.

2. Execute mysim trigger set assoc event_name trig_proc in the command window, where event_name is an event constant and trig_proc is a procedure in trig_script. This command associates the trigger with the event.

For example, suppose you created a trigger script called start_script.tcl and placed it in the host's /tmp directory. This script contains a trigger action called start_proc, and you want this procedure to be called whenever an SPU starts. The first step is to source the script with

source /tmp/start_script.tcl

Then, to associate this trigger action with the SPU_START event, execute the following in the command window:

mysim trigger set assoc "SPU_START" start_proc

You can verify the association by clicking Triggers/Breakpoints in the graphical panel and looking at the available triggers. As the simulation runs, SystemSim will monitor the start of an SPU's operation and execute start_proc when the event takes place.
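Conceptually, mysim trigger set assoc fills in a table that maps event names to procedures, and the simulator consults that table whenever an event fires. The C sketch below models this bookkeeping with function pointers. It also includes a global counter that an action updates, in the spirit of the stat_count example. Everything in the block is hypothetical, including set_assoc, fire_event, and the table size. It illustrates the association idea only and does not show how SystemSim is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ASSOC 16  /* hypothetical table size */

typedef void (*trigger_action)(void);

/* One event-name-to-action association. */
typedef struct {
    const char    *event;
    trigger_action action;
} assoc_entry;

static assoc_entry assoc_table[MAX_ASSOC];
static size_t      assoc_count;

/* Analogue of: mysim trigger set assoc "EVENT" proc */
int set_assoc(const char *event, trigger_action action)
{
    assert(event != NULL && action != NULL);
    if (assoc_count == MAX_ASSOC)
        return -1;                          /* table full */
    assoc_table[assoc_count].event  = event;
    assoc_table[assoc_count].action = action;
    assoc_count++;
    return 0;
}

/* Run every action associated with `event`; return how many ran. */
int fire_event(const char *event)
{
    int ran = 0;
    for (size_t i = 0; i < assoc_count; i++) {
        if (strcmp(assoc_table[i].event, event) == 0) {
            assoc_table[i].action();
            ran++;
        }
    }
    return ran;
}

/* A global shared with the "command line", like stat_count. */
int stat_count = 0;

/* Example action in the spirit of start_proc: count SPU starts. */
void start_proc(void)
{
    stat_count++;
}
```

After set_assoc("SPU_START", start_proc), each call to fire_event("SPU_START") runs start_proc once. Events with no association, such as SPU_STOP here, run nothing.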

Listing 4.3 below presents an example trigger action that responds to SPU_PROF trigger events. When this event occurs, the trigger action checks the checkpoint number (0, 30, or 31), which corresponds to the prof_cp0, prof_cp30, and prof_cp31 functions. If the number equals 31, the action reads the simulator's SPU statistics and prints the statistics corresponding to the pipeline stall cycles and branch instructions. The trigger's output is sent to the command window.

Listing 4.3 SPU Profile Trigger: prof_trigger.tcl

The array set command creates an array called arg_array from the incoming list of arguments. If the trigger number, accessed with $arg_array(rt), equals 31, the trigger places the SPU statistics in an array called stat_array.

Then it displays two statistics: the number of pipeline-dependent stall cycles and the number of branch instructions. The first statistic is identified by the pipe_dep_stall_cycles index, and the second is identified by BR_inst_count.
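The decision that print_prof makes can be summarized in C. The action ignores every checkpoint except 31 and then looks up two named statistics in the exported set. This is a hypothetical analogue of the Tcl trigger in Listing 4.3, not a replacement for it. Several pieces are assumptions made for illustration: the stat_entry type, the find_stat and print_prof_c names, the output labels, and the representation of exported statistics as name/value pairs. Only the index names pipe_dep_stall_cycles and BR_inst_count come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One exported statistic as a name/value pair (assumed representation). */
typedef struct {
    const char *name;
    long long   value;
} stat_entry;

/* Return the value of `name`, or -1 if it isn't in the exported set. */
long long find_stat(const stat_entry *stats, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(stats[i].name, name) == 0)
            return stats[i].value;
    return -1;
}

/* Mirror of the trigger logic: act only when checkpoint 31 is reached.
 * Returns 1 if statistics were printed to `out`, 0 otherwise. */
int print_prof_c(int rt, const stat_entry *stats, size_t n, FILE *out)
{
    assert(out != NULL);
    if (rt != 31)
        return 0;  /* prof_cp0() and prof_cp30() produce no output here */
    fprintf(out, "Pipeline dependency stall cycles: %lld\n",
            find_stat(stats, n, "pipe_dep_stall_cycles"));
    fprintf(out, "Branch instructions: %lld\n",
            find_stat(stats, n, "BR_inst_count"));
    return 1;
}
```

Passing stdout as `out` prints to the terminal. Returning a flag keeps the checkpoint filter easy to check separately from the formatting.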

If prof_trigger.tcl is placed in trig_dir, the following commands (in the command window) will load the script into SystemSim and associate it with the SPU_PROF event:

source trig_dir/prof_trigger.tcl
mysim trigger set assoc "SPU_PROF" print_prof

Assuming spu_sieve_check is in the host's /tmp directory, the following commands (in the console window) will run the executable:

callthru source /tmp/spu_sieve_check > spu_sieve_check
chmod +x spu_sieve_check
./spu_sieve_check

As the application processes the checkpoints, SystemSim will invoke the print_prof procedure. When the prof_cp31() checkpoint is reached, print_prof will display the number of pipeline-dependent stall cycles and the number of branch instructions executed between prof_cp30() and prof_cp31().

SystemSim Emitters
SystemSim can be configured to provide (or emit) processor information when specific events occur. The simulator provides this information in the form of data structures called emitter records, and they can be read by C/C++ executables called emitter readers. This operation is quite different from triggers, in which SystemSim sends processing data to Tcl procedures in the form of Tcl lists. But like triggers, configuring emitters is a two-step process:

1. Create Tcl commands that tell SystemSim to produce emitter records when simulation events occur.

2. Create a C/C++ application to read and process the records produced by SystemSim.

You can obtain more information about the processor with emitters than you can with triggers, but it's more complex to code the event handling procedures. Also, the Tcl code must be processed as SystemSim starts up instead of being loaded into memory during regular operation.

Emitter Event Configuration
Emitters enable you to monitor many more event types than triggers, including memory accesses, instruction prefetches, segment table lookups, and hits and misses for the L1 cache (both the instruction and data cache).

Table 4.6 below presents a subset of these events, and you can access the full list by running simemit list on the SystemSim command line. Many emitter events begin with Apu, which is an older term for the SPU. In addition to those listed, other events respond to pipeline activity, threads, interrupts, and kernel operation. The importance of the tag values in the second column will be explained shortly.

Table 4.6 Emitter Event Types

To configure SystemSim to respond to emitter events, you need to perform three steps:

1. Identify events of interest with simemit set.
2. Identify the number of emitter readers with ereader expect.
3. Identify the name of each emitter reader with ereader start.

These Tcl commands must be processed as SystemSim starts. If you run them afterward, SystemSim won't produce emitter records for the events.

The simemit set command tells SystemSim which events to monitor. For example, to tell the simulator to produce an emitter record related to the Apu_Task_Start event, you'd use the following:

simemit set "Apu_Task_Start" 1

The 1 value tells SystemSim to produce a record when the SPU starts execution. To verify that this was configured properly, enter simemit list and check the value of {Apu_Task_Start}.

In addition to identifying processor events, you need to tell SystemSim to provide information related to the Header_Record and Footer_Record events. This enables an emitter reader to recognize the start and end of emitter data. The commands

simemit set "Header_Record" 1
simemit set "Footer_Record" 1

must be executed when configuring emitter events.

Just as simemit identifies events of interest, ereader tells SystemSim about the application or applications that will be waiting for emitter records. ereader expect specifies the number of applications that will be collecting information. If there are two emitter readers, the command is as follows:

ereader expect 2

The command ereader start identifies the emitter readers that will receive event data. This command requires a number of arguments: the full path of the C/C++ executable, the simulator process ID, and any additional arguments for the executable.

The process ID can be obtained with Tcl's pid command, which is why it appears in square brackets (command substitution) below. For example, the following command tells SystemSim to start the emitter reader read_data with the simulator's process ID:

ereader start /home/mscarpino/ereaders/read_data [pid]
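On the reader's side, the arguments given to ereader start arrive on the executable's command line. The text lists the path, then the process ID, then any extra arguments. Assuming the process ID therefore lands in argv[1] (this ordering is an assumption), a reader might validate it before attaching. parse_sim_pid is a hypothetical helper and is not part of the Emitter API.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: return the simulator PID passed as argv[1],
 * or -1 if it is missing or not a positive decimal integer. */
pid_t parse_sim_pid(int argc, char **argv)
{
    char *end;
    long  value;

    assert(argv != NULL);
    if (argc < 2)
        return -1;

    errno = 0;
    value = strtol(argv[1], &end, 10);
    /* Reject overflow, empty input, trailing junk, and non-positive IDs. */
    if (errno != 0 || end == argv[1] || *end != '\0' || value <= 0)
        return -1;
    return (pid_t)value;
}
```

Rejecting a malformed ID up front means the reader never tries to attach to the wrong shared memory buffer.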

Listing 4.4 below tells SystemSim to monitor the Apu_Mem_Read event and start the eread executable to process event records. More events can be monitored by adding more simemit set commands.

Listing 4.4 Emitter Configuration: emit_conf.tcl

The commands in Listing 4.4 must be executed as SystemSim starts up. A simple way to make sure this happens is to append the content of emit_conf.tcl to the end of systemsim.tcl, which is located in the $SYSTEMSIM_TOP/lib/cell directory. Then, when SystemSim starts, you should see Emitter reader started! in the command window.

These commands tell SystemSim that the eread executable will read the emitter records produced by the simulator. Therefore, the next crucial step is to code and compile the eread executable.

Coding Emitter Readers Pt 1: EMIT_DATA
Tcl scripts configure event handling, but the applications that receive events, called emitter readers, are regular executables. These executables read the data produced by SystemSim, which is packaged in data structures of type EMIT_DATA.

SystemSim produces an EMIT_DATA structure for each event of interest in the order in which the events occurred. This sequence starts with a header EMIT_DATA structure and ends with a footer EMIT_DATA structure.

EMIT_DATA is a union of many C structs, and each struct is associated with a specific event. The declaration of the EMIT_DATA union can be viewed in full in the downloadable PDF file.

For example, as shown in this file, the fourth struct in the union is MMAP_INFO, which contains information about memory-mapped files. Its declaration in emitter_data_t.h is given as follows:

struct mmap_info {
    char pathname[MAX_EMITTER_PATHNAME]; /* Path of the mapped file */
    uint64 eaddr;
    int len;
    int prot;
    int flags;
    int file_offset;
};
typedef struct mmap_info MMAP_INFO;

The sti_emitter_union_data_t.h header will be included in the EMIT_DATA union if MCONFIG_SPU is defined. This header provides additional data structures declared in sti_emitter_data_t.h.

These structs contain information related to DMA, pipeline operation, and SPU statistics. If CONFIG_ENERGY_SIMULATION is defined, three structures will be available that provide information related to simulated power consumption. This is an exciting topic, but beyond the scope of this article.

Each structure in the EMIT_DATA declaration has a different set of fields, but the first field is always EMIT_HEAD. The EMIT_HEAD struct provides basic information about the processing environment, and its declaration in emitter_data_t.h is given as follows:

struct emit_head {
    uint8 tag;      /* Identifies the structure type */
    uint8 cpu;      /* Identifies the running CPU */
    uint8 thread;   /* Identifies the current thread */
    uint8 mcm;
    uint8 pad[4];   /* Ensure structure is 8-byte aligned */
    uint64 seqid;
    uint64 timestamp;
};
typedef struct emit_head EMIT_HEAD;

One of the most important fields in EMIT_HEAD is tag, which takes one of the tag values listed in the second column of Table 4.6. For example, if tag equals TAG_DISK_REQ, the data structure contains information related to the Disk_Requests event.

The EMIT_HEAD header data isn't related to the HEADER event in EMIT_DATA or the Header_Record event in Table 4.6. This can be confusing, but the next subsection clarifies how headers and events are accessed in code.

Coding Emitter Readers Pt 2: The Emitter API
As explained earlier, the ereader start command associates emitter readers with a process ID. As SystemSim runs, each reader starts a processing cycle that consists of the following stages:

1. Attach to the simulator's shared memory buffer identified by the process ID.
2. Wait for the simulator to produce EMIT_DATA objects.
3. Read the header of each incoming EMIT_DATA object.
4. Process the object depending on the header's tag value.
5. Receive and process further EMIT_DATA objects until tag equals TAG_FOOTER.
6. Detach the reader and terminate operation.

Common EMIT_DATA processing tasks include getting timestamps from the events of interest, accessing the program counter to determine where an event occurred in code, and printing the results to a file using fprintf.

These stages are controlled by functions in emitter_reader.c, located in $SYSTEMSIM_TOP/emitter. Table 4.7 below lists each function and its corresponding stage.

Table 4.7 Functions in the Emitter Reader API

The EMITTER_CONTROLS structure holds the data generated by SystemSim event processing. This data includes the array of EMIT_DATA objects and an array of emitter readers.

The first function, EMIT_AttachReader, accepts a pointer to this structure and the process ID. It returns an integer that uniquely identifies the emitter reader. This value is referred to as rdr in the functions that follow.

The rest of the Emitter API functions are straightforward, and most are used to provide access to the array of EMIT_DATA objects. EMIT_AdvanceCurrent increments the array index, EMIT_CurrentEntry returns the current EMIT_DATA object, EMIT_FirstEntry returns the first object in the array, and EMIT_GetEntry returns the EMIT_DATA object at a specified index.
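Taken together, these array-access functions behave like a cursor over the record buffer. The sketch below models that idea in plain C to show the traversal pattern: start at the first entry, read the current one, advance, and stop at the footer. The traversal writes each memory-read value to a stream with fprintf. The demo_cursor type and the demo_* and write_memreads functions are hypothetical stand-ins. The real EMIT_* functions work with the EMITTER_CONTROLS structure and the reader number rdr, so check their exact signatures in emitter_reader.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record: a tag plus one payload value. */
enum { DEMO_TAG_HEADER = 1, DEMO_TAG_MEMREAD = 2, DEMO_TAG_FOOTER = 3 };
typedef struct { int tag; long value; } demo_record;

/* Hypothetical cursor over a buffer of records. */
typedef struct {
    const demo_record *entries;
    size_t             count;
    size_t             current;
} demo_cursor;

/* Like EMIT_FirstEntry: reset to, and return, the first record. */
const demo_record *demo_first(demo_cursor *c)
{
    c->current = 0;
    return c->count ? &c->entries[0] : NULL;
}

/* Like EMIT_CurrentEntry: return the record at the cursor. */
const demo_record *demo_current(const demo_cursor *c)
{
    return c->current < c->count ? &c->entries[c->current] : NULL;
}

/* Like EMIT_AdvanceCurrent: move the cursor forward one record. */
void demo_advance(demo_cursor *c)
{
    if (c->current < c->count)
        c->current++;
}

/* Like EMIT_GetEntry: return the record at a given index. */
const demo_record *demo_get(const demo_cursor *c, size_t index)
{
    return index < c->count ? &c->entries[index] : NULL;
}

/* Walk from the first record to the footer, writing each memory-read
 * value to `out`, one per line. Returns the number of values written. */
size_t write_memreads(demo_cursor *c, FILE *out)
{
    size_t written = 0;
    const demo_record *r;

    assert(c != NULL && out != NULL);
    for (r = demo_first(c); r && r->tag != DEMO_TAG_FOOTER;
         demo_advance(c), r = demo_current(c)) {
        if (r->tag == DEMO_TAG_MEMREAD) {
            fprintf(out, "%ld\n", r->value);
            written++;
        }
    }
    return written;
}
```

Anything placed after the footer record is never visited. This matches stage 5 of the processing cycle, where reading stops once tag equals TAG_FOOTER.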

The best way to understand these functions is to use them in code. Listing 4.5, available as a downloadable PDF file, presents an emitter reader that attaches to the SystemSim process, waits for EMIT_DATA objects corresponding to SPU memory accesses, and prints output to a file called emit.dat.

The relationship between Tcl event configuration, event tags, and EMIT_DATA structures can be confusing, so let's be clear about what's going on:

1) The Tcl script tells SystemSim to provide data when the Apu_Mem_Read event occurs.

2) The tag corresponding to Apu_Mem_Read is TAG_APU_MEMREAD. The switch-case statement examines the header data of each incoming EMIT_DATA structure to see whether it matches TAG_APU_MEMREAD.

3) If the header data matches this tag, the code accesses the value field contained in the apu_mem data structure. This value is written into a file called emit.dat.

After building the eread executable, place it in


In the console window, transfer the spu_sieve executable to the simulated Cell, change the executable's permissions, and run the executable.

If everything works correctly, SystemSim will produce emitter records for Apu_Mem_Read events and call on the emitter reader, eread, to process them. This reader creates a file called emit.dat and places it in the $SYSTEMSIM_TOP/bin directory. If you open this file, you should find a long series of read values corresponding to the SPU's memory accesses during the execution of spu_sieve.

The Cell processor is a complex device, and it takes effort to examine how applications are executed. The SDK provides three tools that make this process easier: ppu-gdb, spu-gdb, and the Full-System Simulator (SystemSim). This part of the series has focused on the latter two and has described the operation of spu-gdb and SystemSim in depth.

spu-gdb is based on the GNU debugger, gdb, and responds to the same set of commands. It provides capabilities for breakpoints, watchpoints, instruction stepping, and displaying registers and variable values.

Its usage boils down to two steps: run the executable until it reaches a specific section of code, and then examine the processor's state. This process may need to be repeated until the nature of the bug is discovered.

The Full-System Simulator is a powerful tool that creates a simulated Cell processor and makes it possible to run and analyze simulated applications.

In addition to displaying the contents of a processor's registers and memory, it collects performance metrics related to the processor's operation. The nature of the simulator's data acquisition can be controlled with checkpoints, triggers, and emitters.

The tools described here provide a great deal of power, but they're not particularly easy to use. The next part of this series describes the Cell SDK integrated development environment (IDE), which makes it possible to build, debug, and simulate applications using a simpler point-and-click methodology.

Next in Part 5: The Cell SDK Eclipse-based IDE.
To read Part 1, go to Introducing the Cell Processor.
To read Part 2, go to Building Applications for the Cell Processor.
To read Part 3, go to Debugging Multicore Cell Applications.

Matthew Scarpino lives in the San Francisco Bay area and develops software to interface embedded devices. He has a master's degree in electrical engineering and has spent more than a decade in software development. His experience includes computing clusters, digital signal processors, microcontrollers and field programmable gate arrays and, of course, the Cell processor.

This series of articles is reproduced from the book “Programming the Cell Processor”, Copyright © 2009, by permission of Pearson Education, Inc. Written permission from Pearson Education, Inc. is required for all other uses.
