The basics of programming embedded processors: Part 2

Where previously in Part 1 we covered the basics of program design and analysis and the usefulness of design patterns, in this part of the series we develop models for programs that are more general than source code.

Why not use the source code directly? First, there are many different types of source code – assembly languages, C code, and so on – but we can use a single model to describe all of them. Second, once we have such a model, we can perform many useful analyses on the model more easily than we could on the source code.

Our fundamental model for programs is the control/data flow graph (CDFG). (We can also model hardware behavior with the CDFG.) As the name implies, the CDFG has constructs that model both data operations (arithmetic and other computations) and control operations (conditionals). Part of the power of the CDFG comes from its combination of control and data constructs. To understand the CDFG, we start with pure data descriptions and then extend the model to control.

Data Flow Graphs
A data flow graph is a model of a program with no conditionals. In a high-level programming language, a code segment with no conditionals – more precisely, with only one entry and exit point – is known as a basic block. Figure 5-3 below shows a simple basic block. As the C code is executed, we would enter this basic block at the beginning and execute all the statements.

w = a + b;
x = a - c;
y = x + d;
x = a + c;
z = y + e;

Figure 5-3. A basicblock in C

Before we are able to draw the data flow graph for this code we need to modify it slightly. There are two assignments to the variable x – it appears twice on the left side of an assignment. We need to rewrite the code in single-assignment form, in which a variable appears only once on the left side.

Since our specification is C code, we assume that the statements are executed sequentially, so that any use of a variable refers to its latest assigned value. In this case, x is not reused in this block (presumably it is used elsewhere), so we just have to eliminate the multiple assignment to x. The result is shown in Figure 5-4 below, where we have used the names x1 and x2 to distinguish the separate uses of x.

w = a + b;
x1 = a - c;
y = x1 + d;
x2 = a + c;
z = y + e;

Figure 5-4. The basic block in single-assignment form.

The single-assignment form is important because it allows us to identify a unique location in the code where each named location is computed. As an introduction to the data flow graph, we use two types of nodes in the graph – round nodes denote operators and square nodes represent values. The value nodes may be either inputs to the basic block, such as a and b, or variables assigned to within the block, such as w and x1. The data flow graph for our single-assignment code is shown in Figure 5-5 below.

The single-assignment form means that the data flow graph is acyclic – if we assigned to x multiple times, then the second assignment would form a cycle in the graph including x and the operators used to compute x.

Keeping the data flow graph acyclic is important in many types of analyses we want to do on the graph. (Of course, it is important to know whether the source code actually assigns to a variable multiple times, because some of those assignments may be mistakes. We consider the analysis of source code for proper use of assignments later in this series.)

Figure 5-5. An extended data flow graph for our sample block

The data flow graph is generally drawn in the form shown in Figure 5-6 below. Here, the variables are not explicitly represented by nodes. Instead, the edges are labeled with the variables they represent. As a result, a variable can be represented by more than one edge. However, the edges are directed and all the edges for a variable must come from a single source. We use this form for its simplicity and compactness.

Figure 5-6. Standard data flow graph for sample basic block

The data flow graph for the code makes the order in which the operations are performed in the C code much less obvious. This is one of the advantages of the data flow graph. We can use it to determine feasible reorderings of the operations, which may help us to reduce pipeline or cache conflicts.

We can also use it when the exact order of operations simply doesn't matter. The data flow graph defines a partial ordering of the operations in the basic block. We must ensure that a value is computed before it is used, but generally there are several possible orderings of evaluating expressions that satisfy this requirement.
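To make the point concrete, here is a small C sketch (not from the original text; the input values are arbitrary) showing two evaluation orders of the Figure 5-4 block. Both respect the partial order – x1 before y, y before z – and produce identical results:

```c
#include <assert.h>

/* The basic block of Figure 5-4, evaluated in two different sequential
   orders. Both respect the data flow graph's partial order, so the
   results are identical. */

int z_as_written(int a, int b, int c, int d, int e) {
    int w = a + b;       /* w is independent of the other statements */
    int x1 = a - c;
    int y = x1 + d;
    int x2 = a + c;      /* x2 is also independent of y and z */
    int z = y + e;
    (void)w; (void)x2;
    return z;
}

int z_reordered(int a, int b, int c, int d, int e) {
    int x1 = a - c;      /* x1 hoisted to the front */
    int y = x1 + d;
    int z = y + e;
    int x2 = a + c;      /* w and x2 deferred to the end */
    int w = a + b;
    (void)w; (void)x2;
    return z;
}
```

A compiler can exploit exactly this freedom to schedule the operations around pipeline or cache constraints.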

Control/Data Flow Graphs
A CDFG uses a data flow graph as an element, adding constructs to describe control. In a basic CDFG, we have two types of nodes: decision nodes and data flow nodes.

A data flow node encapsulates a complete data flow graph to represent a basic block. We can use one type of decision node to describe all the types of control in a sequential program. (The jump/branch is, after all, the way we implement all those high-level control constructs.)

Figure 5-7 below shows a bit of C code with control constructs and the CDFG constructed from it. The rectangular nodes in the graph represent the basic blocks. The basic blocks in the C code have been represented by function calls for simplicity. The diamond-shaped nodes represent the conditionals. The node's condition is given by the label, and the edges are labeled with the possible outcomes of evaluating the condition.

Building a CDFG for a while loop is straightforward, as shown in Figure 5-8 below. The while loop consists of both a test and a loop body, each of which we know how to represent in a CDFG. We can represent for loops by remembering that, in C, a for loop is defined in terms of a while loop. The following for loop

for (i = 0; i < N; i++) {
    loop_body();
}

is equivalent to

i = 0;
while (i < N) {
    loop_body();
    i++;
}

if (cond1)
    basic_block_1();
else
    basic_block_2();
basic_block_3();
switch (test1) {
    case c1: basic_block_4(); break;
    case c2: basic_block_5(); break;
    case c3: basic_block_6(); break;
}


Figure 5-7. C code (above) and its CDFG (below)

while (a < b) {
    a = proc1(a,b);
    b = proc2(a,b);
}

Figure 5-8. C code (above) and its CDFG (below) for a while loop.

Hierarchical representation. For a complete CDFG model, we can use a data flow graph to model each data flow node. Thus, the CDFG is a hierarchical representation – a data flow CDFG can be expanded to reveal a complete data flow graph. An execution model for a CDFG is very much like the execution of the program it represents. The CDFG does not require explicit declaration of variables, but we assume that the implementation has sufficient memory for all the variables.

We can define a state variable that represents a program counter in a CPU. (When studying a drawing of a CDFG, a finger works well for keeping track of the program counter state.) As we execute the program, we either execute the data flow node or compute the decision in the decision node and follow the appropriate edge, depending on the type of node the program counter points to.

Even though the data flow nodes may specify only a partial ordering on the data flow computations, the CDFG is a sequential representation of the program. There is only one program counter in our execution model of the CDFG, and operations are not executed in parallel.
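As an illustration of this execution model, the following C sketch (an invented miniature; the node encoding and example graph are not from the original text) walks a tiny CDFG for `while (i < n) i++;` with a single program counter that either executes a data flow node or follows a decision node's edge:

```c
#include <assert.h>

/* A toy execution model for a CDFG: one "program counter" steps through
   nodes, which are either data flow nodes (run a basic block) or
   decision nodes (evaluate a condition and pick an outgoing edge). */

enum kind { DATAFLOW, DECISION, EXIT_NODE };

struct node {
    enum kind k;
    int next_true;   /* successor (for a decision node: the taken edge) */
    int next_false;  /* decision node only: the not-taken edge          */
};

/* CDFG for:  while (i < n) i++;
   node 0 is the loop test, node 1 the loop body, node 2 the exit. */
static const struct node graph[] = {
    { DECISION,  1, 2 },   /* node 0: i < n ?                */
    { DATAFLOW,  0, 0 },   /* node 1: i++, edge back to test */
    { EXIT_NODE, 0, 0 },   /* node 2: done                   */
};

int run(int i, int n) {
    int pc = 0;                              /* the CDFG program counter */
    while (graph[pc].k != EXIT_NODE) {
        if (graph[pc].k == DECISION)
            pc = (i < n) ? graph[pc].next_true : graph[pc].next_false;
        else {                               /* data flow node: execute block */
            i++;
            pc = graph[pc].next_true;
        }
    }
    return i;
}
```

Note that exactly one node is "active" at any time – the model is sequential even though each data flow node internally only constrains a partial order.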

The CDFG is not necessarily tied to high-level language control structures. We can also build a CDFG for an assembly language program. A jump instruction corresponds to a nonlocal edge in the CDFG. Some architectures, such as ARM and many VLIW processors, support predicated execution of instructions, which may be represented by special constructs in the CDFG.

Assembly and Linking
Assembly and linking are the last steps in the compilation process - they turn a list of instructions into an image of the program's bits in memory. In this section, we survey the basic techniques required for assembly and linking to help us understand the complete compilation process.

Figure 5-9 below highlights the role of assemblers and linkers in the compilation process. This process is often hidden from us by compilation commands that do everything required to generate an executable program. As the figure shows, most compilers do not directly generate machine code, but instead create the instruction-level program in the form of human-readable assembly language.

Generating assembly language rather than binary instructions frees the compiler writer from details extraneous to the compilation process, which include the instruction format as well as the exact addresses of instructions and data. The assembler's job is to translate symbolic assembly language statements into bit-level representations of instructions known as object code.

The assembler takes care of instruction formats and does part of the job of translating labels into addresses. However, since the program may be built from many files, the final steps in determining the addresses of instructions and data are performed by the linker, which produces an executable binary file. That file may not necessarily be located in the CPU's memory, however, unless the linker happens to create the executable directly in RAM. The program that brings the program into memory for execution is called a loader.

Figure 5-9. Program generation from compilation through loading.

The simplest form of the assembler assumes that the starting address of the assembly language program has been specified by the programmer. The addresses in such a program are known as absolute addresses.

However, in many cases, particularly when we are creating an executable out of several component files, we do not want to specify the starting addresses for all the modules before assembly.

If we did, we would have to determine before assembly not only the length of each program in memory but also the order in which they would be linked into the program.

Most assemblers therefore allow us to use relative addresses by specifying at the start of the file that the origin of the assembly language module is to be computed later. Addresses within the module are then computed relative to the start of the module. The linker is then responsible for translating relative addresses into absolute addresses.

Assemblers. When translating assembly code into object code, the assembler must translate opcodes and format the bits in each instruction, and translate labels into addresses. In this section, we review the translation of assembly language into binary.

Labels make the assembly process more complex, but they are the most important abstraction provided by the assembler. Labels let the programmer (a human programmer or a compiler generating assembly code) avoid worrying about the absolute locations of instructions and data. Label processing requires making two passes through the assembly source code as follows:

1. The first pass scans the code to determine the address of each label.
2. The second pass assembles the instructions using the label values computed in the first pass.
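The first of these passes can be sketched in C as follows. This is a simplification, not a real assembler: the line and symbol structures are invented for illustration, instruction encoding is omitted, and every line is assumed to be a 4-byte ARM-style instruction.

```c
#include <assert.h>
#include <string.h>

/* Pass 1 of a toy assembler: walk the program with a program location
   counter (PLC) and record the address of every label in a symbol
   table. Pass 2 (not shown) would substitute those addresses into the
   assembled instructions. */

struct line { const char *label; };   /* label is NULL if the line has none */

struct sym { const char *name; unsigned addr; };

/* Fills `table` and returns the number of symbols found. */
int pass1(const struct line *prog, int nlines, unsigned origin,
          struct sym *table) {
    unsigned plc = origin;            /* PLC starts at the program origin */
    int nsyms = 0;
    for (int i = 0; i < nlines; i++) {
        if (prog[i].label) {          /* label value = current PLC value */
            table[nsyms].name = prog[i].label;
            table[nsyms].addr = plc;
            nsyms++;
        }
        plc += 4;                     /* each instruction occupies 4 bytes */
    }
    return nsyms;
}
```

Unlike a program counter, the PLC here visits each line exactly once, in source order.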

As shown in Figure 5-10 below, the name of each symbol and its address is stored in a symbol table that is built during the first pass. The symbol table is built by scanning from the first instruction to the last. (For the moment, we assume that we know the absolute address of the first instruction in the program; we consider the general case later in this series.)

During scanning, the current location in memory is kept in a program location counter (PLC). Despite the similarity in name to a program counter, the PLC is not used to execute the program, only to assign memory locations to labels. For example, the PLC always makes exactly one pass through the program, whereas the program counter makes many passes over code in a loop.

Thus, at the start of the first pass, the PLC is set to the program's starting address and the assembler looks at the first line. After examining the line, the assembler updates the PLC to the next location (since ARM instructions are four bytes long, the PLC would be incremented by four) and looks at the next instruction.

If the instruction begins with a label, a new entry is made in the symbol table, which includes the label name and its value. The value of the label is equal to the current value of the PLC. At the end of the first pass, the assembler rewinds to the beginning of the assembly language file to make the second pass. During the second pass, when a label name is found, the label is looked up in the symbol table and its value substituted into the appropriate place in the instruction.

Figure 5-10. Symbol table processing during assembly

But how do we know the starting value of the PLC? The simplest case is absolute addressing. In this case, one of the first statements in the assembly language program is a pseudo-op that specifies the origin of the program, that is, the location of the first address in the program. A common name for this pseudo-op (e.g., the one used for the ARM) is the ORG statement

 ORG 2000 

which puts the start of the program at location 2000. This pseudo-op accomplishes this by setting the PLC's value to its argument's value, 2000 in this case. Assemblers generally allow a program to have many ORG statements in case instructions or data must be spread around various spots in memory. Example 5-1 below illustrates the use of the PLC in generating the symbol table.

Example 5-1: Generating a Symbol Table
Let's use the following simple example of ARM assembly code:

 ORG 100 
label1 ADR r4,c
LDR r0,[r4]
label2 ADR r4,d
LDR r1,[r4]
label3 SUB r0,r0,r1

The initial ORG statement tells us the starting address of the program. To begin, let's initialize the symbol table to an empty state and put the PLC at the initial ORG statement.

The PLC value shown is at the beginning of this step, before we have processed the ORG statement. The ORG tells us to set the PLC value to 100.

To process the next statement, we move the PLC to point to the next statement. But because the last statement was a pseudo-op that generates no memory values, the PLC value remains at 100.

Because there is a label in this statement, we add it to the symbol table, taking its value from the current PLC value.

To process the next statement, we advance the PLC to point to the next line of the program and increment its value by the length in memory of the last line, namely, 4.

We continue this process as we scan the program until we reach the end, at which point the symbol table contains label1 = 100, label2 = 108, and label3 = 116, and the PLC points past the last instruction.
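The label values in Example 5-1 can be checked arithmetically. In this sketch (a simplification that assumes every line after the ORG is a 4-byte instruction; the function name is invented), a label's address is just the origin plus four times the index of the instruction it sits on:

```c
#include <assert.h>

/* Address of a label in Example 5-1: the ORG sets the PLC to the
   origin (100), and each ARM instruction advances it by 4 bytes.
   label1 is on instruction 0, label2 on instruction 2, and label3
   on instruction 4. */
unsigned label_addr(unsigned origin, int instr_index) {
    return origin + 4u * (unsigned)instr_index;
}
```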

Assemblers allow labels to be added to the symbol table without occupying space in the program memory. A typical name of this pseudo-op is EQU for equate. For example, in the code

 ADD r0,r1,r2 
FOO EQU 5
BAZ SUB r3,r4,#FOO

the EQU pseudo-op adds a label named FOO with the value 5 to the symbol table. The value of the BAZ label is the same as if the EQU pseudo-op were not present, since EQU does not advance the PLC. The new label is used in the subsequent SUB instruction as the name for a constant. EQUs can be used to define symbolic values to help make the assembly code more structured.

The ARM assembler supports one pseudo-op that is particular to the ARM instruction set. In other architectures, an address would be loaded into a register (e.g., for an indirect access) by reading it from a memory location. ARM does not have an instruction that can load an effective address, so the assembler supplies the ADR pseudo-op to create the address in the register.

It does so by using ADD or SUB instructions to generate the address. The address to be loaded can be register relative, program relative, or numeric, but it must assemble to a single instruction. More complicated address calculations must be explicitly programmed.
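For a program-relative ADR, the assembler's offset calculation can be sketched as follows. In ARM state, reading the PC yields the instruction's address plus 8, so the immediate is the distance from that point to the target; whether an ADD or a SUB is emitted depends on the sign. (This is an invented simplification: the instruction encoding and the check that the offset fits an ARM immediate field are omitted.)

```c
#include <assert.h>
#include <stdlib.h>

/* The choice an assembler might make when expanding ADR rd,label
   into ADD rd,pc,#off or SUB rd,pc,#off. */
struct adr { int is_add; unsigned offset; };

struct adr encode_adr(unsigned instr_addr, unsigned target) {
    /* In ARM state the PC reads as the instruction address + 8 */
    long diff = (long)target - (long)(instr_addr + 8);
    struct adr r;
    r.is_add = diff >= 0;            /* forward target: ADD; backward: SUB */
    r.offset = (unsigned)labs(diff);
    return r;
}
```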

The assembler produces an object file that describes the instructions and data in binary format. A commonly used object file format, originally developed for Unix but now used in other environments as well, is known as COFF (common object file format). The object file must describe the instructions, data, and any addressing information and also usually carries along the symbol table for later use in debugging.

Generating relative code rather than absolute code introduces some new challenges to the assembly language process. Rather than using an ORG statement to provide the starting address, the assembly code uses a pseudo-op to indicate that the code is in fact relocatable. (Relative code is the default for both the ARM and SHARC assemblers.)

Similarly, we must mark the output object file as being relative code. We can initialize the PLC to 0 to denote that addresses are relative to the start of the file. However, when we generate code that makes use of those labels, we must be careful, since we do not yet know the actual value that must be put into the bits.

We must instead generate relocatable code. We use extra bits in the object file format to mark the relevant fields as relocatable and then insert the label's relative value into the field.

The linker must therefore modify the generated code - when it finds a field marked as relative, it uses the absolute addresses that it has generated to replace the relative value with a correct, absolute value for the address. To understand the details of turning relocatable code into absolute executable code, we must understand the linking process described in the next section.

Linking. Many assembly language programs are written as several smaller pieces rather than as a single large file. Breaking a large program into smaller files helps delineate program modularity. If the program uses library routines, those will already be preassembled, and assembly language source code for the libraries may not be available for purchase.

A linker allows a program to be stitched together out of several smaller pieces. The linker operates on the object files created by the assembler and modifies the assembled code to make the necessary links between files.

Some labels will be both defined and used in the same file. Other labels will be defined in a single file but used elsewhere, as illustrated in Figure 5-11 below. The place in the file where a label is defined is known as an entry point. The place in the file where the label is used is called an external reference.

The main job of the linker is to resolve external references based on available entry points. As a result of the need to know how definitions and references connect, the assembler passes to the linker not only the object file but also the symbol table. Even if the entire symbol table is not kept for later debugging purposes, it must at least pass the entry points. External references are identified in the object code by their relative symbol identifiers.

Figure 5-11. External references and entry points.

The linker proceeds in two phases. First, it determines the absolute address of the start of each object file. The order in which object files are to be loaded is given by the user, either by specifying parameters when the linker is run or by creating a load map file that gives the order in which files are to be placed in memory.

Given the order in which files are to be placed in memory and the length of each object file, it is easy to compute the absolute starting address of each file. At the start of the second phase, the linker merges all symbol tables from the object files into a single, large table. It then edits the object files to change relative addresses into absolute addresses.

This is typically performed by having the assembler write extra bits into the object file to identify the instructions and fields that refer to labels. If a label cannot be found in the merged symbol table, it is undefined and an error message is sent to the user.
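The two phases can be sketched in miniature as follows. The object-file representation here is an invented simplification: each module is reduced to its length, and each relocatable field to a module-relative value.

```c
#include <assert.h>

/* Phase 1: assign each object file an absolute load address from the
   load order and the module lengths. base[i] receives the absolute
   start address of module i. */
void phase1(const unsigned *len, int n, unsigned load_addr, unsigned *base) {
    unsigned a = load_addr;
    for (int i = 0; i < n; i++) {
        base[i] = a;
        a += len[i];      /* next module starts right after this one */
    }
}

/* Phase 2: patch one field that the assembler marked as relocatable,
   turning its module-relative value into an absolute address. */
unsigned relocate(unsigned rel_value, unsigned module_base) {
    return module_base + rel_value;
}
```

In a real linker, phase 2 also consults the merged symbol table so that an external reference in one module can be patched with an entry point's address from another.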

Controlling where code modules are loaded into memory is important in embedded systems. Some data structures and instructions, such as those used to manage interrupts, must be put at precise memory locations for them to work. In other cases, different types of memory may be installed at different address ranges. For example, if we have EPROM in some locations and DRAM in others, we want to make sure that locations to be written are put in the DRAM locations.

Workstations and PCs provide dynamically linked libraries, and certain sophisticated embedded computing environments may provide them as well. Rather than link a separate copy of commonly used routines such as I/O to every executable program on the system, dynamically linked libraries allow them to be linked in at the start of program execution.

A brief linking process is run just before execution of the program begins; the dynamic linker uses code libraries to link in the required routines. This not only saves storage space but also allows programs that use those libraries to be easily updated. However, it does introduce a delay before the program starts executing.

Next, in Part 3: Basic compilation techniques.
To read Part 1, go to "Program design and analysis."

Used with the permission of the publisher, Newnes/Elsevier, this series of nine articles is based on copyrighted material from "Computers as Components: Principles of Embedded Computer System Design" by Wayne Wolf. The book can be purchased on line.

Wayne Wolf is currently the Georgia Research Alliance Eminent Scholar holding the Rhesa "Ray" S. Farmer, Jr., Distinguished Chair in Embedded Computer Systems at Georgia Tech's School of Electrical and Computer Engineering (ECE). Previously a professor of electrical engineering at Princeton University, he worked at AT&T Bell Laboratories. He has served as editor in chief of the ACM Transactions on Embedded Computing Systems and of Design Automation for Embedded Systems.


Reader Response

Figure 5-5 is missing an edge. The for/while loop discussion is unclear – no text addresses the while (a < b) code. – Orlando, FL

Thanks for the feedback. I have fixed the artwork. As to the positioning of the artwork, I will keep that in mind in the future. But different folks have different strokes. For example, when I read an article, I go through and "read" the graphics first to get a sense of the flow of the article. On the other hand, the original textbook from which this is derived uses your suggestion. But on-line, the problems and layout are different and it is necessary for good visual design to space out the graphics to break up the text into more readable chunks.

In terms of the substance of the article series, I look for content that reaches a broad range of readers, from professionals to students learning to be professionals, and am willing to publish good articles wherever I can find them. If you have some ideas for articles that would more clearly explain some topics in this series, please submit a proposal for an article or articles and I will happily consider it. – Bernard Cole, site editor.

1 thought on "The basics of programming embedded processors: Part 2"

  1. Hi,

    I don’t understand how you went from the for loop to its if/else equivalent. I think some more explanation is required.
    Also there are a lot of words that are running together making the document hard to read.


