Data Memory Paging Management - Embedded.com

Data Memory Paging Management


View Part 2 of this article (February 2000):
Data Memory Paging Management, Part 2

Mapping logical registers to physical registers requires the management of paged data memory. This two-part article explains a method to detect any potential paging errors in assembly programs.

Something is wrong, and yet nothing is wrong. I’ve discovered the counter isn’t counting. I have the bug traced down to one line, which I know is executing:

INCF counter,f

The register refuses to change value, no matter how long or how hard I stare at the instruction. I check the top of the file for the line:

counter equ h’20’

I check the data book for the syntax of the instruction. I check that the object file has the correct binary values for opcode and parameter. Everything seems correct, but the program does not work. Disbelief, confusion, and even despair attack me. Finally, I painstakingly trace the flow of execution of the program, and discover that the data memory page isn’t set correctly when the instruction is executed. After a day of effort, this little piece of information leads me to add just one more instruction, and the entire program works. I simultaneously feel frustration that such a small obstacle could have stymied me for so long and relief that the program is finally working properly.

This kind of scenario used to happen to me, until I devised a method to detect any potential paging errors in the assembly programs I write. This two-part article describes the method I use, which is applicable to any processor that has paged data memory. The first part explains what paging is, why it’s implemented on processors, how problems can arise, and the essence of the method for preventing such bugs. The second part adds details so that you can use the method to write non-trivial assembly language programs. Anyone who has suffered from a bug due to a paging error may find that part interesting and useful. I still occasionally make an error in paging, but much less frequently; and when it does happen, I find the error far more quickly than before.

Why paging?

To begin, I think an explanation is in order as to why paging exists in the first place. The reasons go back to processor design. One of the key decisions in designing a processor is what instruction set to give it. An instruction set is essentially a set of rules that defines what the processor does for each combination of values stored in program memory.

Designers want to make the instruction set as small as possible, to most efficiently use the limited program memory. One subset of this instruction set may add the contents of a register to an accumulator, that is, the set of instructions {add register 0 to accumulator, add register 1 to accumulator, …}. The instruction set is limited, which means this subset is limited, so only a limited number of registers can be added to the accumulator at any time. For example, if 16 values in the instruction set correspond to this add instruction subset, then any one of 16 registers may be added to the accumulator. But a processor may have more registers, perhaps 64 of them. Naturally, you’d wish to be able to add any register to the accumulator.

You can do so by using a subtle trick. The set of add instructions is redefined as {add logical register 0 to accumulator, add logical register 1 to accumulator, …}. It remains true that under a fixed set of circumstances, an add instruction can only reference one of 16 registers. The trick is to have different sets of circumstances, so that a logical register does not always reference the same physical register. If you have four different sets of circumstances, the add instruction could be used to access any one of the 64 physical registers. For instance, under one set of circumstances, logical register 0 references physical register 0, logical register 1 references physical register 1, and so on, through logical register 15, which references physical register 15. Under a second set of circumstances, logical register 0 references physical register 16, logical register 1 references physical register 17, and so on, through logical register 15, which references physical register 31. Under a third set of circumstances, logical registers 0 to 15 reference physical registers 32 to 47. Under the fourth set of circumstances, logical registers 0 to 15 reference physical registers 48 to 63. In this way, any one of 64 physical registers may be referenced while using an instruction (sub)set a quarter of the size that would be needed if paging weren’t used. The set of circumstances that determines how the logical registers map to physical registers is called the current page setting, or the active page.
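The arithmetic behind this 16-register, four-page example can be sketched in a few lines. This is only an illustration written in Python, which the article itself does not use; the names physical_register, REGISTERS_PER_PAGE, and PAGE_COUNT are assumptions made for the sketch.

```python
REGISTERS_PER_PAGE = 16  # logical registers one add instruction can name
PAGE_COUNT = 4           # 4 pages x 16 registers = 64 physical registers


def physical_register(page: int, logical: int) -> int:
    """Return the physical register that `logical` refers to when `page` is active."""
    if not 0 <= page < PAGE_COUNT:
        raise ValueError(f"page {page} out of range")
    if not 0 <= logical < REGISTERS_PER_PAGE:
        raise ValueError(f"logical register {logical} out of range")
    # Each page is a contiguous block of 16 physical registers.
    return page * REGISTERS_PER_PAGE + logical
```

With the second page active (page 1), logical registers 0 and 15 resolve to physical registers 16 and 31, matching the description above.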

This economy of memory, however, can lead to tricky bugs arising in low-level code.

Pseudocode

Assembly languages and paging mechanisms can vary significantly among processors, so I use pseudocode in my examples. I shall fully define an imaginary processor’s instruction set and paging, so no one is at a disadvantage for being unfamiliar with a given platform. The methods can easily be translated to many real assembly languages.

The pseudocode I use has one accumulator, labeled A. The assembly instructions all have at most one argument, labeled X, which is a register (reg) or an immediate value (imm). Anyone who has done extensive programming would recognize a table with mnemonics on one side and expressions such as A<-A+X on the other. Instead of using mnemonics, I shall use the expressions themselves. A single pseudo-assembly instruction is of the form: expression, argument type, argument. Thus, to add a register named AddMe to the accumulator, the instruction is:

A<-A+X   reg     AddMe

The X in the expression stands for the argument. The reg indicates that the argument is a register, and AddMe is the argument itself. To increment A, the pseudo-assembly instruction is:

A<-A+X   imm     1

A is the destination of all such “common” expressions except one:

X<-A     reg     RegName

Two “uncommon” expressions exist:

Page<-A
A<-Page

The first sets the active page to the value held in A. The second puts a value in A corresponding to the current active page.

There is one more instruction:

GOTO     imm     AddressLabel

Address labels occur at the beginning of lines, with a colon after them:

AddressLabel:

(I’ll discuss the pseudocode for bit addressing, conditional statements, and subroutine calls later.)

The argument in the GOTO instruction must always be a label, and a label cannot be on the same line as an instruction. The semicolon (;) is the symbol used for comments. Wherever a semicolon appears on a line, it is ignored, along with the rest of the line.

There are 16 logical registers, 46 physical registers, and four pages in this (imaginary) processor. The 16 logical registers are named r0 to rF. Forty of the physical registers are paged. They are in four pages of 10 registers each, labeled p00 to p09 for page 0, p10 to p19 for page 1, p20 to p29 for page 2, and p30 to p39 for page 3. Six physical registers are unpaged, and these are labeled puA to puF. Register puA is the accumulator. Logical registers rA to rF always refer to the unpaged physical registers puA to puF. Logical registers r0 to r9 refer to paged physical registers. When page 0 is active, r0 to r9 refer to p00 to p09; when page 1 is active, r0 to r9 refer to p10 to p19; and so on. All instructions that reference a register (that is, all instructions with reg) reference it by logical register.
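The mapping just described for the imaginary processor can be written down as a small lookup. The following is a sketch in Python rather than in the article's pseudocode, and the function name resolve is an assumption made for illustration.

```python
PAGED_DIGITS = "0123456789"   # r0..r9 depend on the active page
UNPAGED_DIGITS = "ABCDEF"     # rA..rF always map to puA..puF


def resolve(logical: str, page: int) -> str:
    """Map a logical register name (r0..rF) to a physical name for the given page."""
    if len(logical) != 2 or logical[0] != "r":
        raise ValueError(f"not a logical register: {logical!r}")
    digit = logical[1]
    if digit in UNPAGED_DIGITS:
        return "pu" + digit          # the page setting is irrelevant here
    if digit in PAGED_DIGITS:
        if page not in (0, 1, 2, 3):
            raise ValueError(f"page {page} out of range")
        return f"p{page}{digit}"     # e.g. r3 on page 2 is p23
    raise ValueError(f"not a logical register: {logical!r}")
```

For example, resolve("r3", 2) gives "p23", resolve("r3", 0) gives "p03", and resolve("rD", 1) gives "puD" whatever page is passed.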

Note that this isn’t the only way to manage paging. For some processors, a physical register may map to logical registers in more than one page, but not in all pages, or a physical register may map to one logical register in one page and a different logical register in another page. Different areas of data memory may even have different, independent paging schemes. (What’s worse is that different areas may have different but dependent paging.) I’ve chosen the previous example because it’s rather simple. The methods I describe in this article can be modified to accommodate other paging schemes, if necessary.

Problem

Imagine that register p20 were to be used as a counter, and register p23 were to be used as a mask register. An assembly language programmer might use the following lines:

#define   CounterReg  r0
#define   MaskReg  r3

and code such as the following may appear in the program:

Label1:
A<-X imm 2
Page<-A
A<-X reg CounterReg
A<-A+X imm 1
X<-A reg CounterReg
GOTO imm WhatNext
WhatNext:
A<-X reg MaskReg

Nothing is wrong with this code as it stands. If execution starts from Label1, then Page is set to 2. CounterReg (r0) refers to p20, as intended. The program goes to WhatNext, Page is still 2, and MaskReg (r3) refers to p23 as intended. If, however, the programmer uses p05 as a checksum holding register, there may be more #defines:

#define   ChecksumReg  r5
#define   DataReg  rD

Elsewhere in the program, unless the programmer is careful, might be another code fragment:

Label2:
A<-X imm 0
Page<-A
A<-X reg ChecksumReg
A<-A+X reg DataReg
X<-A reg ChecksumReg
GOTO imm WhatNext

Neither code fragment alone is wrong. But if they’re in the same program, trouble will arise. When execution starts from Label2, Page is set to 0. References to ChecksumReg (r5) will access p05, as intended. DataReg (rD) will reference puD regardless of the value of Page. After going to WhatNext, with Page still at 0, MaskReg (r3) will not map to the intended physical register (p23) but to a different register (p03). This kind of trouble can be serious and difficult to track down. When an instruction references a paged register, the current page must be set correctly. To check that this is the case, you must follow every possible path of execution the program may take to that instruction and find the last write to Page before the instruction. Checking execution paths from a particular point can be relatively easy, but checking every execution path to a point is usually much trickier.
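To make the failure concrete, the two fragments can be run through a toy interpreter for the pseudocode. The sketch below is in Python and is not part of the original method; the tuple encoding of instructions, the function names physical and trace, and the assumption that all registers and Page start at 0 are choices made purely for illustration.

```python
# The #defines from the article: identifier -> logical register.
DEFINES = {"CounterReg": "r0", "MaskReg": "r3", "ChecksumReg": "r5", "DataReg": "rD"}

# Strings ending in ':' are labels; tuples are (expression, argument type, argument).
PROGRAM = [
    "Label1:",
    ("A<-X", "imm", 2),
    ("Page<-A",),
    ("A<-X", "reg", "CounterReg"),
    ("A<-A+X", "imm", 1),
    ("X<-A", "reg", "CounterReg"),
    ("GOTO", "imm", "WhatNext"),
    "Label2:",
    ("A<-X", "imm", 0),
    ("Page<-A",),
    ("A<-X", "reg", "ChecksumReg"),
    ("A<-A+X", "reg", "DataReg"),
    ("X<-A", "reg", "ChecksumReg"),
    ("GOTO", "imm", "WhatNext"),
    "WhatNext:",
    ("A<-X", "reg", "MaskReg"),
]


def physical(name, page):
    """Resolve an identifier to a physical register under the given page."""
    digit = DEFINES[name][1]
    return "pu" + digit if digit in "ABCDEF" else f"p{page}{digit}"


def trace(entry):
    """Execute from label `entry` to the end; list each physical register referenced."""
    labels = {ln[:-1]: i for i, ln in enumerate(PROGRAM) if isinstance(ln, str)}
    pc, acc, page, regs, touched = labels[entry], 0, 0, {}, []  # assumed reset state
    while pc < len(PROGRAM):
        line = PROGRAM[pc]
        pc += 1
        if isinstance(line, str):       # a label does nothing when executed
            continue
        op = line[0]
        if op == "Page<-A":
            page = acc
            continue
        if op == "GOTO":
            pc = labels[line[2]]
            continue
        kind, arg = line[1], line[2]
        target = physical(arg, page) if kind == "reg" else None
        if target is not None:
            touched.append(target)
        if op == "X<-A":
            regs[target] = acc
        elif op == "A<-X":
            acc = arg if kind == "imm" else regs.get(target, 0)
        else:                           # "A<-A+X"
            acc += arg if kind == "imm" else regs.get(target, 0)
    return touched
```

Under these assumptions, trace("Label1") ends with a reference to p23, while trace("Label2") ends with a reference to p03: the same line of source touches a different physical register depending on how execution arrived at WhatNext.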

The fundamental cause of the paging problem is that the assembler will only recognize logical registers, not physical registers, and it does not keep track of possible values held in Page at every instruction that references a paged register. (Actually, expecting an assembler to do so would be unreasonable. The process is quite difficult to automate completely and correctly.) In the above example, the register whose task it is to hold the mask is physical register p23, but it can only be referred to in assembly language as r3, and r3 doesn’t unambiguously reference p23; it could also reference p03, p13, or p33.

Register names and elementary paging tracking

The purpose of this article is to describe a method of checking that each identifier of a paged physical register actually refers to the intended register, before the assembly is performed. The first step in eliminating paging errors is to remove the ambiguity. A register identifier must represent a unique physical register, rather than a logical register. The ambiguity is inherent in the assembler, so the programmer must enforce this rule. Adding a prefix or suffix to a register name to indicate which page it’s in would be a suitable naming convention. Thus, in the previous example, the counter and mask registers may be called CounterReg_P2 and MaskReg_P2, and the checksum register may be called ChecksumReg_P0. Note that DataReg needs no suffix because it’s accessible no matter which page is active.
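One side benefit of a suffix convention is that the intended page can be read back from the name mechanically. The helper below is a hypothetical sketch in Python; the function name page_of and the exact pattern (an _P suffix followed by a digit from 0 to 3) are assumptions based on the example names above, not something the article prescribes.

```python
import re

# Matches the suffix used in the examples: CounterReg_P2, ChecksumReg_P0, ...
_PAGE_SUFFIX = re.compile(r"_P([0-3])$")


def page_of(name):
    """Return the page a register name claims to live in, or None if it is unpaged."""
    match = _PAGE_SUFFIX.search(name)
    return int(match.group(1)) if match else None
```

Here page_of("CounterReg_P2") is 2 and page_of("DataReg") is None, which marks DataReg as safe to reference under any page.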

The programmer must take on another task, which is to determine at all points in the program whether any particular page is active and if so, which one. For example, using {} to enclose the relevant information, the first code fragment would be written:

Label1:
; {Page=0,1,2,3}
A<-X imm 2
; {Page=0,1,2,3}
Page<-A
; {Page=2}
A<-X reg CounterReg_P2
; {Page=2}
A<-A+X imm 1
; {Page=2}
X<-A reg CounterReg_P2
; {Page=2}
GOTO imm WhatNext
WhatNext:
; {Page=2}
A<-X reg MaskReg_P2
; {Page=2}

The comments merely reflect the current status of the active page. For most instructions, the comment on the line after the instruction is exactly the same as the comment on the line before. There are two exceptions: instructions that modify Page, and flow-control instructions (such as GOTO). For instructions that write to Page, the comment on the line after reflects the effect of that write. For GOTO, the comment on the line after the destination (not after the GOTO itself) must accommodate the comment before the GOTO instruction; that is, it must list every page the earlier comment lists. For labels, the comment after a label must accommodate the comment before (if there is a relevant comment immediately before) and must also accommodate the comments before any GOTOs to that label.

You can see that this code fragment will run correctly because for every paged register referenced, the comment on the line preceding the instruction matches the page of the register.
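The comment rules amount to propagating a set of possible pages along every path, and for small programs that bookkeeping can be mechanized. The sketch below, in Python, is one possible way to do it and is not the article's own tool: the tuple encoding, the function name check, and the simplification that the value of A is only known immediately after an "A<-X imm n" instruction (and is forgotten at every label) are all assumptions made for illustration.

```python
import re

ALL_PAGES = frozenset({0, 1, 2, 3})
PAGE_SUFFIX = re.compile(r"_P([0-3])$")  # CounterReg_P2 must be used on page 2

PROGRAM = [
    "Label2:",
    ("A<-X", "imm", 0),
    ("Page<-A",),
    ("A<-X", "reg", "ChecksumReg_P0"),
    ("A<-A+X", "reg", "DataReg"),
    ("X<-A", "reg", "ChecksumReg_P0"),
    ("GOTO", "imm", "WhatNext"),
    "Label1:",
    ("A<-X", "imm", 2),
    ("Page<-A",),
    ("A<-X", "reg", "CounterReg_P2"),
    ("A<-A+X", "imm", 1),
    ("X<-A", "reg", "CounterReg_P2"),
    ("GOTO", "imm", "WhatNext"),
    "WhatNext:",
    ("A<-X", "reg", "MaskReg_P2"),
]


def check(program, entries):
    """Return (line index, register, possible pages) for every unsafe reference."""
    labels = {ln[:-1]: i for i, ln in enumerate(program) if isinstance(ln, str)}
    before = {}     # line index -> pages possible just before that line (the comment)
    problems = {}
    work = [(labels[name], ALL_PAGES) for name in entries]  # page unknown on entry
    while work:
        pc, pages = work.pop()
        acc = None                              # value of A, if known
        while pc < len(program):
            seen = before.get(pc, frozenset())
            if pages <= seen:                   # nothing new to propagate
                break
            pages = pages | seen                # the comment must accommodate both
            before[pc] = pages
            line, index = program[pc], pc
            pc += 1
            if isinstance(line, str):           # label: forget what A holds
                acc = None
                continue
            op = line[0]
            if op == "GOTO":
                work.append((labels[line[2]], pages))
                break
            if op == "Page<-A":
                pages = ALL_PAGES if acc is None else frozenset({acc})
                continue
            if op == "A<-Page":
                acc = None
                continue
            kind, arg = line[1], line[2]
            if kind == "reg":
                match = PAGE_SUFFIX.search(arg)
                if match and pages != {int(match.group(1))}:
                    problems[index] = (arg, sorted(pages))
            if op == "A<-X" and kind == "imm":
                acc = arg
            elif op != "X<-A":                  # any other write to A
                acc = None
    return [(i, name, pages) for i, (name, pages) in sorted(problems.items())]
```

With only Label1 as an entry point, check(PROGRAM, ["Label1"]) reports nothing. With both entry points, check(PROGRAM, ["Label1", "Label2"]) reports that MaskReg_P2 on line index 15 may be referenced with Page equal to 0 or 2. The sketch is deliberately conservative: whenever it cannot tell what was written to Page, it assumes any page is possible.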

Attacking the problem

Now, the second code fragment would be written:

Label2:
; {Page=0,1,2,3}
A<-X imm 0
; {Page=0,1,2,3}
Page<-A
; {Page=0}
A<-X reg ChecksumReg_P0
; {Page=0}
A<-A+X reg DataReg
; {Page=0}
X<-A reg ChecksumReg_P0
; {Page=0}
GOTO imm WhatNext

You can see that the two code fragments, when written like this, will not work together because the comment after the label WhatNext doesn’t accommodate the comment before the GOTO imm WhatNext instruction. (And if the comment after the label WhatNext were to be changed, you can see that the register MaskReg_P2 may be referenced when Page is not 2.) Two ways to modify the code exist such that the two fragments will run together. The problem could be fixed before the GOTO, so the second fragment would be rewritten:

Label2:
; {Page=0,1,2,3}
A<-X imm 0
; {Page=0,1,2,3}
Page<-A
; {Page=0}
A<-X reg ChecksumReg_P0
; {Page=0}
A<-A+X reg DataReg
; {Page=0}
X<-A reg ChecksumReg_P0
; {Page=0}
A<-X imm 2
; {Page=0}
Page<-A
; {Page=2}
GOTO imm WhatNext

Or the problem could be fixed after the GOTO destination, so the first fragment would be rewritten:

Label1:
; {Page=0,1,2,3}
A<-X imm 2
; {Page=0,1,2,3}
Page<-A
; {Page=2}
A<-X reg CounterReg_P2
; {Page=2}
A<-A+X imm 1
; {Page=
