Data Memory Paging Management, Part 2 - Embedded.com

Data Memory Paging Management, Part 2



View Part One of this article:
January 2000
Data Memory Paging Management, Part 1

This continuation of a January 2000 article explains a method to detect any potential paging errors in assembly programs.

In Part 1 of this article, I described what paging is, why it exists, and how it can become a problem, and then launched into a method for ensuring that it never gets to be a problem. This article picks up where we left off, detailing the method so that it is useful in larger, nontrivial programs.

Bit addressing
Now I’ll add bit addressing to the imaginary processor I described last month because many embedded processors have such instructions, which can even help when paging. We will recognize one more argument type, bit, and add two more pseudo-assembly instructions:

X<-0	bit	  BitNameX<-1	bit	  BitName

BitName is either of the form RegName or Page . For example, CounterReg_P2<0> for the least significant bit of CounterReg_P2 or Page<1> for bit 1 of the current-page register.

The first thing to note is that on most platforms, the bits of a register aren’t accessible unless the register itself is. For such platforms, any reference to bit CounterReg_P2<0> must have page 2 (and only page 2) active, just as for any reference to reg CounterReg_P2 .

Also of note is that this new instruction can be a useful way of manipulating Page . First of all, if the processor you are using has only two pages, you can replace

:A<-X	imm	0Page<-A

with:

X<-0	bit	Page<0>

and you can replace:

A<-X	imm	1Page<-A

with:

X<-1	bit	Page<0>

Using these equivalent replacements, two program words can be replaced with one. I have completed projects in which 1,022 out of 1,024 words of program memory were used, so this reduction can be important. But even more complex processors, such as this one with four pages, can use bit addressing to shave instructions off a program. Consider the situation at WhatNext in Listing 1, which we looked at last month. When the processor reaches WhatNext , it may have come from Label1 (in which case Page is 2) or Label2 (in which case Page is 0). The first thing we want to do when we reach WhatNext is set Page to 2. Given that no other ways to get to WhatNext exist, that part of the code can be shortened:

WhatNext:		                                ;#–#–	X<-1	bit	Page<1>	                ;—#–	A<-X	reg	CounterReg_P2	;—#–...

The only possibilities for Page upon reaching WhatNext are 0 and 2. In either case, setting bit 1 of Page will leave it with the value 2.

Listing 1: Example from Part 1

;—#define DataReg rD#define ChecksumReg_P0 r5#define CounterReg_P2 r0#define MaskReg_P2 r3...Label1:						              ;####	A<-X	imm	2			              ;####	Page<-A				                      ;—#–	A<-X	reg	CounterReg_P2	      ;—#–	A<-A+X	imm	1			              ;—#–	X<-A	reg	CounterReg_P2	      ;—#–	GOTO	imm	WhatNext...WhatNext:					              ;#–#–	A<-X	imm	2			              ;#–#–	Page<-X				;--#-	A<-X	reg	MaskReg_P2		      ;—#–...Label2:						              ;####	A<-X	imm	0					                      ;####	Page<-A				                      ;#---	A<-X	reg	ChecksumReg_P0	      ;#---	A<-A+X	reg	DataReg	                      ;#---	X<-A	reg	ChecksumReg_P0	      ;#---	GOTO	imm	WhatNext


Conditional statements

For most microcontrollers, the only conditional statements are conditional GOTO s ; that is, if a certain condition is true (or false) the processor will GOTO a certain place, and otherwise the processor will continue with the next instruction. We shall add one more instruction to the pseudo-assembly instruction set:

TEST	imm	   AddressLabel

If A<0> is set, the processor will go to AddressLabel . If A<0> is clear, the processor will go to the next instruction in program memory. I know it isn’t a terribly useful instruction, but it is sufficient to explain how to handle conditional statements. The PIC has a conditional “skip” instruction, which is equivalent to a conditional GOTO to two instructions in front of the conditional instruction. I realize that some processors have conditional instructions that aren’t GOTO , but any conditional instructions can be represented by a combination of conditional GOTO s and unconditional instructions.

The essence of the conditional GOTO is that execution may proceed from two different points in the program. Whatever page (or pages) may be active before the TEST may be active at either of these points. Thus, where a # appears before a TEST instruction, there must be a # after the TEST and there must also be a # after the destination. For example:

Test1:					                ;---#	TEST	imm	Tested1		        ;---#...Label2:					                ;—#–	A<-X	reg	CounterReg_P2	;—#–	X<-A	reg	DataReg		        ;—#–Tested1:		                                        ;—##

Logical rule 5 (any page that may be active before a flow-control instruction may be active at the destination(s) — for a review of logical and symbolic rules, see Table 1) is applied to the TEST instruction. The major difference between GOTO and TEST is that TEST has two destinations: the label and the next instruction. Symbolically, where a # appears in a comment before a conditional instruction, so must a # appear in the comment after that instruction and in the comment after the destination label. Symbolic rule 5 is rewritten to include this information:

  • S5: when a # appears in a comment before a GOTO , a # must also appear in the comment after the destination label. When a # appears in a comment before a TEST , a # must also appear in both the comment after the instruction and the comment after the destination label.

Table 1 Logical and symbolic rules from Part 1

Table 1: Logical and symbolic rules from Part 1

Logical Rules

  • L1: variables must be identified and tracked not by ambiguous logical registers, but by unambiguous physical registers
  • L2: wherever a paged variable is referenced, the page associated with the variable must be active—and equally important, all other pages must be inactive
  • L3: an instruction that modifies Page will change the active page
  • L4: any page that may be active before an instruction that does not modify Page and is not a flow-control instruction may be active after that instruction
  • L5: any page that may be active before a flow-control instruction may be active at the destination(s)
  • L6: any page that may be active before encountering a label may be active after the label

Symbolic Rules

  • S1: any variable name given to a register in page 0 must have the suffix _P0 , any variable name given to a register in page 1 must have the suffix _P1 , and so on
  • S2: any instruction that references a variable name with a suffix _P0 must be accompanied with a comment: ;#— . Any instruction that references a variable name with a suffix _P1 must be accompanied with a comment: ;–#— , and so on
  • S3: an instruction that modifies Page has a comment after it that reflects the new value of Page
  • S4: where a # appears in a comment before an instruction that does not modify Page and is not a flow-control instruction, so must a # appear in the comment after it
  • S5: where a # appears in a comment before a GOTO , so must a # appear in the comment after the destination label
  • S6: where a # appears in a comment before a label, so must a # appear in the comment after the label


CALL and RETURN

Add two more instructions to the pseudo-assembly with which we’re working:

CALL          imm          AddressLabel 
RETURN

These instructions are implemented exactly as you would expect. A CALL instruction will push the address of the next instruction onto a stack, and execution will continue from the label. A RETURN instruction will pull an address off the top of the stack and continue executing from there. The behavior of the RETURN instruction when the stack is empty may differ from one processor to the next, as may the behavior of the CALL instruction when the stack is full. So for simplicity I’ll assume that these situations never occur. I shall also assume that the stack is never modified except for the CALL and RETURN instructions themselves.

In any program, every RETURN is associated with a set of CALL s. A RETURN is associated with a CALL if and only if there is a path of execution from the CALL to the RETURN with the following conditions: an equal number of CALL s and RETURN s are along the path, and at no point along the path are there more RETURN s than CALL s. This rule is exactly the association any programmer would make between CALL s and RETURN s, and isn’t as complicated as it sounds (at least not in a well structured program). Imagine code like the following:

SubA:                                   ; Point A...                                            ; Point B	RETURN...SubB:                                  ; Point C...                                            ; Point D	CALL	imm             SubA                                            ; Point E...                                            ; Point F	RETURN...SubC:                                  ; Point G...                                           ; Point H	CALL	imm            SubB                                           ; Point I...                                           ; Point J	RETURN...                                           ; Point K	CALL	imm            SubC                                           ; Point L

Consider the path of execution from point K. You can assume that the “…” parts of the program can be safely ignored. (They are unconditional pieces of code, they don’t have CALL s or RETURN s, they don’t have GOTO s or TEST s to anywhere outside of the “…” itself, and so on). You will find that the path of execution follows the points in the order K-G-H-C-D-A-B-E-F-I-J-L.

We will focus on the RETURN just after point F (as any programmer would note, the RETURN from SubB ). There is a CALL just after point D, and there is a path of execution from this CALL to point F (A-B-E-F). But examining this path of execution shows there is a point at which there are more RETURN s than CALL s (one RETURN , no CALL s), and that is point E. Thus, the CALL at point D isn’t associated with this RETURN .

Now consider the CALL just after point K: that path of execution gets to point F (G-H-C-D-A-B-E-F), but along this path there are two more CALL s and only one RETURN . Thus, the CALL just after point F is not associated with this RETURN either.

Finally, consider the CALL just after point H. This path of execution (C-D-A-B-E-F) goes through one more CALL and one RETURN before it gets to point F. Thus, theCALL at point H is associated with the RETURN after point F. Neither of the other CALL s are associated with this RETURN . The RETURN at point F is essentially a RETURN from SubB , and the only CALL to SubB is the one at point H.

Subroutines
As far as paging is concerned, every CALL must be treated as a GOTO . Every RETURN must be considered a GOTO to the instruction after any (and all) associated CALL s (that is, a RETURN may have many destinations, in the same way that a TEST has two destinations). For example, a subroutine might exist to add a data byte to a checksum. Given declarations that I previously discussed, such a subroutine might be written:

UpdateChecksum:				           ;####	A<-X	imm	0			           ;####	Page<-A				                   ;#---	A<-X	reg	CheckSumReg_P0	   ;#---	A<-A+X	reg	DataReg			   ;#---	X<-A	reg	CheckSumReg_P0	   ;#---	RETURN

We write the subroutine so that it doesn’t matter what page is active when the subroutine is CALL ed. If there are several CALL s to UpdateChecksum , the code could look like this:

...	;---#	CALL   imm	UpdateChecksum	;#---...	;-#-#	CALL   imm	UpdateChecksum	;#---...

We can observe that these code fragments all work together. When there is a # in the comment before a CALL , there is a # in the comment after the destination. When there is a # in the comment before a RETURN , there is a # in the comment after the associated CALL s. So let’s now add CALL and RETURN to our collection of logical rules:

  • L7: a CALL is to be treated as a GOTO , as far as paging is concerned
  • L8: a RETURN must be treated as a GOTO with multiple destinations, those destinations being any and all instructions immediately after an associated CALL

These logical rules translate to the symbolic rules:

  • S7: when a # appears in the comment before a CALL , a # must appear in the comment after the destination
  • S8: when a # appears in the comment before a RETURN , a # must appear in the comment after any and all associated CALL s

However, an even simpler subroutine shows that the tools described so far aren’t always adequate for creating the shortest and best solutions to paging tracking:

IncRegB:	A<-X          reg	        rB	A<-A+X     imm	1	X<-A          reg	r       B	RETURN

The program may have two calls to IncRegB :

...	;—#–	CALL	imm		IncRegB	; Point A...	;—#–	CALL	imm		IncRegB	; Point B	A<-X	reg		CounterReg_P2...

The best way to complete paging comments for this program would be to start with the CALL s. Assuming that the line before the label IncRegB is GOTO or RETURN (that is, the flow of control to IncRegB never comes from the instruction immediately before), and there are no other references to IncRegB in the program, the subroutine should be written:

IncRegB:                               ;—#–	A<-X	reg     rB        ;—#–	A<-A+X	imm   1          ;—#–	X<-A	reg     rB        ;—#–	RETURN

So far, there is no problem. The register rB is unpaged, so we know the paging is acceptable for this part of the program. Next, we handle the RETURN . Under the conditions that have been described for this program, we know that the only CALL s associated with the RETURN from IncRegB are the two CALL s shown. Thus, the RETURN must be considered as a GOTO to points A and B . The rest of the code should be written:

...	;—#–	CALL	imm     IncRegB	; Point A	;—#–...	;—#–	CALL	imm	     IncRegB	; Point B	;—#–	A<-X reg	             CounterReg_P2...

Imagine now that the program is modified so the first CALL to IncRegB occurs when the active page is 1. Note that a change in one part of a program will affect other, related parts of the program. A programmer following this discipline must be diligent and thorough in tracking down all side effects of every change made to a program; otherwise, bugs will creep in undetected. The new code looks like this:

...	;–#—	CALL	imm		IncRegB	; Point A...	;—#–	CALL	imm		IncRegB	; Point B	A<-X	reg		CounterReg_P2...

The first step again is to consider the CALL s. The subroutine IncRegB should now be written:

IncRegB:                            ;-##-	A<-X	reg     rB     ;-##-	A<-A+X	imm   1       ;-##-	X<-A       reg     rB      ;-##-	RETURN

Again, register rB is unpaged, so this code will operate correctly. Now consider the RETURN — as before, this RETURN is only associated with the CALL s shown. Therefore, the code should be written:

...	;–#—	CALL    imm     IncRegB	; Point A	;–##–...	;—#–	CALL    imm     IncRegB	; Point B	;–##–	A<-X   imm	2                           ;–##–	Page<-A                                       ;—#–	A<-X	reg	CounterReg_P2    ;—#–...

You probably anticipated this scenario. Clearly, the subroutine doesn’t alter the paging, but the comment after a CALL is different from the comment before. This setup can lead to extra instructions for setting the page where they aren’t necessary. The only purpose of the A<-X imm 2 and Page<-A instructions is to keep the comments in line. “This is unacceptable,” I hear you say, “adding instructions to a program that aren’t necessary for the correct operation of the program.” I quite agree.

The next step is to introduce a little higher-level logic, and a concept called “dynamic scope.” The trick I am about to show is a new symbolic notation, so the previously written symbolic rules 7 and 8 will be superseded. However, logical rules 7 and 8 are still adhered to. I shall rewrite the subroutine and code as follows:

IncRegB:                                 ;{Page = P’}	A<-X	reg     rB		;{Page = P’}	A<-A+X	imm   1		;{Page = P’}	X<-A	reg     rB          ;{Page = P’}	RETURN...	;–#—	;Begin dynamic scope of P’	;{Page = P’ = 1}	CALL	imm   IncRegB	; Point A	;{Page = P’ = 1} 	;End dynamic scope of P’;–#—...	;—#–	;Begin dynamic scope of P’	;{Page = P’ = 2}	CALL	imm   IncRegB	; Point B	;{Page = P’ = 2} 	;End dynamic scope of P’	A<-X	reg	CounterReg_P2
;—#–...

The trick here is to introduce a symbol, P’ , which represents the current page at some time during the execution of the program. P’ is not only different in different places in the program, but different in the same place (in the subroutine IncRegB ) at different times during the execution of the program. This is why the scope is called dynamic . Before CALL ing IncRegB , we assert the beginning of the dynamic scope of P’ . Then we define P’ to be the current page. We CALL IncRegB with {Page = P’} . IncRegB doesn’t change Page , so after RETURN ing, we still have {Page = P’} . P’ contains the value of Page before the CALL , P’ doesn’t change during the subroutine, and Page is P’ after the RETURN , so we can assert that Page after the RETURN is the same as Page before the CALL . Certain precautions must be taken: the flow of control must encounter Begin dynamic scope of P’ before it encounters End dynamic scope of P’ , and between any two Begin dynamic scope of P’s in any possible flow of control, it must encounter an End dynamic scope of P’ .

Continue to the second half of this article

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.