
Programmer's Toolbox
by Jack W. Crenshaw
Still More on Interpreters
Last month, I was reminiscing about the fun I had, back around 1978, exploring the innards of my old Radio Shack TRS-80.
Reminiscing is fine, when done sparingly and with class, but you may have been wondering if I would ever get to the point. This month and next, I hope youll begin to see my point, which is: all things are interconnected, and my own wanderings through the TRS-80 software have important relationships with other efforts that were taking place around the same time. Ultimately, they lead to the use of interpreters in real-time programs, and thats what this whole series is about.
JUMP TABLES AND STUFF
Those of you who were with me last month will recall my glee at dissassembling the Tandy ROMs containing Level I BASIC, which was essentially Li-Chen Wangs Palo Alto Tiny BASIC, with floating-point routines added by Tandys programmers.
To get the proper perspective on my attitude in those days, you must remember that I spent my early, formative years dealing with batch-oriented mainframes, where I was at the mercy of systems administrators and the computer operators. I could regale you
with horror stories that would make your hair stand on end, such as the computer decks that came back shuffled and/or shredded, or the decks that were rejected because they had too many rubber bands on them, or the ones that were rejected because they didnt have enough. But my main concern was the fact that my programs were at the mercy of operating system software that was buggy and poorly understood, even by the people who were suppposed to maintain them (is this beginning to sound familiar?). To
give an example, the average production release of OS/360 had 4,000 known bugs in it at the time of release. One version had over 10,000. (By strange coincidence, Windows 95 has about the same number of errors.) To get anything beyond the simplest application to run properly, you had to go to a systems programming guru and get him or her to pass down a handful of the most obscure JCL commands that would, hopefully, persuade the OS to be nice to you. Then you crossed your fingers and prayed a lot.
Because of
my experiences in those not-so-good-old days, my dream for my personal computer was to have a system over which I had total controlwith no software (or hardware either, for that matter) inside it that I either hadnt developed myself, or thoroughly examined and understood. Accordingly, after disassembling every byte of software in the ROMs and marveling at the work of Dr. Wang, I set out to write some software of my own. I first wrote things like device drivers for the CRT and keyboard (the Level
I driver didnt support N-key rollover). I even tried my hand at improving on Wangs software for the Exatron Stringy Floppy (I couldntI got close to his data rate, but never equaled it). Then I began writing my own system tools.
As is the case, I suspect, with many folks, my first real production software, beyond what I wrote for customers, was a hex debugger. I had examined the code inside the other debuggers in my possession, and found that they all had similar
structures. Like any interactive program, the debugger spends most of its time in a wait loop, waiting for input from the user. All debuggers then (and many good ones today) used single-character instructions, so the general structure was something like the following snippet of pseudocode:
while(1){
c = inkey( );
if(c != 0){
- do something
}
}
Almost every debugger I examined used the equivalent of a case table for the do something part:
if c = D
dump
else if c = E
examine
else if c = R
display registers
...
endif
The notion of testing for each character, one at a time, didnt appeal to me. I thought I had a better idea: use a jump table. I reasoned that, after all, our alphabet only has 26 letters. At most, a jump table can have only 26 entries, giving 52 bytes in a 16-bit address space. That seemed to me a size I could live with. Even if we dont use all 26
letters, as is usually the case, were better off just padding the unused entries with some innocuous entry that either does nothing at all, or issues some kind of simple error message, like What? My debugger, in fact, stole the character string from the Level I ROMs to write just that message. I can just about write the main loop of the debugger from memory. Its shown in
Listing 1. Its in 8080 assembly language, which not many folks can read anymore, but
the comments tell the story pretty well.
A few points are noteworthy. First of all, note that I always reset the stack pointer at the beginning of each loop. This isnt really necessary, but it adds a bit of security. A debugger, by its very nature, is likely to be used to test buggy code. If things get out of hand, the first line at least gives you a fighting chance to get things cleaned back up again. Second, youll see that I simulate a function call by pushing the address I want to come back
to (loop:), and later jumping to the target function. The nice part about this approach is that you can write the code that deals with each command as a subroutine, with a simple RET at the end. If all has gone well, that RET will pop the address of the loop again, and were back for the next round.
Note that I use the single-letter command as the index into a jump table. The function getupper fetches a single character from the console and converts it to upper case. It also returns an error
message by setting the carry flag if the character was not a..z or A..Z.
Once the character is qualified as a legal command, we convert it to an index into a jump table. Adding the address to the table then gives us an indirect address to jump to. The last step is to dereference the pointer to get the actual address, and jump to it.
Not shown in the code are routines to fetch hex data from the command line for those (almost all) functions that required such data. I had functions to read one,
two, or three words, and flush the rest of the command line to get rid of any extraneous garbage. These functions were called by the command functions, since they were the ones that knew how many arguments were needed.
For the record, the entire main program takes a grand total of 27 bytes. Even allowing for the error handler, and the jump table itself, its easily under 100 bytes. The overall size depends on how fancy we get in the command functions, but my debuggers, which are
rather simple, tend to run about one kilobyte overall.
8080 assembly code may produce a feeling of nostalgia, but many of you may identify better with C code. The equivalent C code is shown in
Listing 2.
from debuggers to translators
As so often happens, it was along about here, circa 1978, that a number of related ideas came into some kind of cosmic convergence. One of these ideas came from the debugger. As I was tinkering with it and contemplating its general
structure, it occurred to me that there wasnt much difference between the debugger and an interpreter. The difference was mainly that of getting the commands from memory, rather than from the console. My thoughts began to wander back to the good old days of simple interpreters, and I realized that, with only a little more effort, I could write an interpreter of my own using a similar structure. I had really liked the Recomp III approach of having characters with lots of mnemonic significance, such as +, -,
*, and /, for the operations. I set out to define the commands that my dreamed-for interpreter would use.
At this point, I must admit that I ran into a little trouble, which is probably one reason why the interpreter was never written. The problem wasnt a serious one, but simply that I ran out of characters with lots of mnemonic significance. The algebraic operators are pretty obvious choices, as are the equivalent logical operators, &, |, and so on. I could have used
characters like ! or . for output, and i for input. But what do you use for call and return, or push and pop? Each time I sat down to define the complete set of operators, I found myself running out of useful characters.
Someone pointed out to me that, of course, I neednt limit myself to readable characters with mnemonic significance. If I gave up on the notion of readable characters, I could have lots more
commandsa total of 256 without straying from the single-byte format. But I couldnt shake the notion that readable characters would be much better, because this would mean that I wouldnt need an assemblerthe code could be readable as plain text.
As for the arithmetic operators and how to use them, I didnt have to make many decisions there. By this time I had already been exposed to reverse Polish notation, thanks to both a fellow grad student who had explained it to me,
and the beloved Hewlett-Packard HP-35 that gave me my first practical lesson in the beauties of stack machines and reverse Polish notationa lesson I took thoroughly to heart. I had no doubt that any interpreter I wrote would be based on stack-oriented operations.
About the same time, the third element of the convergence came along. At the time, I was teaching an adult-education course at night, at the local branch university, on personal computers. I took my TRS-80 over in the car every class
evening. During the course of the class, I met a fellow instructor who was a NASA software engineer. As it happens, he was a whiz at compiler theory, and had developed a wonderful little subset/superset of Fortran, which ran interactively on a time-share machine in a fashion similar to Dartmouth BASIC. Hed added many neat extensions to Fortran, including built-in array arithmetica feature Ive longed for ever since, until the advent of C++s operator overloading.
This fellow told me that
he was thinking of trying to write a compiler for the TRS-80a subset of his subset of Fortran. I told him of my idea of a single-character, stack-oriented interpreter, and he said, What you just described is the intermediate language for a compiler. It was the first time Id heard the term, but hardly the last. He explained the entire concept of intermediate languages and the abstract machines that they represent.
This notion of abstract machines is an extremely powerful one, and
one that lies at the heart of every translator and compiler. Ive seen a lot of definitions of the term, but the one I like best is also the simplest and most easily understood: given a processor whose native language you dont particularly like, you create an imaginary computer, one that you would rather have. This is the abstract machine. Then you write a small interpreter that makes the real machine look and act like the abstract one. This is the interpreter, or the intermediate language,
depending on your point of view. Because an abstract machine can be made to be very regular and simple, compiler writers find that its much easier to generate code for the abstract machine, rather than the real one. For this reason, almost every compiler ever written uses an intermediate language of some sort (some use two or more). The front end/parser of the language typically generates code for the abstract machine. This is then translated one more time into the true, native machine language.
Alternatively, one can leave the compiled code in its abstract machine form and write an interpreter to execute the code for this abstract machine at run time. This is the concept of P-code that was made popular by Nicklaus Wirth and his Pascal compilers. As well see later, the same concept has been used by many others, including Microsoft in their modern Visual BASIC.
In any case, my colleague and I realized that, between the two of us, we made one pretty fair compiler writer. He offered to write the front
end of a compiler for the old TRS-80, and I was going to write the interpreter for the intermediate language. Wed use the P-code concept; there would be no final translation to machine language. Instead, the interpreter would process the P-code on the fly.
As it turns out, we never wrote that language for a number of reasons, not the least of which is that I took another job out of state. Nevertheless, I continued to be thoroughly convinced of the value of the single-character,
interpretive language, and I still toyed with the various features I might want to see in it.
WHAT WERE OTHERS DOING?
During this historical account, Im sure you might have noticed that it was very much oriented towards what I was doing, and Ive said little or nothing about what other folks in the fieldfolks much more skilled than Iwere doing. This is pretty much the way it was in those days. I was too busy inventing things to notice that others had already invented them, and done
a much better job. Since then, Ive mellowed a bit and learned to raise my head up from time to time and see what other folks are doing. Next month, and in subsequent months, well talk about what some of those other folks were doing, and how their contributions fit into the overall picture. Well talk about UCSD Pascal and Forth, and the various aspects of threaded codes of all flavors. Well also talk about the incredibly clever contribution from a fellow named Wozniak.
See you
then.
Jack W. Crenshaw is a staff scientist at Invivo Research. He did much early work in the space program and has developed many analysis and real-time programs. He can be reached via e-mail at
jcrens@earthlink.net.
|