CMP EMBEDDED.COM

Login | Register     Welcome Guest IPS  Call for Abstracts
 




Programmer's Toolbox
Interpreters—A Recap

by Jack W. Crenshaw

For those who are joining us late, we’ve been talking about interpreters as used in real-time systems, with major excursions down the side roads of nostalgia. We’ve talked about the very first interpreters, in the ’60s time frame, the first interpreters for microcomputers, circa 1975, and, in a brief mention, the original time-shared Dartmouth BASIC, developed by Kemeny and Kurz.We discussed various efforts to develop languages for microcomputers, including my own daydreams of a do-it-yourself compiler.

Last month, I talked about—for the first and almost the last time—the language I’ll bet you thought I was going to cover for the entire series: Forth. Though I described a very successful use of Forth on a project circa 1979-80, Forth is not so important to me as a language/system/whatever, as are a couple of the key concepts—notably, the idea of an inner interpreter, and the use of reverse Polish notation. A much more interesting interpreter, to me, is one that you’ve probably never heard of. And, as usual, there’s a story behind it.

The Atlantic City convergence

You’ve heard me talk before about what seemed to be some kind of Cosmic Convergence, where the ideas, the means, and the motive to do some great thing all seemed to arrive at the same place, and the same time. Such seemed to be the case when many folks were getting ideas about how to implement programming languages on small computers. I’ve mentioned, in previous columns, my own ideas on the subject, and also those of such superstars as Nicklaus Wirth, designer of Pascal; Chuck Moore, designer of Forth; and Kenneth Bowles of UCSD Pascal fame.

If there was ever such a time and place for yet another Cosmic Convergence, it would have to be in Atlantic City in the summer of 1976.

Once in a very long while, in the course of human events, an event comes along that changes the world forever. One such event was the one we celebrate every July 4th: the day the rebellious American colonies won their independence. I needn’t remind you that the city of Philadelphia played a big role in that struggle for independence.

Appropriately enough, almost exactly 200 years later, another event took place not far away, that was, in its own way, a key event in yet another revolution, another struggle for independence and against oppression. In this case, the oppression came from the mainframe computer, its associated priesthood of systems administrators, systems programmers, and operators. The independence came in the form of the personal computer.

In the two short years between the announcement of the MITS Altair 8800, and that summer of ’76, the microcomputer industry had exploded in a fashion nobody could have possibly predicted. To be perfectly accurate, we must admit that seeds of people’s interest in owning their own computer, and breaking the bonds of mainframe tyranny, had been germinating long before the introduction of the Altair in 1974. Some crazy folks had been collecting bits and pieces of surplus minicomputer hardware, and building computers in their basements and garages, much earlier. Other microprocessor-based designs, and even other computer kits, had been available before the Altair. Nevertheless, its introduction makes as good a milestone as any for the true beginning of the Microcomputer Revolution.

In the intervening years, dozens of other companies had come out with microcomputers, some copies of the Altair (which eventually led to a standardized S-100 bus), some improvements, and many never to be heard of again. Meanwhile, people wanted software. They wanted software that would let the computer do something more useful than display blinking lights or play silly tunes.

A couple of years earlier, Jim Warren, Jr. had started a company called People’s Computer Company, that sold time on a time-share computer terminal to anyone who wanted it. He began a newsletter, which soon developed into Dr. Dobbs’ Journal of Computer Calisthenics and Orthodontia (with subtitle, “Running Light Without Overbyte”). A key thrust of that magazine, almost its entire raison d’etre in those days, was the definition and development of a high-level programming language for micros. The focus eventually narrowed to concentrate on Tiny BASIC, and DDJ spearheaded the efforts to define this language. Tom Pittman’s Tiny BASIC for the Motorola 6800 was one such implementation. Another was Li-Chen Wang’s “Palo Alto” Tiny BASIC for the 8080, which I’ve mentioned before. It remains one of my favorites for elegance of design and construction, and I’ve recounted its evolution through Radio Shack BASIC Level I, to the many real-time BASICs still in use today.

It was Warren who was instrumental in organizing the “hobby convention and exposition” known as Personal Computing ’76 in Atlantic City. Looking back, it’s easy to recognize it as yet another case of Cosmic Convergence, an event almost as earth-shattering, and definitely as life-changing, as the one that had happened 200 years earlier in Philadelphia. I was there. Wandering around aimlessly, footsore and weary, bedazzled and bewildered, perhaps, but there nonetheless. I was faced with so many wonders my head spun. The array of new hardware boggled the mind. In less than two years, RAM spaces had grown from 256 bytes to a staggering 64K. Floppy disks had already arrived, and DOSes were not far behind. New companies, new names, and new faces were everywhere. Like Fourth of July fireworks, most were destined to fade as quickly as they began. A few were destined to survive and flourish. Some were destined to change the world.

Sweet 16

One of the many such wonders sat in a small, innocuous booth off the beaten path: a small, innocuous single-board computer called the Apple I. Developed by a pair of former phone phreaks, two Steves named Wozniak and Jobs, it was based on the recently introduced MOS Technology 6502 CPU, itself a near-knockoff of the Motorola 6800.

A couple of years later, the forlorn, bare-board Apple had evolved into the neatly packaged Apple II. I suppose it’s safe to say the Apple II was a moderate success.

By the time the Apple II was announced, Tiny BASIC was already history, having been judged by most to be too limited for “serious” work. People wanted lots of memory, lots of CPU power, and most of all, ready access to a serious programming language. At MITS (later Microsoft), BASIC had evolved from 4K through 8K and 12K versions, to 16K. It was already a household word, and was destined to be installed into everything from Altairs to TRS-80s to Apples. But the Apple II version of Microsoft BASIC wasn’t ready when the hardware was, and the hardware was ready to ship. Like Tandy before them, Apple needed an interim BASIC. So Woz wrote one. It was an integer-only BASIC, using 16-bit integers.

Now, some say that there are two kinds of people in the world: those who divide all the world into two kinds of people, and those who don’t. Of the former set, at least a few of us divide the world into those who use “80” computers (8008, 8080, Z80, 8086, 80x86, and so on) and those who use “68” computers (6800, 680x0, and the “nearly 68” oddball, the 6502). I freely admit, I’ve been an 80 most of my life, and still work with 80 processors on the job. (In my heart, for what it’s worth, I made the switch to a confirmed 68 after one look at the horrors of the 8086 architecture. If I’d had my druthers, we’d all be running Windows on 68060s by now. One does what one must to make a living, however).

To me, the 8080 and Z80 always seemed to be superior chips to the 6800 and 6502. The 8080 had seven registers to the 6800’s two (plus two index registers). The Z80 added another seven, plus two more index registers. Nevertheless, I can’t deny that, benchmark after benchmark, BASIC interpreters based on the 68s consistently outperformed those for the 80s.

The biggest problem with the 68s was that they had no 16-bit arithmetic. Though the 8080 and Z80 were basically 8-bit processors, at least they had 16-bit registers (three for the 8080, eight for the Z80), and you could actually perform arithmetic in them, shift them, test them, and so on. You couldn’t do any of these things with the 6800 or 6502, which is one reason I still don’t understand, to this day, how the 68s could outperform the 80s in benchmarks.

In any case, Wozniak knew he needed an integer BASIC. He also knew that nobody in their right mind was going to pay good money for a language that could only operate on numbers smaller than w128. To be at all successful, commercially, the language had to support a full range of 16-bit arithmetic, but the processor he was using didn’t do 16-bit arithmetic. Solution: implement the arithmetic in software.

Steve’s solution was the same one we’ve discussed before: if the processor won’t do what you need done, define the processor you wish you had, and emulate that one on the real processor. In short, use an abstract machine. The Woz called his emulator Sweet 16, and it was an integral part of the Apple II integer BASIC. If you were an Apple programmer, you probably remember it, because Apple published a detailed interface spec for Sweet 16, and invited programmers to use it. Others, however, probably never heard of it, since it never saw the light of day outside of the BASIC interpreter.

If you were going to implement an arithmetic package in a microcomputer, how would you do it? Probably write a package of subroutines, right? They could be called from within the interpreter, and also from within other application programs. That’s the same approach we used to implement the longlong integer arithmetic a few months ago. It’s also the way most folks implement floating-point software.

The only problem with this approach is subroutine calls and returns are expensive, both in code size for the call, and in clock cycles.

On the other hand, if you’re convinced that the native machine language of the processor doesn’t meet your needs, and you must have an interpreter, it would be very tempting to write everything in that interpreter. This is the approach used by UCSD in their Pascal compiler, and Chuck Moore in Forth. Recall that UCSD Pascal used a single-byte opcode, even for rather complex operations, which makes for very small code size. If the opcode is only a single byte, you can pack a lot of code into 1K. The only problem with this approach is that it’s too slow; we’ve traded code size for speed. The Woz needed software that was both small and fast.

His solution was, in this writer’s opinion, masterful, and deserves to be recognized right up there among the great ideas of all time. Woz’s unique contribution was to recognize that just because you have an abstract machine and an inner interpreter to implement it, there’s no law that says you must use it all the time .

Apple integer BASIC was written mostly in fast, efficient assembly language. But—and this is the key, brilliant idea—you could drop into and out of Sweet 16 at will. That is, Woz’s design allowed the programmer to switch from assembly language to Sweet 16 and back, with almost no overhead. In effect, Woz created a math coprocessor in software.

How’d he do that?

Here’s the general idea: when you call any subroutine in microprocessor assembly language, the CPU pushes the address of the next instruction—that is, the return address—onto the stack. Under normal circumstances, the subroutine does its thing until it reaches the final RETURN instruction. That instruction pops the return address back off of the stack, sticks it in the program counter, and execution takes up where it left off.

Suppose, however, that the thing you call is an inner interpreter like Sweet 16. Because it’s an interpreter, Sweet 16 needs two things: it needs instructions in whatever the language is of the abstract machine it’s emulating, and it needs to know where to go to begin execution. In Wozniak’s implementation, there is no separate place to store the Sweet 16 code—no dictionary, no separate program, no nothing. The abstract machine code simply follows in-line, right after the call to Sweet 16. This means that the place for the abstract machine to begin execution is the same address it would be if it were assembly code: it’s the return address that has just been pushed onto the stack. Sweet 16 simply pops that address into its own simulated program counter, and takes up execution right where the assembly language left off.

Coming out of emulation mode is just as easy. One of the opcodes we must define for the emulator is a RETURN or EXIT instruction. Upon encountering this instruction, the emulator simply takes its current instruction pointer and sticks it into the CPU’s program counter. If the instruction pointer has been properly incremented as Sweet 16 was running, the CPU will take up executing assembly language again, just as though it had never stopped. Listing 1 shows a typical mix of assembly and interpretive instructions. Note the similarity between this code and that for software involving a coprocessor.

As in the case of UCSD Pascal, as well as my dream interpreter/compiler, Woz used a single byte for each opcode. Of course, in practice we can’t really write assembly code that looks like Listing 1; the assembler would barf on the Sweet 16 opcodes. However, this is a very minor problem. Merely define each opcode as a constant, and use a DB (or whatever the equivalent is for your assembler) to express it. Thus, to make the code in Listing 1 readable by our assembler, we should write the Sweet 16 opcodes:

db PUSH

dw 123

db PUSH

dw 456

db ADD

etc.

Naturally, to implement such an interpreter, we must define what the opcodes to the abstract machine will be, and what action they’re supposed to cause. In short, we must define the abstract machine. Woz defined operations that he needed to do 16-bit arithmetic. You can define them to be whatever you need. The nicest part of this architecture is that you can adapt it to whatever problem on which you happen to be working. Many writers of text editors use this approach, making the abstract machine a text processing engine. Still others let it do graphics for computer games.

It’s very important to note that the interaction between the real CPU’s registers and the abstract machine registers is entirely up to the programmer. If you want to keep the abstract machine’s data structures completely separate from the CPU’s registers, you may do so (and that’s a good idea). The only data structure they really need to share is the return stack, and that only for starting and ending the interpreter’s function. If you choose to have, for example, 32 64-bit registers in the abstract machine, you are perfectly free to do so.

It’s also worth noting that if Sweet 16 itself is written to be reentrant, we need not limit ourselves to one level of subroutine nesting. It’s perfectly feasible to have an assembly program that calls Sweet 16, which goes into assembly again, which calls another instance of Sweet 16, and so on.

Could Wozniak’s Sweet 16 do this? I honestly don’t know; I haven’t scanned his code (I told you I was an 80 person), but it doesn’t really matter. The details of Woz’s implementation are, to me, not nearly as exciting as the concept of mixing assembly (or other languages) with interpreted code.

Moving on

Moving closer to modern times, around 1986 I bought myself a hobby computer based on a Motorola 68000. I did this deliberately, because though I use PCs for all my daily work, I absolutely detest the architecture of the 80x86. I used to do Z80 programming for fun, sort of like working a crossword puzzle. My first assembly language code (aside from college classes) was a floating-point package for the Intel 8008. I didn’t even own an 8008. I just wanted to see if I could write the software, and how difficult it would be.

I frankly couldn’t imagine anyone writing assembly language for a PC for fun. So I got a PC for work, and a 68000 system to play with. One of the first programs I tried in assembly language (the only language available for that system) was an implementation of a stack-oriented interpreter. Except for the levels of indirection (my interpreter used direct, not indirect, threaded code), it was much like Forth. However, in honor of Wozniak’s great idea, I named it Sweet. I won’t bore you with the code—most folks nowadays don’t care about 68000 assembly language anyhow, but it was easy to write. The individual operators—stack-based—were also easy. As I recall, I didn’t use the two stacks of Forth; I found the single CPU return stack to be ample for my needs.

Sweet worked like a charm, it was far smaller than 1K in size, and if I ever had doubts about the efficacy of interpreters, they were erased the first time I saw Sweet run. Those of you who work mostly with GUI systems, like the Macintosh or Windows, may have forgotten just how fast CPUs—even relatively modest ones—can run. Sometimes it helps to remember that, when they’re not encumbered by such duties as responding to mouse motions, clicks, and so on, and modifying the display to reflect the changes, microprocessors run really, really fast. I mean, real fast. If you have never seen a 680x0 running native machine code, you owe it to yourself to see it. It’s difficult to grasp just how fast such a processor can run when it’s not required to draw pictures at the same time. Trust me on this: unless you write the world’s largest and most complex application, you will not notice the overhead of a small inner interpreter!

A real-time interpreter

About a year ago I was faced with a new problem. I had developed one application of several that shared a single 80486 processor. My “app” got a 60Hz slice of time, in which it had to do everything it had to do, and leave enough time left over for the other apps, written by other folks, to have their turn as well. I had one 60Hz task, supplemented by some slower tasks which I ran by counting down from the base, 60Hz rate. Such an approach works fine if the work to be done at the slow rates is modest. However, it fails if the worst-case timing, going through both high and low frequency tasks, takes more than the allotted 16.7ms. Fortunately, my app didn’t do much at the slow rates, and my demands on the CPU were modest.

Then someone added a new requirement—one that would require diddling I/O port bits in specific orders, and also would require significant time to complete. To make matters more interesting, other folks were using those same ports, so I had to be careful how and when I accessed them. The only good news was, nobody much cared how long the process took.

I could have solved the problem by defining a new task—one that ran at a low priority—and let it grind in the background. However, for technical reasons too messy to go into, this was not feasible. It meant running a long computation in a task that had only 16.7ms allocated to it. I had a problem.

While pondering the situation, my thoughts wandered back to that case where we had implemented Forth on a Honeywell minicomputer. Though the mini had only one interrupt, we had made it look like it had 16 vectored interrupts, with separate priority levels. The trick was to note that, as long as we didn’t spend much time in any Forth primitive—a condition guaranteed by their small size—we could let the inner interpreter check the condition of the “interrupt” lines (actually a simple parallel port) once each pass through it. Our abstract machine had abstract hardware!

I realized that I could do something similar with my new problem, only in reverse. If I wrote an inner interpreter that executed in my 60Hz task (executing only one primitive per pass), I could do the work without adding appreciably more load to the CPU. In the remainder of this column, I’ll walk you through the development of the interpreter that I still call Sweet. The program was written entirely in C.

Introducing Sweet

In choosing the architecture of Sweet, I had no doubt I’d use a stack machine. However, I thought, I might need storage for arrays and things, so I included a random-access “RAM” as well. To keep life simple, I also used a separate stack for return addresses. The main data structures are shown in Listing 2.

Next, I needed to define my operations. To avoid the overhead of subroutine calls, I put the operators in line. I show only a couple of operators in Listing 3; extension to other operators is straightforward.

From the code fragment in Listing 3, you can see that I used a case statement to choose between the various opcodes. The general structure of the inner interpreter is shown in Listing 4.

Note that I’ve added a few tests for stack overflow and ip overflow. These are not really necessary, of course, but they help keep things under control.

If I were writing the interpreter in assembly language, I’d use a jump table here instead of a case statement. You might be tempted to implement a jump table in C, but you’d be wasting your time. Most compilers generate one, anyhow, if a case statement uses contiguous values for the various cases. Using a case statement inspires me to define the opcodes using an enumerated type, as shown in Listing 5.

You will note that every operator takes up very little space—many are one-liners. This means that unless we’re trying to squeeze the interpreter into the teeniest of microcontrollers, we needn’t worry too much about saving space. We can add as many operators as we might need. After all, we aren’t trying to build a minimal machine here (though that is always fun, too), but a system that will let us get the work done in as short a time as reasonable. Accordingly, I loaded Sweet up with every operator I thought I might need, including shifts and rotates, bitwise ANDs and ORs, and a huge array of logical tests, branches, and function calls—even higher-order constructs like for-loops.

Coming soon

We’ll delve into Sweet in more detail next month, including the higher-order constructs, of which I’m particularly proud. I’ll wrap up next month with a listing of the complete interpreter, less a few proprietary bits. I think you’ll like what you see.

I should mention one other thing, in case you’re wondering. Recall that I said that the interpreter should only execute one instruction per pass. Yet you see an infinite loop in Listing 4. That’s because the code of Listing 4 came from my test program, which ran as a standalone program. Next month, I’ll show you how easily we can turn this test program into the real McCoy. See you then. esp

Jack W. Crenshaw is an independent consultant and president of Crenshaw Associates in Orlando, FL. He did much early work in the space program and has developed numerous analysis and real-time programs. He holds a PhD in physics from Auburn University. Crenshaw enjoys contact and can be reached via e-mail at jcrens@earthlink.net.

Return to Embedded.com

Send comments to: Webmaster
All material on this site Copyright © 2000
CMP Media Inc. All rights reserved.

Embedded.com Career Center
Ready to take that job and shove it?
SEARCH JOBS

Browse all jobs

SPONSOR
RECENT JOB POSTINGS




 :