CMP EMBEDDED.COM

Login | Register     Welcome Guest  
HOME DESIGN PRODUCTS COLUMNS E-LEARNING CONFERENCES CODE FORUMS/BLOGS NEWSLETTERS CONTACT FEATURES RSS RSS




Programmer's Toolbox

by Jack W. Crenshaw

A Personal Tour of Interpreters

For the benefit of those just joining the party, we’re two months into a series on the use of language interpreters in real-time systems. Because I chose to introduce the topic in a chronological way, and focus on my own experiences and initial exposures to small-scale (“personal”) computers, minicomputers, and microcomputers, the series has developed (some might say, degraded) into a collection of personal, and therefore decidely biased, reminiscences about the “good old days” when computers were new, and real programmers used machine language written in octal, sans such artificial crutches as assemblers. This approach struck a responsive chord with many of you ... er ... more “seasoned” readers, who walked down many of the same roads, and sometimes even the same company hallways, as I. I’ve gotten more e-mail in response to the last two issues than in all the previous year, most of it having to do with the personal recollections of other folks. It makes for some very interesting reading; I wish this magazine had room enough to hold all the stories, because they definitely do hold some lessons for us all. Someday, perhaps, we can round up our collective reminiscences into a comprehensive history of those early days, before they are forever lost in the dustbins of time.

On the other hand, I expect you younger whippersnappers must not have had the slightest idea what I was talking about, and must have been wondering when I was ever going to get to the point. Well, this month, as promised, I will.

Op codes and reverse Polish

As the unfolding story closed last month, our hero ... er, this writer ... had just been infected by the notion of interpreters as a way of getting a computer of limited capacity to act like a much larger and more capable machine. I had written a couple of hex debuggers which, like all debuggers of the time, and many good ones even today, depended on the use of single-character commands read from the keyboard. My epiphanic transformation came with two realizations:

• Commands need not come from the console; they could be drawn from RAM memory

• Commands need not be limited to those useful for debuggers. They can do whatever we desire them to, and can be tailored to the particular application

When you talk about drawing commands from sequential locations in memory, you’re talking about the central concept of the von Neumann machine architecture. You’re talking about a computer, not a debugger. This is the basis of the concept of the interpreter as an abstract machine.

One other part of the puzzle that fell into place at about the same time was my exposure to stack machines. For me, this came with advent of the exquisite, Hewlett-Packard hand-held calculator, the HP-35, which used a four-level stack and reverse-Polish notation. The term “Polish notation” comes from the name of its inventor, who happened to be a Pole named Lucasziewicz. You don’t have to try to say or spell this name very often to understand why the concept came to be known as Polish notation, instead of Lucasziewicz notation. Reverse Polish, also known as postfix notation, is called “reverse” because the notation originally proposed by Lucasziewicz had the operator placed before its operands, not after, as is commonly done today.

The advantage of Polish notation over ordinary arithmetic notation is that in Polish notation there is no need for parentheses or operator precedence. In Polish notation, every operator always operates on the one or two operands immediately following it. In reverse Polish, it operates on the operands immediately preceding the operator. In this later notation, you need some place to put the operands until you get to the operator that operates on them. A push-down stack is the perfect place for storing them temporarily, so the notion of a stack machine and reverse Polish notation go together like red beans and rice.

While we’re on the subject of Polish notation, this is as good a place as any to correct an erroneous recollection I shared a couple of months ago. I mentioned IBM’s first minicomputer, the IBM 1620, and its somewhat sad, not to mention sordid, role in the demise of the first generation of affordable, single-user computers, of the kind that would now be called personal computers. This came about because of the penchant of IBM’s salespersons to push the 1620 as a full-blown computer, like a mainframe, only slower, and running a batch operating system which included a Fortran compiler.

That part was all true; my recollections were accurate. The part I missed, however, was the statement that, at the university where I was studying for my PhD, we used a 1620 in just such a role. That part was untrue; I misremembered. In discussing the 1620 with some of my e-mail contacts, I realized that it wasn’t really the 1620 we used as the “slow mainframe” as I described it; it was the much more capable IBM 7044. My neuronic failure can be partially explained by the fact that we did, indeed, have a 1620 sitting in the same room as the 7044. The 1620 wasn’t used, however. Mostly it sat there collecting dust, because it simply wasn’t up to the task of acting like a mainframe, even a slow one, and with the 7044 sitting right beside it, nobody saw any particular need to step down to the 1620.

Nevertheless, the 1620 played a key role in my journey into interpretive languages. Precisely because it was almost never used, my officemate took an interest in it and decided to do a little tinkering. He arrived at the same conclusion others before him had reached: it wasn’t the computer itself that was such a dog; it was the batch programming system and the Fortran compiler with which IBM had saddled it. Originally, IBM had sold the Fortran compiler as a step up from interactive, interpretive languages. The battle cry was, “wouldn’t you rather be programming in Fortran?” That cry apparently sold a lot of 1620s, and it certainly aided in the demise of those first-generation “personal” computers, all of which used interpreters. However, it didn’t gain the 1620 much actual use, because the computer simply wasn’t up to the task of supporting a Fortran compiler.

My friend, however, realized that since almost no one was using the 1620 in our shop, he could have virtual ownership of the computer, if he chose to. The catch was, to get any usefulness out of it, he had to program it in machine or assembly language, and program it as though it were a single-user machine. He set out to do precisely that; he wrote a primitive operating system that gave him access to the various peripherals, and began studying the technology of language translators. His goal became a lofty and ambitious one, and was essentially the same as my dream for my old Radio Shack TRS-80: have no software anywhere in the computer that was not put there by yours truly.

Looking back, I don’t think Earl was any more successful with the 1620 than I was with the TRS-80. Writing an entire OS, with all its support tools, language translators, and so on, is a daunting and time-consuming task, and he did, after all, have a research project to do to collect his degree. Like me, he never found the time to complete his grandiose plan. Looking back, however, I now remember that it was Earl Cook who first exposed me to Polish notation, and taught me the algorithm he’d discovered for translating arithmetic expressions into Polish. The seed was sown.

An idea emerges

Fast forward about 10 years, and we’re back to 1978, where I had closed off the account last month. By this time I had my trusty TRS-80, and a fellow adjunct professor and I had agreed to try to write a high-order language compiler for the TRS-80. All through the spring and summer of that year, the ideas were gelling in my head of beginning with a simple core language, based on single-character “opcodes” for a stack-oriented abstract machine. My colleague had recognized the machine I was cooking up as the intermediate language for a compiler, so we had agreed that I would write the “inner interpreter,” and he would write the front end, language lexer/parser. By the time I left Huntsville, AL (and therefore my colleague and co-conspirator) to join Heathkit in the winter of ’78, I thought my plans were pretty thoroughly worked out. I was still hanging onto the dream of using single-character opcodes with strong mnemonic significance, such as ‘+’ and ‘-’. Readable characters, in other words, so that one could write in the “native language” of the abstract machine without having to memorize or look up opcodes. I was laboring mightily to come up with an acceptable set of such opcodes. The arithmetic operators were obvious, but some others weren’t. How, for example, do you distinguish between a bitwise and a logical and? Today, using C, we’d use & for one, && for the other, but I was trying to limit myself to single, readable characters, and having trouble deciding which character meant what.

Despite this problem, which was really rather a trivial one, I had what I thought were some pretty darned good ideas. I recognized, for example, that one didn’t need to encode every single operation as a primitive instruction in the abstract machine. If you had a good enough core set of operators, you could always create more as “macros” based on the primitives. I also realized, early on, that the interpreter to implement that core set would be quite small, and easily ported to other computers. To switch computers, you only needed to program the “inner interpreter”; all the higher-level, more abstract operations could be retained as macros of the primitives. I had pretty well worked out the minimal core set of operators that would have to be programmed in machine language and would therefore have to be rewritten.

I had gone so far as to anticipate the objections from some folks that micros were slow enough, without burdening them even further by forcing them to execute an interpreter in real time. My solution was to build, as a next step, a compiler code generator, that would spew out native machine code, essentially copying the macros of the inner interpreter into an object file, rather than executing the same instructions interpretively. I had even worked out a primitive keyhole optimizer, that would clean up the interfaces between macros and eliminate any obviously redundant loads and stores. I envisioned a whole series of compilers, allowing one to bootstrap the compiler to a new machine, first by porting a simple, minimal set of core instructions and interpreting the rest, and then using successively more sophisticated tools to refine the output of the compiler into much more highly optimized code.

I was so excited about my ideas that I presented them to the PTB (powers that be) at Heathkit. They were unimpressed. (I do recall, however, that only a few short months later, they were clamoring for UCSD Pascal).

The cosmic convergence

Until now, I’ve been careful to keep this discussion on a very personal level, relating only the things I’ve seen and done. I warned you in advance that it would be a very one-sided history of computing. If, however, I’ve given the impression that I thought I was the only one doing things, that I was the one who invented threaded code, keyhole optimizers, or any other such foolishness, let me correct that impression.

I did, I suppose, invent certain things in the sense that I thought of them independently, while being blissfully unaware of what other people were doing. Part of that lack of awareness, I chalk up to simple naïvete; the rest, to the fact that Hunstville, though it may be called the Rocket City, was not then exactly what you’d call a hotbed of leading-edge computer and software technology.

The fact is, if you get anything at all out of this rambling personal account, it should be the old adage, “great minds run in the same channels.” Even some not-so-great minds. I don’t claim to be the only one having all those neat ideas. Far from it; others had them sooner, and carried them further. In fact, sometimes it seems to me that I’ve been destined to always be the bridesmaid, never the bride. The following story might illustrate what I mean.

Some time in the early ’70s, a colleague and I decided that running Fortran simulations for the rest of our lives was not a guaranteed path to fame and fortune. We sought another path. Our line of reasoning went something like this:

• There is (was) an explosion of electronic gadgets hitting the market

• We should be able to use them to make money

• The most money to be had was in the consumer market

• The most ubiquitous consumer device was the TV set

• People had more leisure time, therefore could enjoy frivolous pastimes like games

Conclusion: we’d build an electronic gadget that one could attach to a TV set, and use it to play games. In short, we’d invent the world’s first TV game.

Personally, I was into simulation, both analog and digital, and had seen a lunar-landing simulator at NASA’s Space and Rocket Center, which used an image drawn on an oscilloscope screen. We asked ourselves if we could create a similar image on the raster-scan CRT of a television. Answer: yes, as long as the image wasn’t very complicated.

Next question: what do we simulate? We had tons of ideas, from race cars to lunar landers, but decided on the simplest thing we could think of: a bouncing ball. I tried to figure a way to draw a circular image on the TV screen, but was unable. It was easy, however, to draw a square one. We weren’t at all sure that the customers would go for a square ball, but it was the best we could think of.

Next question after that: what would the human players do? Again, we decided to keep it simple, and let them move a pot that controlled a rectangular paddle. We had invented (ta-daaahhh) a TV ping-pong game!

That afternoon, I rushed home to my family. I couldn’t wait to tell them the good news. I ran in the door, chattering excitedly that we had invented a new gadget that was going to make us all rich. My wife and kids asked me what it was. I said, “It’s a game that you attach to your TV. It lets you play ping-pong with a simulated ball.”

Immediately, my twelve-year-old son said “Oh. You mean like Pong?”

My colleague and I, it seemed, were a day late and a dollar short. In our haste to develop a new gadget, and in our excitement over our line of reasoning, we had somehow failed to notice that the thing we were inventing had already been invented, was already on the market, and familiar to all teenagers.

Well, you must admit, our line of reasoning was sound.

Such was, it seemed, the case with my ideas for an interpretive language, and that’s really the whole point of this chronologically oriented account. When the pieces of a puzzle come together, when all the needed parts are in place, no one person invents their use. That the parts will sooner or later get put together in a given arrangement is as certain as the sunrise; the only question is, who does it, and who gets the credit? In the case of language interpreters, it seemed that some kind of cosmic convergence was going on. Lots of people thought of the same thing at the same time, mainly because the gadgets needed to implement them, namely microcomputers, and the concepts to make the ideas work, like stack machines and Polish notation, were finally in place.

In my case, the key components were the notions of an abstract machine, a stack-based architecture, and a micro to run them on. I had decided, long ago, to build an interpreter based upon the idea of a single-character command for each operation. But I wanted the characters to be readable characters, so I would be able to program in the “machine language” of the abstract machine. Other folks, it would seem, weren’t nearly so fussy, and therefore got things working while I was still trying to decide what the character “~” should mean.

In Zurich, a young man named Nicklaus Wirth invented a neat little language. Wirth had worked on the Algol 60 committee, but found Algol to be too complex and baroque. He wanted something simpler, so he invented it. He called it Pascal.

But Wirth wasn’t content with simply inventing a language. He wanted to see it implemented—the sooner, the better. He wanted to see it running on as many different kinds of computers as possible. Recognizing that one of the problems of high-order languages is the nastiness of the code generator—that is, the part that spits out the native code instructions for the target computer—Wirth thought of this concept called an abstract machine. He envisioned an interpreter, running on the target machine and emulating an abstract machine. The code generator need only generate code for the abstract machine. A tiny interpreter would then execute this code as though it were the abstract machine. By an odd coincidence, Wirth’s abstract mach-ine was a stack-based machine.

Wirth called this interpreter a P-machine, and the code, P-code. The terms remain with us to this day. You’ll find P-code as close as Visual BASIC and Microsoft Word.

Shortly afterwards, a professor at UCSD learned about Pascal, and thought it would be a neat project for his graduate students to build their own P-machine. The professor’s name was Kenneth Bowles, and his students succeeded beyond his wildest dreams. As more and more grad students worked on the USCD system, it developed far beyond a mere compiler. Bowles and his students not only developed their own abstract machine and the interpreter to run it on, they used the Pascal language to develop an entire disk operating system, compiler, editor, and so on. The UCSD Pascal system was not just a compiler, it was a complete OS with all the trimmings. By the time I arrived at Heathkit in the winter of ’78 (still with visions of interpreters dancing in my head), the UCSD team was shipping copies of their system all over the world. They would sell anyone a copy of their system for barely more than the price of a mag tape to hold it plus something to cover expenses. They also had a team of programmers happily porting the interpreter to new CPUs. Old-time trivia buffs will recall that when the IBM PC came out in 1981, one of the operating system options offered for it was UCSD Pascal.

The beauty, and the Achilles heel, of UCSD Pascal was its inner interpreter. It worked precisely like the one I had in mind, with one exception: the single-byte instructions were not readable characters. Since they weren’t, UCSD had 256 opcodes to play with, where I was limited to 50 or so. The general architecture of the interpreter is not exactly rocket science:

loop

get opcode

execute opcode

end loop

You can embellish the code a bit if you like, and make it more specific, by introducing the concept of an instruction pointer, an opcode, and a jump table:

byte *IP;

byte opcode;

while(1){

opcode = *IP++;

jump_table[opcode]( );

}

This tiny core doesn’t see any other parts of the system; if there’s an operand stack, for example, or a return stack, those data items are processed by the functions called by the inner interpreter. It’s simplicity itself. And I thought of it. Like Pong, I just didn’t think of it first !

As you can see from the above code fragment, it’s perfectly possible to write the “inner interpreter” in a high-order language. However, since the efficiency of the entire system depends on it, the interpreter is almost always written in assembly language. For computers that support indirect, indexed addressing, such as the PDP-nn (then) or the 68xxx (now), you can almost write the inner interpreter in one line of assembly language.

Sailing Forth

As you might guess, with such an explosion of ideas flying around, Nicklaus Wirth, Kenneth Bowles, and even I and probably thousands of other folks were not the only ones having them. Some people even followed up their ideas with usable systems.

At the National Radio Astronomy Laboratories, a fellow named Chuck Moore needed to program a radio telescope to follow stars. Instead of simply writing the code in assembly language, he, too, chose to write an interpreter and then program the commands to the telescope in the language of the interpreter. However, Chuck’s approach had some significant departures from the single-byte idea, and that has made a world of difference. Perhaps you’ve heard of his creation. It’s called Forth.

Two key characteristics distinguish Forth from other similar, interpreted languages. First, it uses what’s called “indirect threaded code.” We’ll talk more about that later, but for now, think of it simply as one more layer of indirection in the inner interpreter:

word *IP;

word opcode;

while(1){

opcode = *IP++;

jump_table[dictionary[opcode]]( );

}

The opcode is no longer simply an index into a jump table, but an index into what the Forth people like to call a “dictionary,” which in turn contains a jump table.

Now, at this point, I expect to get flames from at least two sets of people: first, the Forthians who scream that I’m grossly misrepresenting the power and the subtleties of Forth, and the other from non-Forthians who wish I’d never uttered the word. Please don’t flame me. I realize that the above explanation is a gross over-simplification of the Forth inner interpreter, and it says nothing at all about the outer one. However, I stand on my central point, which is that the Forth interpreter adds one more layer of indirection. I’ll discuss other aspects in a moment.

The second major departure from the single-byte interpreter was that Moore realized, long before I did, that he wasn’t going to be able to settle for single-character function “names.” Though he wanted, as I did, to be able to read and write the language of the interpreter without a list of opcodes handy, he realized that he’d need multicharacter names. That, in turn, implies a mechanism to read the names and branch to the appropriate code. Hence the dictionary and the 16-bit “opcode,” which was really an address. He could have chosen to search the dictionary each time a name was encountered, but that would slow the process down considerably. Instead, he built a process called a “compiler” that runs as commands are entered. This “compiler” searches out the command words and replaced them with the corresponding entry addresses. Writers of BASIC interpreters would call this process “tokenizing,” but that’s not what the Forth folks call it.

The most profound effect of Moore’s dictionary approach was that dictionaries can be extended. This opened the possibility, for the first time, of programmers creating their own entries, assigning them a unique name, and using them as though they were part of the language. In short, Forth was one of the first, and is still one of the small few, extensible languages. This feature has turned out to be a mixed blessing. Yes, it allows the Forth programmer to extend the language. It also encourages the use of short, easily tested functions, a key aspect of structured programming. The folks who program regularly in Forth see it as one of its great strengths. It’s also a weakness in that, if one hasn’t first been taught the meaning of each of the previously defined functions, the program is impossible to read. Combine that with the Forth idiom of using short words, like */ or >R, and you can see why Forth has earned the reputation of being a “write-only” language.

Chuck Moore’s development of Forth has become the stuff of legend, to the point where it’s no longer quite clear just what his original version of Forth really looked like. How close was it, exactly, to what’s now called Forth? I’ve heard through the grapevine that the original version didn’t have the second level of indirection, or the “outer interpreter” which gives the current language so much of its character. This wouldn’t surprise me any. Most of us develop software by trying simple things, and adding complexity only grudgingly, and only when we see a need for it. It would certainly not diminish Chuck Moore, in my eyes, to learn that he followed this same process; quite the contrary, I would admire him all the more. However, at this point I think only Chuck can tell us what really happened in those early days.

My first exposure to Forth was around 1979, shortly after my (somewhat abrupt) move from Heathkit to Honeywell. You can see why its stack-based architecture and readable commands would have resonated strongly with my own ideas. At first glance, it seemed to be my dream interpreter, only more so. I arranged to have a team of folks come to Honeywell to give us a lecture and demo. The team included Forth Inc.’s future CEO, the redoubtable Elizabeth “Liz” Rather. They made the language sound great, but I must admit, it was difficult to tell just exactly what they were saying. I recall specifically asking Rather “What exactly is Forth? Is it an interpreter, a programming language, an assembler, an operating system, or what?”

Her answer: “Yes.” I was not enlightened.

In my personal opinion, life for Forth and its proponents was made considerably more difficult by their almost pathological insistence on using unfamiliar names, or familiar names in unfamiliar ways. For example, a subroutine isn’t a subroutine or function, it’s a word. It’s not a program, but a sentence. It’s not a system, but a dictionary.

It almost reminds me of the Apple Macintosh (and now Windows), where directories became folders, files became documents, and programs became applications, and you don’t delete a file, you drag its icon to the trash can. Bah! It seems to me that our software business is difficult enough without folks going around and changing names on us. I know they mean well. They may even believe that they’re making things easier, and perhaps they are, for someone who has never seen a computer before. But for those of us who have, and think we know what the term “subroutine” means, renaming it only spreads confusion.

Despite our confusion, the promise of Forth seemed real. The Forth vendors and users both claimed productivity increases as high as 10 to one, not to mention increased happiness on the part of the programmers. As it happened, we had a Forth user of national stature, Dr. Harvey Glass, across the bay in Tampa. Dr. Glass had been teaching his students Forth for a couple of years, and become radically sold on it. He had recently landed a consulting contract to help build a major software system based on Forth. The programmers seemed ecstatic.

Harvey invited us over to the University of South Florida to see a demo of Forth in action. I must admit, it was impressive. One of his students began with a very simple function (word definition, in Forth terms), that put a one bit out on one line of a parallel port. He connected up a (DC) amplifier and loudspeaker to the port, so we could see the cone pulse out when the program was run. Next, he wrote a similar function that reset the bit back to zero. Then he wrote a function that pulsed the cone, waited a while, and removed the pulse again. The time delay was pulled off the stack, and it was easy to see that we could alter the timing simply by pushing a different constant.

I expect you can guess what happened next; soon we had a square-wave generator, making a tone of variable frequency, and shortly after that, he had the darned thing playing “Daisy.” Except for the table of notes and delays, which defined what frequencies to play and for how long, the fellow had written and tested the entire program before our eyes, in less than half an hour. I was certainly impressed.

I suppose it should go without saying that this ability to test things as you go is a key feature of most interpretive languages. It’s the reason BASIC still has so many adherents. It was this feature that made Forth seem so appealing for real-time applications. As long, that is, as the time is not too real. After all, that inner interpreter does take some time to execute, no matter how tightly we code it. Executing a program in native machine language is always going to be faster than emulating an abstract machine, as programmers of the Power PC have discovered.

I should also point out that this “instant turnaround” doesn’t automatically follow from the concept of an inner interpreter. It didn’t for UCSD Pascal, and it also would not have for the interpreter I envisioned. Forth’s rapid turnaround was strictly courtesy of its “outer interpreter,” which, like the command processor of a BASIC interpreter, provided the user interface. It’s that interface, not the innards, that determine the degree of interactiveness, or lack thereof, in the final system.

Before leaving the topic of Forth, I will mention that we did, indeed, develop a project using it. The project was tailor-made for the language; in fact, it was very close to Moore’s original project, even down to the Honeywell 316 mini that we used (word has it, that’s the same type of computer Moore originally used). The program had to control something similar to a radio telescope; in our case, it was a platform for testing navigation systems.

We thought the $10,000 (in 1980 dollars) that Forth wanted for a system was a bit dear. Fortunately, Dr. Glass introduced us to FIGForth, the Forth user’s group, who sold us their public domain copy at considerably lower cost. We had to port the inner interpreter (Moore’s code for the 316 being slightly unavailable). I did the port, while our single other programmer began building the dictionary.

I won’t say that the project was without problems; the programmer I worked with was very capable, but not exactly our fastest worker, and the whole concept of Forth seemed very foreign to her. She didn’t like the idea of just copying in the FIGForth dictionary verbatim; she insisted on entering every word in the dictionary, and testing them, one by one. Building that Forth dictionary seemed to take forever, and our customer was chomping at the bit. To her credit, however, by the time the programmer had the dictionary built, she seemed to have gotten into the paradigm of Forth, and she wrote the actual project code in record time. The program worked flawlessly, almost from the get-go.

In the process of building that system, we learned one important lesson. The Honeywell 316 had only one interrupt line, and we needed more. No problem; we created our own interrupt structure using a parallel port.

Think about it: what exactly is an interrupt, anyway? To us programmers, it’s a line that snatches control away from the executing program, and gives it off to some deserving interrupt handler. We think of an interrupt as the ultimate in asynchronous events.

To the chip designer, however, it’s not asynchronous at all. He simply assigns times in the CPU’s instruction execution cycle in which to go out and sample the interrupt line. If the line is asserted, he arranges for the interrupt process to occur.

In the process of building our system, we realized that we could do in software, the same thing a CPU does in hardware. We added code to the Forth inner interpreter that, once per instruction, would go check the status of the pins on a parallel port. If we found one high, we executed a Forth “subroutine call” to a word that vectored and processed the interrupt.

In short, we took a machine with only one interrupt line and made it act like it had 16 interrupts, each with its own priority level. The catch was that the latency was slower, since it depended upon the cycle time of the Forth inner interpreter, rather than the CPU hardware. Even so, it was a really neat extension to the performance of the computer, it was absolutely necessary to get the computer to do what we wanted, and it worked like a charm. The message is this: the term “abstract machine” need not be limited to the way an instruction set is executed. It can also extend to change the way the computer interacts with the outside world. Neat.

Threaded code—a taxonomy

All of the interpreters and similar software artifices can be lumped under the generic title threaded code . To those of us who have been exposed to (or subjected to) Forth, the term has come to be identified with that language/assembler/compiler/operating system. However, I hope you have seen by now that this is not the case; the term has a much broader meaning. Conversely, a threaded code concept can be implemented in a lot of different ways.

The first way (Example 1) is also the easiest; simply write the program as a series of calls:

CALL A

CALL B

CALL C

etc. (1)

This is exactly what I discovered when I examined the code generated by the IBM 1130 Fortran compiler. It turns out that the code generator of a compiler is a very large part of the total program, and also burns a major portion of the CPU time. This is partly because the native instruction set is usually so irregular, which requires every language construct to need its own list of instructions to execute. The code generator’s structure must necessarily follow the irregularities of the underlying instruction set, and therefore it’s big, messy, and slow. Another reason the code generator is large and slow, is because of the public’s demand for more and more efficient object code. Compiler writers must build in, even without optimization enabled, at least enough complexity to keep the compiler from generating the more blatant inefficiencies. Add any attempt at optimization, and the size and complexity of the code generator grow exponentially.

On the other hand, if a code generator doesn’t have to make any decisions, it can be made small and fast. This is one of the tricks IBM used to build a compiler capable of running in 8K of RAM. Since the code generator was always going to issue one call for every language primitive, its job was very simple; hardly more than a one-for-one translation from intermediate language to procedure call.

I also noted, in my discussion of the IBM 1130, the other key characteristic of all threaded-code implementations: the object code is small. You may recall my saying that I was so surprised at the size of the 1130 object code, I thought the compiler must have been broken. Threaded code was the reason.

Of course, we can always write code in any ordinary language, like C, in a fashion that is mostly function calls. To do so, in fact, might be considered the very epitome of structured programming, though very few C programmers practice it. Most would tell you that it’s too slow. And they’d be right.

We see, then, that there’s a major trade-off between ordinary programming and threaded code, and it’s a very common one: speed vs. program size. Threaded code, of whatever kind, produces object code that is smaller, but slower.

The size of the threaded code depends on both the computer it’s implemented on, and the way it’s implemented. I know that most folks are using 32-bit computers these days, but for simplicity’s sake, let’s assume a byte-oriented machine, with 16-bit addresses, like the 8080, Z80, or many microcontrollers of modern times. For such a machine, each Call instruction in Example 1 takes three bytes, so a program of N instructions will obviously take 3 N bytes (I’m deliberately ignoring issues of data formats, since their size can be so implementation-dependent. For the record, the IBM compiler apparently stored all literals in a separate block; they never appeared in line).

You will note that the code of Example 1 needs no interpreter. It’s pure target machine object code, which just happens to have a very regular pattern to it. As such, it can run as a true executable program, no interpreter needed.

It is, however, a little bit redundant. If every instruction is truly a call instruction, the byte that encodes the call is totally unecessary. We can reduce the program size even further, from 3 N to 2 N , by writing the simplest of interpreters:

word *IP;

while(1){

call(*IP++);

} (2)

In short, the “program” becomes, simply, a list of addresses. The interpreter fetches each address sequentially, and calls the subroutine at that address.

Finally, if the number of subroutines is less than or equal to 256, we can encode each subroutine as a byte. Now we have a program that’s only N bytes long, at the expense of a more complicated (and slower) interpreter. This single-byte approach is the one used both by my debuggers and my dreamed-of interpreter, and the UCSD Pascal interpreter. As might be imagined, the generated object code is very, very small, but the interpreter runs slower.

All three types of organizations I’ve just mentioned are classified as “direct threaded code,” because each “entry” in the program points, more or less directly, to the function to be executed. The organization of Forth adds one more level of indirection (and therefore is slower), but gains in flexibility.

What does it gain? Well, for one thing, it’s possible to define multiple data types, and associate functions to go with them. Every Forth data item is referenced, not directly, but through its dictionary address. This means that one can have different functions (words) to operate on, say, floating-point data rather than integer data. The functions are stored in the dictionary along with the data type description, not the individual data item.

Did you notice that this means that Forth stores methods along with its data items? Because of this organization, Forthians argue that Forth is the original object-oriented language. They do have a point, but I consider it a stretch. In any case, they argue that, while the indirect threading may be a bit slower for executing primitive instructions, it saves both space and time when referencing data items.

As usual when discussing Forth, however, one must be careful—both the terminology used and the system itself tend to put you on slippery ground. For example, there are quite a number of other Forth features (for example, a separate return stack, and the instructions to access it, or the stack-oriented words, like swap, over, and so on) that are not related to the concept of indirect threaded code. However, the term has been applied so often to Forth that the other Forth features tend to get attached to it, fairly or not.

The fact is, Forth tends not to stand still. There are quite a number of implementations now that don’t use indirect threading at all or a separate return stack. In fact, some are true compilers, generating true native code. If there’s a message here, it’s not to allow your prejudice, be it for or against Forth, to color your understanding of either direct or indirect threaded code. There is really no connection between the two, other than the fact that the original Forth used indirect threading.

Still to come

There’s still one more interpreter of note that I haven’t mentioned. It was developed by a rather clever fellow from California named Steve Wozniak. In my own estimation, Steve’s ideas were the cleverest of all and the most applicable to embedded systems. We’ll talk about his ideas next month. I’ll also wrap the discussion up by giving you my own version of Steve’s brainchild, written in C. See you then. esp

Jack W. Crenshaw is a staff scientist at Invivo Research in Orlando, FL. He did much early work in the space program and has developed numerous analysis and real-time programs. He holds a PhD in physics from Auburn University. Crenshaw enjoys contact and can be reached via e-mail at jcrens@earthlink.net.

Embedded.com Career Center
Looking for a new job?
SEARCH JOBS

Browse all jobs

SPONSOR
RECENT JOB POSTINGS





 :