Instruction set translation


If instruction sets didn't matter, processors would be cheaper and designers would have more options. That's why one startup's efforts are so intriguing.

Every microprocessor is different, in part because it executes its own special set of instructions. Like human languages, CPU instruction sets have a lot of similarities; they just use different words to convey the same basic concepts. But what if there were a kind of electronic Esperanto, a universal microprocessor language that enabled any chip to run any software?

Java and other virtual machines try to deliver half of that promise. They allow one type of code (Java bytecodes, in this case) to run on any processor, as long as that processor has a virtual machine translator. But what if you could run any code—x86 binaries, 68k code, ARM executables, and so forth—on any other processor?

The idea is attractive. Processors could become generic and interchangeable. You could switch processors without changing your code. You could create instant software ports; it wouldn't matter what CPU you compiled your code for. Software would no longer be tied to hardware; any platform could run any other platform's programs. Imagine the turmoil such a magical tool would create in the processor business or the software business.

College try
One company polishing up this software Philosopher's Stone is Transitive Technologies in Manchester, England. Transitive has a software-translation program called Dynamite that, theoretically at least, can convert binary object code written for one processor into code for any other processor. Dynamite performs this software sleight-of-hand in real time as programs run. It works more or less like a Java virtual machine, except that programs don't have to be compiled into bytecodes. Dynamite takes existing code for one processor and runs it on another.

Dynamite runs either on top of or under the operating system, fetching and translating instructions on the fly. It reads “source” instructions (say, x86) in small batches and converts them to equivalent “target” instructions (MIPS, for example) for the actual processor it's running on.

Rather than translate instructions one by one, Dynamite looks for common “phrases,” such as loop set-up or three-operand arithmetic operations. It translates these into its own internal format (essentially Dynamite's answer to bytecode) before emitting the native instructions. The translation isn't permanent; Dynamite never stores the translated code, so programs have to be translated afresh every time they run. It does cache portions of code at run-time, however, so loops or common subroutines can be reused without being retranslated.
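
To make the translate-then-cache idea concrete, here is a minimal sketch. It is a toy under stated assumptions, not Transitive's design: the two-instruction “source” set, the `translate_block` helper, and the `TranslationCache` class are all invented for illustration, and the sketch skips phrase recognition and the intermediate format entirely. Python callables stand in for emitted native code.

```python
# Toy sketch of on-the-fly translation with a run-time cache.
# Everything here is hypothetical: the "source" instruction set,
# the block format, and all names are invented for illustration.

def translate_block(block):
    """Convert a batch of source instructions into one Python callable
    (standing in for emitted native code)."""
    steps = []
    for op, arg in block:
        if op == "add":
            steps.append(lambda acc, n=arg: acc + n)
        elif op == "mul":
            steps.append(lambda acc, n=arg: acc * n)
        else:
            raise ValueError(f"unknown source instruction: {op}")

    def native(acc):
        for step in steps:
            acc = step(acc)
        return acc

    return native


class TranslationCache:
    """Keeps translated blocks in memory only; nothing is saved to disk,
    so a fresh run starts with an empty cache."""

    def __init__(self):
        self.blocks = {}        # block address -> translated callable
        self.translations = 0   # how many times we had to translate

    def run(self, address, block, acc):
        native = self.blocks.get(address)
        if native is None:
            # First visit: pay the translation cost, then remember the result.
            native = translate_block(block)
            self.blocks[address] = native
            self.translations += 1
        return native(acc)


# A "loop body" at address 0x100, executed three times.
loop_body = [("add", 2), ("mul", 3)]
cache = TranslationCache()
acc = 1
for _ in range(3):
    acc = cache.run(0x100, loop_body, acc)

print(acc, cache.translations)  # prints "105 1"
```

The loop body runs three times but is translated only once; a new `TranslationCache` (that is, a new run of the program) would start from scratch.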

Software translation is eerie when it works. It's curiously unimpressive to the casual observer; the more you understand about the awful complexity of binary translation, the more impressive this unimpressive display becomes.

The road to perdition
Binary software translation has been around for decades, and sometimes it even works. But that road is paved with failures and near misses. The devil lurks in the details of software translation and success has been determined by how well you control the details. The few isolated success stories only serve to illustrate the difficulties; the exceptions prove the rule.

Apple Computer, for one, was very successful in shifting Macintosh from 68000 to PowerPC processors. Customers barely noticed a thing. Digital Equipment Corporation (DEC) also had translation code for its ill-fated Alpha processor that could convert PC programs on the fly. IBM, Burroughs, and others have translated mainframe software for decades. Even Transmeta's Crusoe chips do binary translation on the fly, and you could argue that AMD and Intel processors do the same in hardware when they decode x86 instructions into internal micro-operations.

But Apple, DEC, and Transmeta all had help. They're translating from one very specific and well-defined computer to another specific computer. In Apple's case, its translator only had to convert code from one Macintosh to another Macintosh. Apple controlled both sides of the equation. The translator didn't have to convert all 68k code, nor did it have to create generic or portable PowerPC code, only something that would work on a Power Mac.

The Mac isn't even a particularly difficult machine to emulate. Macintosh programs, by their nature, use a lot of operating system calls and “toolbox” routines built into every Macintosh (some of them in ROM). Those routines didn't have to be translated, only reimplemented. The more time an application spends in the toolbox or MacOS, the easier it is to convert.
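
A rough sketch of that division of labor, assuming nothing about Apple's actual emulator: calls to known system routines are dispatched to native reimplementations, and only the application's own instructions take the slower translated path. The routine names (`draw_text`, `beep`), the program format, and the helper functions are hypothetical.

```python
# Toy sketch: system calls go to native reimplementations, and only
# application code takes the (slower) translated path.
# Routine names and the program format are hypothetical.

NATIVE_TOOLBOX = {
    "draw_text": lambda text: f"[native draw] {text}",
    "beep": lambda: "[native beep]",
}


def run_translated(op, *args):
    """Stand-in for translating and running one application instruction."""
    if op == "concat":
        return "".join(args)
    raise ValueError(f"unknown application instruction: {op}")


def run_program(program):
    """Run a list of (name, args) steps, counting which path each takes."""
    counts = {"native": 0, "translated": 0}
    results = []
    for name, args in program:
        routine = NATIVE_TOOLBOX.get(name)
        if routine is not None:
            # Known system routine: no translation needed at all.
            results.append(routine(*args))
            counts["native"] += 1
        else:
            results.append(run_translated(name, *args))
            counts["translated"] += 1
    return results, counts


program = [
    ("concat", ("Hello, ", "Mac")),
    ("draw_text", ("Hello, Mac",)),
    ("beep", ()),
]
results, counts = run_program(program)
print(counts)  # prints "{'native': 2, 'translated': 1}"
```

The larger the share of steps that land in the native table, the less work the translator has to get right.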

Digital's FX!32 program benefited from a similar set of training wheels. It converted PC applications to run on DEC workstations using the Alpha RISC processor. PCs are pretty well understood, and DEC, of course, knew its own workstations inside and out. Given a little time, FX!32 could convert PC code to Alpha code and turn a DEC workstation into a reasonable facsimile of a generic PC. Even so, FX!32 didn't rescue DEC or Alpha.

What DEC and Apple both had in their favor was their ability to stack the deck. They controlled one (DEC) or both (Apple) of the computers and processors involved. Neither made any attempt to be generic; these tools were honed for one specific job. Apple never tried to translate all 68k software, and Digital didn't take on all x86 code. Transitive, on the other hand, is trying to do just that.

Although Apple's Franken-Mac experiment went pretty well, it didn't work all the time. Some older Macintosh programs simply didn't work on the newer Power Macs. It was a small percentage, but those few programs drove a stake through the heart of any claim that binary translation was perfect.

A few recalcitrant Mac programs might not be a big deal, but embedded systems need to be more reliable; it's not acceptable to have occasional failures. Embedded systems might expose obscure corner cases where the translation isn't quite perfect and the system crashes as a consequence. If you're selling network boxes by the tens of thousands, binary translation is both tantalizing and terrifying.

Problems and shortcomings
Binary translation's problems may be more cultural than technical. Embedded programmers who've dabbled in the black arts of binary software translation bear the scars of it, and they're only too aware of the pitfalls that lie hidden and buried. It's relatively easy to get 80% or even 95% accuracy; it's that last 5% that'll kill you.

It's also quite hard to prove that a translator works all the time, every time. You can't prove there are no giraffes in San Francisco; you can only prove that there are. Until or unless the translator fails and provides a negative example, programmers must take its reliability on faith.

Even with perfect fidelity, binary translation would wreak havoc with real-time systems. Transitive's Dynamite takes longer to convert some code phrases than others. It also learns over time, so translation times will change (for the better) as the program runs. There's no feasible way to predict the response time or interrupt latency of such a system. The same problem afflicts Transmeta's Crusoe chips, which is why that company continually argues over PC benchmarks.
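
A toy cost model shows why this matters. The cycle counts and handler names below are invented for illustration, not measurements of Dynamite or any real translator; the only assumption carried over from the text is that code is slower the first time it runs because it must be translated first.

```python
# Toy cost model: why response time is hard to bound under dynamic translation.
# The cycle counts below are invented for illustration, not measurements.

TRANSLATE_COST_PER_INSN = 50   # assumed cost to translate one source instruction
EXECUTE_COST_PER_INSN = 1      # assumed cost to run one translated instruction


def handler_latency(block_len, already_translated):
    """Return the modeled cost of one invocation of an interrupt handler."""
    cost = block_len * EXECUTE_COST_PER_INSN
    if not already_translated:
        cost += block_len * TRANSLATE_COST_PER_INSN
    return cost


def simulate(events, block_lens):
    """events: sequence of handler ids; block_lens: handler id -> length."""
    translated = set()
    latencies = []
    for handler in events:
        latencies.append(
            handler_latency(block_lens[handler], handler in translated)
        )
        translated.add(handler)
    return latencies


block_lens = {"timer": 10, "uart": 40}
latencies = simulate(["timer", "timer", "uart", "timer", "uart"], block_lens)
print(latencies)  # prints "[510, 10, 2040, 10, 40]"
```

The same handler costs 510 units on one interrupt and 10 on the next. A real translator that also evicts cached code, or keeps re-optimizing as it learns, would make the spread harder still to bound.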

A perfect CPU translator is still just that: a CPU translator. It won't emulate I/O devices or peripherals that aren't there. An instruction-set translator won't turn your embedded system into an X-Box, X-ray scanner, or XScale-based PDA.

Weirdly, binary translation is probably illegal in some cases. Most shrink-wrapped software includes a license agreement (tacitly agreed upon when you remove the plastic wrapping) that prohibits disassembling or reverse-engineering the software. A litigious software company could argue that translation constitutes disassembly. This may not be a big deal for embedded systems that have no retail or third-party software.

What if?
Software translation would be the classic “disruptive technology” if it caught on. If processors became interchangeable, embedded designers could choose their chips based on price, power, performance, or color for all it matters. Generic CPU chips could radically upset the market for embedded processors, causing prices to fall and competition to increase. Instruction-set tyranny and backward compatibility could be long gone. Compilers (and compiled code) would no longer be tied to processors.

Certainly binary translation works to some extent. Transitive shows off shrink-wrapped PC applications running on MIPS processors with no apparent drama. Without checking the part numbers, you'd never suspect anything was wrong (or at least, unusual). But is translation comprehensive enough, reliable enough, and trustworthy enough for embedded systems? Apple's or Transmeta's success might not, er, translate into the embedded world.

Jim Turley is an independent analyst, columnist, and speaker specializing in microprocessors and semiconductor intellectual property. He is a past editor of Microprocessor Report and Embedded Processor Watch. For a good time, write to .
