
Do processors and instruction sets matter?

There's a sense that evaluating processor chips based on their instruction sets is a waste of time; most programmers don't know or care about the differences. RISC, CISC, VLIW… it's all the same. Is that an accurate reflection of reality? Have high-level languages made CPU chips generic and interchangeable? Or do underlying hardware details make a difference and we're just too busy to care? Our group of embedded experts ponders these issues, which affect the decisions embedded programmers face on a regular basis, in this latest edition of Shop Talk.

Niall Murphy
My take is that programmers care a lot less about the instruction set than they used to. One obvious reason is that the compiler hides a lot of things. For very small processors that can't handle a stack very well, some of the ugliness shows through in the compiler, but most of the time programmers do not see these effects.

The second reason people care less is that most instruction-set design is driven by speed. Some processors are nicer to program than others, but that is generally a side effect. These days, CPU speed matters to only a small percentage of embedded products. Most of the systems I work on control something mechanical, and the mechanical system is so slow compared with the processor that extra speed simply would not be of any use to me; if I did need it, it is available for small money. I reckon some of the networking people still care about raw speed, and maybe some graphics people (though high-end graphics is usually a desktop-peripheral market rather than true embedded stuff).

In the PC world, when Intel introduces something like the MMX instructions, compiler and application vendors need to decide if they want to take advantage of them, but I cannot think of an equivalent, from my experience, in the embedded world.

I do not do any DSP work, but I am guessing that that may be an area where the instruction set is a real differentiating factor.

Jack Ganssle
My take is that instruction sets do matter, but not for technical reasons.

Instruction sets are important only in so far as they yield standard, well-known processors. The 8051 and PIC, for instance, are even today wildly successful parts, despite their brain-dead architectures. Why? Lots and lots of tools, and lots and lots of developers who are familiar with the parts. Plenty of trust in their viability.

Any ivory-tower pundit would laugh off these parts due to their silly instruction sets. Neither is a particularly good C platform. Performance is pathetic compared to the latest 100 Gigaweeniehertz 64-bitter. But they're adequate for many tasks.

Old architectures never die. The 68k is getting near the quarter century mark, yet is still healthy despite Mot's attempts to kill it off with the ColdFire and PowerPC. And why not? The 68k offers great performance, lots of variants, and a decent price. Max clock rates are laughable compared to modern alternatives, but they are good enough for most work.

The instruction set matters in so far as it's a standard architecture, one that developers know and love, one that's stable and likely to be around for a long, long time.

Look at it a different way. Suppose a vendor offers the most amazing CPU ever, one with an instruction set so beautiful it induces tears of joy in developers' eyes. And it costs zero dollars in any volume. I bet most of us couldn't afford to use the part! We'd need:

  • New tools: Being new, either they wouldn't work well, or we wouldn't believe they'd work well.
  • People familiar with the part: It's new, so there aren't any. Figure on costs going up to train folks.
  • Trust: Does it work? Will the pricing strategy change? Will the vendor be around in two years?

Bill Gatliff
CPUs and instruction sets matter!

Case in point: ARM and MIPS instructions are all 32 bits. In the Hitachi SH world, all instructions are 16 bits. That means that I can fit almost twice as many SH instructions into memory and cache as ARM or MIPS instructions, and can stuff them across a memory bus at twice the speed.

A processor that isn't starving for instructions will use less power; a compact instruction set means less/slower/cheaper memory. That's an advantage for PDAs and other cost-sensitive, portable applications. (But don't get me wrong, I like all three instruction sets and all three CPUs. CPU selection is about more than just instruction sets.)

Unfortunately, programmers pay more attention to marketing hype than instruction sets and CPUs. That's why we get PDAs with StrongARMs instead of SHs.

And as for survivability, look at what happened to StrongARM design wins after Microsoft “refocused” CE on that target exclusively. Now imagine what would have happened if they hadn't. Predicting who will be here two years from now gets tougher all the time.

Jim Turley
There are more than 115 different 32-bit embedded processors, from more than two-dozen “families” of instruction-set architectures, available right now. Yet how many engineers or programmers know or care? And how many CPUs do they really evaluate for a new project?

Instruction sets do matter. They make a difference to performance, power consumption, price, tool availability, and a bunch of other vectors. Yet I think all of that gets lost in the realities of a working engineer's job. We tend to pick the chips we're familiar with, not the best one for the job. We're driven by inertia, laziness, or a mistaken belief that CPUs are all interchangeable.

Here's just one example. Multiplying two 32-bit numbers takes two cycles on Hitachi's SH7604, 43 cycles on a Motorola 68030, and anywhere from one to 15 cycles on an ARM7. But how would a programmer know that? The statement a=b*c looks the same in any chip's C code. (The ARM7's multiplier is data-dependent; the latency depends on the magnitude of the operands. Try working that into your hard real-time system!)
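Just to show how invisible that difference is at the source level, here's the kind of code in question; the cycle counts are the ones quoted above, not anything the compiler will ever tell you (a minimal C sketch).

    #include <stdint.h>

    /* The same one-line multiply compiles for any 32-bit target, but the cost
       differs wildly: roughly 2 cycles on an SH7604, 43 on a 68030, and 1 to 15
       on an ARM7 depending on the operands (the figures quoted above). */
    uint32_t scale(uint32_t b, uint32_t c)
    {
        uint32_t a = b * c;   /* a = b * c: identical C, very different latency */
        return a;
    }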

I agree with Jack's observations. Instruction sets are primarily a compatibility issue. We choose chips from a compatible family because it's easy. They give us access to popular tools and middleware. But that ignores the inherent “goodness” or applicability of a CPU to a given task. MIPS processors suck at graphics, but they're often used in graphics systems like video games. Why?

Assembly programmers know some instruction sets inside and out. But most programmers today use high-level languages that hide the underlying instruction set, rendering all CPUs generic. That doesn't fix the problem: some CPUs are still far better at some tasks than others. It just hides the shortcomings (and the strengths) from the programmer, who is now blissfully ignorant of how badly (or wonderfully) the chip is performing.

So although CPUs do make a difference, it seems like most engineers don't have the luxury of time to make the comparisons. We're all driven by deadlines, and sometimes the easy choice is the right one, even if it's for the “wrong” reasons.

Bill Gatliff
Jim Turley wrote:

MIPS processors suck at graphics, but they're often used in graphics systems like video games. Why?

Any ideas on the answer? I'd like to hear them. I'm guessing that it's because they're cheap, and more willing to release IP for ASICs and the like. But I don't really know.

Jim Turley
I'll argue with Bill a little on this one:

Case in point: ARM and MIPS instructions are all 32 bits. In the Hitachi SH world, all instructions are 16 bits. That means that I can fit almost twice as many SH instructions into memory and cache as ARM or MIPS instructions, and can stuff them across a memory bus at twice the speed.

Yes, you can fit twice as many 16-bit ops into memory compared to 32-bit ops, but you need more of them to get your work done. Probably.

If short instruction words could shave 50% off of code density, we'd all be using 1-bit architectures! Short instructions pay a penalty in less capability and less flexibility per instruction, so you generally need more instructions to do the same work. The overall savings is about 20%, in my experience.

Remember, cutting the instruction word in half doesn't cut the number of possible instructions in half; it shrinks the encoding space from 2^32 to 2^16, a factor of 65,536. That's a lot of trimming. In the case of SH, you can't do long/far jumps, you have only eight registers, and there are few arithmetic instructions. Obviously, many systems make do with these limitations, but they're not trivial and can't always be ignored.

Unfortunately, programmers pay more attention to marketing hype than instruction sets and CPUs.

That's very true. We now work in a world where microprocessors are marketed like perfume.

That's why we get PDAs with StrongARMs instead of SHs.

Hmmmm, possibly. There are differences between the over-hyped StrongARM (or XScale) and the under-appreciated SuperH, but you're probably right. It's mostly marketing now. (sigh).

MIPS processors suck at graphics, but they're often used in graphics systems like video games. Why? I'm guessing that it's because they're cheap, and more willing to release IP for ASICs and the like. But I don't really know.

Right you are. At the time Nintendo was designing the N64, there were few 32-bit processors available as licensed IP cores. SPARC, ARM, and MIPS were pretty much your only choices. SPARC performance was (and still is) lousy, and ARM was too wimpy. MIPS had a reputation for driving Silicon Graphics workstations, however, so the “halo effect” made MIPS the clear choice.

Bear in mind the MIPS processors inside the N64, PlayStation, and PS2 do very little of the actual graphics work. That's all done by a humungous custom graphics processor (called Emotion Engine in the case of PS2). The central MIPS processor just runs the OS and handles game logic. All the heavy lifting is done elsewhere.

Jim Turley
On the subject of the hidden costs of new instruction sets, Jack wrote:

We'd need:

  • New tools: Being new, either they wouldn't work well, or we wouldn't believe they'd work well.
  • People familiar with the part: It's new, so there aren't any. Figure on costs going up to train folks.
  • Trust: Does it work? Will the pricing strategy change? Will the vendor be around in two years?

Well, yes and no. GCC supports practically every new processor. It's free and worth every penny. Lots of working engineers don't like gcc, though. They prefer (or demand) commercial, professional, third-party tools.

The existence of third-party tools is sometimes used as a proxy for market acceptance. “Gee, if Green Hills makes a compiler for it, it must be okay.”

This relates to points 2 and 3: trust. So much of CPU choice is based on gut feel and emotion, not technical criteria. Ironically, in this fast-paced industry, most engineers don't like to stick their necks out or take chances. Nobody wants to flame out in front of their peers for choosing an oddball processor that's untried, unproven, or difficult to work with. That's why old families like 68k and x86 still sell so well, even though they're faaaarrrr behind the times. “You never lost your job for choosing IBM.”

Predicting who will be here two years from now gets tougher all the time.

Boy, that's for sure. If technical criteria were all that mattered, we'd be switching processors every month. But installed base, familiarity, and marketing have a big effect, too. There's no telling how those factors will play out.

Jim Turley
Niall makes some good points.

These days speed of the CPU matters to a small percentage of embedded products. Most of the systems I work on control something mechanical, and the mechanical system is so slow compared with the processor that extra speed simply would not be of any use to me, and if I did need it then it is available for small money.

Code density is another important consideration. The same C program compiled for ARM will be twice as big as the 68k binary. That's not something you can change in your software, or with compiler switches; it's a fixed characteristic of the processor. Some CPU instruction sets just naturally have better code density than others.
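If you don't want to take that ratio on faith, it's easy enough to check against your own code: compile the same module with an ARM and a 68k cross-compiler and compare the text sections. A sketch follows; the toolchain prefixes (arm-none-eabi-gcc, m68k-elf-gcc) are assumptions about what's installed on your machine, so substitute whatever cross-tools you actually have.

    /* density.c -- any representative module from your application will do.
     *
     * Hypothetical comparison, assuming GNU cross-toolchains are on the path:
     *   arm-none-eabi-gcc -Os -c density.c -o density-arm.o
     *   m68k-elf-gcc      -Os -c density.c -o density-68k.o
     *   arm-none-eabi-size density-arm.o
     *   m68k-elf-size      density-68k.o
     * The .text column is the code-density figure to compare.
     */
    int checksum(const unsigned char *p, int n)
    {
        int sum = 0;
        while (n-- > 0)
            sum += *p++;
        return sum;
    }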

In the PC world, when Intel introduces something like the MMX instructions, compiler and application vendors need to decide if they want to take advantage of them, but I can not think of an equivalent, from my experience, in the embedded world.

That's true for high-level languages because C, et al, don't take advantage of unique CPU features. But if you're an assembly programmer, you can see big differences from one CPU to another.

For example, Motorola's 68300 chips have a cool TBLS (table lookup and interpolate) instruction that can replace hundreds of lines of C code. And it executes in six clock cycles. With one assembly instruction. C compilers never use it.
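For anyone who hasn't met TBLS, the job it does in one opcode is roughly the table lookup with linear interpolation below; this is a sketch of the idea in portable C, not Motorola's exact semantics.

    #include <stdint.h>

    /* Look up x in a table of samples spaced 256 counts apart and interpolate
       linearly between the two neighboring entries -- roughly what the 68300's
       TBLS instruction does in a single opcode (sketch only; the real
       instruction's rounding and operand formats differ in the details). */
    int16_t table_lookup_interp(const int16_t *table, uint16_t x)
    {
        uint16_t index = x >> 8;      /* which pair of table entries */
        uint16_t frac  = x & 0xFFu;   /* position between them, 0..255 */
        int16_t  y0 = table[index];
        int16_t  y1 = table[index + 1];
        return (int16_t)(y0 + (((int32_t)(y1 - y0) * frac) >> 8));
    }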

Hitachi's SH7708 processor has a 3D matrix transform instruction that does a whopping 16 multiplies and 12 adds at once, all with floating-point numbers. It's used for calculating the angle of reflection between a 3D light source and a 3D polygon. Obviously useful for video games. But only assembly programmers ever use it.
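The arithmetic behind that instruction is simply a 4x4 matrix applied to a 4-element vector, which is where the 16 multiplies and 12 adds come from. In plain C it looks like this (a sketch of the math; no compiler will turn this loop into that single opcode).

    /* 4x4 matrix times 4-vector: 16 multiplies and 12 adds per vector,
       exactly the work the SH 3D-transform instruction does in hardware. */
    void transform(const float m[4][4], const float v[4], float out[4])
    {
        for (int row = 0; row < 4; row++) {
            out[row] = m[row][0] * v[0] + m[row][1] * v[1]
                     + m[row][2] * v[2] + m[row][3] * v[3];
        }
    }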

Jack Ganssle
I'd sure like to see a business person's take on the questions we've been asking. We're talking a lot about tech issues, but processor selection is a business decision. Code density matters only, well, if it matters. The best technology is not always the best decision. (Did I hear someone mutter “Windows”?) I agree with Jim that gcc, though nice, does not always assuage the boss, who may have a lot of trouble understanding the GNU business model.

And the biz model may fluctuate. This EE Times article (“IC execs whisper of recovery”) suggests that Intel was planning to drive flash prices up sharply on January 1, 2003. So poorer code density that was acceptable once may be less viable now, but who's gonna change CPUs once a company is in bed with a particular part?

Bill Gatliff
Jim Turley wrote:

Bear in mind the MIPS processors inside the N64, PlayStation, and PS2 do very little of the actual graphics work. That's all done by a humungous custom graphics processor (called Emotion Engine in the case of PS2). The central MIPS processor just runs the OS and handles game logic. All the heavy lifting is done elsewhere.

Considering that the GameBoy Advance runs an ARM7TDMI, clearly in some systems the “heavy lifting” is all in the user's mind…

Jim also pointed out:

This relates to points 2 and 3: trust. So much of CPU choice is based on gut feel and emotion, not technical criteria. Ironically, in this fast-paced industry, most engineers don't like to stick their necks out or take chances. Nobody wants to flame out in front of their peers for choosing an oddball processor that's untried, unproven, or difficult to work with. That's why old families like 68k and x86 still sell so well, even though they're faaaarrrr behind the times. “You never lost your job for choosing IBM.”

Trouble is, there are lies, damn lies, and benchmarks. :^)

Dan Saks and I have done an extensive analysis of “hello, world” across (a few) microprocessors and compilers. You would not believe the differences in the implementations. (That wasn't our mission; it just turned out that way.)

ADS does a pretty good job on the ARM. But it's because the compiler “figures out” that you are just printf'ing a constant string expression, and so it replaces your printf with a call to puts. Set a breakpoint on printf, and wonder why your program never gets there…
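The program itself is nothing more exotic than the usual two-liner; the surprise is what the toolchain does with it behind your back (sketch below, with the substitution described in a comment because you'll never see it in the source).

    #include <stdio.h>

    int main(void)
    {
        /* ADS (and several other compilers) notice that the format string has
           no conversion specifiers and quietly emit a call to puts() instead
           of printf(). Set a breakpoint on printf and it never fires, even
           though printf is what the source says. */
        printf("hello, world\n");
        return 0;
    }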

Interestingly, gcc does the worst with “hello, world”, both because it takes you literally and because the linker doesn't do a great job of weeding out bits of the library that you don't really need. But give it a more realistic application with a few thousand lines of code, and it runs right in the pack with everyone else; and the differences among everyone start to get pretty small.

Moral of the story? Sometimes you can't pick the “right” processor until you can benchmark it against your real application code. But how can you take advantage of the features of the processor when you haven't selected one yet? Damned if you do, damned if you don't.

Niall Murphy
While Jim's points above are valid, they are criteria that never seem to matter much when I am making a processor selection. I figure that code density and Megaflops-per-dollar (or whatever your speed measurement is) matter if the volumes are big and the margins are tight and the processor is a significant part of the price.

I was looking at a design recently with a quarter-VGA screen. The screen was going to cost more than $100 and the CPU was around the $10 mark. So paying a couple of extra dollars for the CPU was going to be a very small fraction of the total price. Because it was low volume, those few dollars could have been easily recouped if the tools were better. As silicon gets cheaper the CPU will become a smaller and smaller fraction of the total cost.

Maybe as the volume goes up, the subtle details (and intrinsic “goodness”) of the processor matter more, and as volume goes down, the tools (and therefore development time) matter more (as Jack pointed out).

My guess is that while a big percentage of the chips shipped are in high-volume products like PDAs, that is not necessarily where the majority of embedded developers are working. There are a lot of low volume products, and those are also the places where you cannot afford to re-engineer if you realize that your processor is not the optimal one.

Jim Turley
Bill points out:

But give it [gcc] a more realistic application with a few thousand lines of code, and it runs right in the pack with everyone else; and the differences among everyone start to get pretty small.

Ah, but does that indicate that the processors are similar, or that the *compilers* are similar? It could be that the processors are wildly different but their compilers are hiding those differences. C code can only generate certain types of constructs, which limits its usefulness for certain tasks. It's like describing fluffy kittens in Klingon; there's just no vocabulary for it.

I suspect that C code — any C code — hides the underlying differences between CPUs and makes them all look the same. Maybe that's okay. Maybe that's what programmers want. I think it's hiding a light under a bushel and robbing programmers of the huge benefits (or misery) they could be getting.

Moral of the story? Sometimes you can't pick the “right” processor until you can benchmark it against your real application code. But how can you take advantage of the features of the processor when you haven't selected one yet? Damned if you do, damned if you don't.

Benchmarking is never easy, always surprising, and rarely comforting. It never seems to generate the results you were expecting, and it leaves you wondering what it is you did learn from the process. Sometimes it's easier to crawl into a hole and pretend that MHz (or Dhrystone) is good enough. The EEMBC guys do a fine job of creating and managing realistic embedded benchmarks, but it's a tougher job than even they expected, I think.

And you're right: you can't evaluate a chip until you pick a chip to evaluate. And another, and another… That's part of the reason so many engineers say, “Screw it. I'll pick Brand X because that's what I used last time.” Deep down they might know that the “right” chip is out there somewhere. But it's such a hassle to find it, many don't bother.

What's the right answer? Do benchmarks provide a good guideline? Does there need to be a database of CPU features that programmers can look up, a kind of CPU Yellow Pages? Do we go passive, and allow marketing to determine our choices? Or is there a better way?

Bill Gatliff
You can't pick the “right” CPU without understanding something about what you're picking from. Databases and yellow pages will help, but for specific questions like, “is the MAC instruction in SH really going to help my application enough to make it worth switching to that architecture?”, there's no substitute for the datasheet.
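To make the MAC question concrete: the code that instruction is meant to speed up is the multiply-accumulate inner loop below (a FIR filter, say). Whether the SH part actually wins depends on whether your compiler maps this loop onto MAC at all, which is exactly the sort of thing only the datasheet or a trial build will tell you. (A minimal, target-neutral sketch.)

    #include <stdint.h>

    /* Classic multiply-accumulate inner loop. On a core with a MAC instruction
       each iteration can collapse to one multiply-accumulate plus pointer
       updates; without one, it's a separate multiply, an add, and the
       bookkeeping. Whether the compiler exploits MAC is another question. */
    int32_t fir(const int16_t *coef, const int16_t *sample, int taps)
    {
        int32_t acc = 0;
        for (int i = 0; i < taps; i++)
            acc += (int32_t)coef[i] * sample[i];
        return acc;
    }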

But Jack's right: one of the hardest things for engineers to learn is that business issues are even more important than instruction sets. At the end of the day you may find the perfect CPU, but a lesser machine may still win because you can get it for half the cost, twice the volumes, learn to use it in half the time, get better and cheaper tools, and so on.

Jim Turley

Niall observed:

Because it was low volume, those few dollars could have been easily recouped if the tools were good. As silicon gets cheaper the CPU will become a smaller and smaller fraction of the total cost.

Good point. Most embedded systems already spend more money on RAM than on the CPU. Sometimes that makes code density more (financially) important than CPU performance. In the ASIC or SoC realm, the processor is usually about 5% of the total area of the chip.

Maybe as the volume goes up the subtle details (and intrinsic “goodness”) of the processor matter more, and as volume goes down, the tools (and therefore development time) matter more (as Jack pointed out).

Could be. I think that as volume goes up, the cost of the processor matters more. The importance of the processor's “goodness” is proportional to its role in the system, I suspect. Some systems are CPU-intensive. Others, like your LCD controller, aren't. In those cases, price, power consumption, tools, and familiarity grow in importance.

My guess is that while a big percentage of the chips shipped are in high-volume products like PDAs…

Palm and Handspring would love to hear you call PDAs “high-volume products”!

…that is not necessarily where the majority of embedded developers are working. There are a lot of low volume products, and those are also the places where you can not afford to re-engineer if you realize that your processor is not the optimal one.

Yup. Only 9% of all microprocessors sold are 32-bit chips. (Only about 2% of all microprocessors go into PCs.) So clearly, most embedded developers are working on things like thermostats and microwaves, where performance is pretty irrelevant and cost and availability are everything. Longevity, too. It's important to Magnavox (for instance) that its controllers be in production for at least 10 years.

Jack Ganssle
Jim said:

The EEMBC guys do a fine job of creating and managing realistic embedded benchmarks, but it's a tougher job than even they expected, I think.

I think it's harder than they know. A recent mailing to their e-mail list suggested that one could simply scale their 400 MHz numbers (for a particular CPU) to understand the same system at, say, 200 MHz. I figure that cannot work because, once out of the cache, memory speeds will make 200 MHz look about the same as 400. It's not clear to me what any benchmark means anymore.

Jim Turley
Jack wrote:

It's not clear to me what any benchmark means anymore.

Benchmarks are like image compression. You lose some detail but you hope the overall picture is still recognizable.

Given that C and other high-level languages make all CPUs look the same (that's not a foregone conclusion, but humor me for a moment), and given that most engineers don't have the time or inclination to evaluate CPUs on their merits, is it therefore pointless for CPU companies to develop new instruction sets?

If instruction sets don't matter, except when they're compatible with your existing code, then do new instruction sets have a chance? Or have all the successful instruction sets already been done, and all that's left is to milk them for successive generations?

Bill Gatliff

…is it therefore pointless for CPU companies to develop new instruction sets?

Not if the resulting CPU/compiler combination yields a system that is Fast Enough, Cheap Enough, or Electron-challenged Enough (for lack of a better term) for a given application.

But you never reach that point of stability for very long. Sure, today you've got plenty of horsepower and you're tied to mains power, so power consumption is no problem. But tomorrow your users will want to take your product out on the road (power dissipation, i.e., MIPS/watt), or they'll want a smaller package (power dissipation, i.e., heat), or they'll want to connect you straight to a 1000BASE-T IPv6 network (interrupt latency, throughput, god knows what else). Or they'll want you to tie the CPU closer to the hardware, so you can change in software what today requires a PLD change (interrupt latency and throughput, context switching).

If instruction sets don't matter, except when they're compatible with your existing code, then do new instruction sets have a chance? Or have all the successful instruction sets already been done, and all that's left is to milk them for successive generations?

Do I have to humor you anymore? :^)

Pick an example, say, Linux. That's not 100% C code (although many of the user applications are). At some level, someone has to deal with the assembly code to do context switching, and a good instruction set will make that process fast and painless. A bad one will make it slow and painful. Fast context switches yield faster multitasking, and more efficient processor utilization for multitasking applications. Slow context switches, pretty much the opposite.

So an embedded Linux application written in 100% C still cares about the underlying instruction set. And so yes, improvements in instruction sets do matter.
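For a C-level picture of what's hidden down there, the toy below bounces between two contexts through the POSIX ucontext calls (assuming a Linux/glibc host; the interface is deprecated by POSIX but still widely available). Every swapcontext() is exactly the per-architecture register save and restore I'm talking about, and the instruction set decides how cheap or expensive that is.

    #include <stdio.h>
    #include <ucontext.h>

    /* Two contexts handing control back and forth. Each swapcontext() saves
       one register set and restores another -- the assembly-level context
       switch, wrapped in a portable API. (Sketch for a Linux/glibc host.) */
    static ucontext_t main_ctx, task_ctx;
    static char task_stack[16 * 1024];

    static void task(void)
    {
        puts("in task context");
        swapcontext(&task_ctx, &main_ctx);   /* switch back to main */
        puts("task resumed");
    }

    int main(void)
    {
        getcontext(&task_ctx);
        task_ctx.uc_stack.ss_sp   = task_stack;
        task_ctx.uc_stack.ss_size = sizeof task_stack;
        task_ctx.uc_link          = &main_ctx;  /* where to go when task() returns */
        makecontext(&task_ctx, task, 0);

        swapcontext(&main_ctx, &task_ctx);   /* first switch into the task */
        puts("back in main");
        swapcontext(&main_ctx, &task_ctx);   /* resume the task */
        puts("done");
        return 0;
    }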

Jack Ganssle
Jim Turley wrote:

Yup. Only 9% of all microprocessors sold are 32-bit chips. (Only about 2% go into PCs.)

Is this true? Is there some backup and trend? It's an astonishing number.

This means 32 bits does not matter. Instruction sets don't matter (as 90% of CPUs are 8- and 16-bitters, mostly with brain-dead architectures), and performance woes are never cured via high horsepower. GNU & Linux — at least in the embedded space — don't matter (forgive me, Bill).

Do you/someone have a breakdown of CPU sales?

Is it therefore pointless for CPU companies to develop new instruction sets?

Yes, that's what this means. Also recognize that most compilers use a tiny subset of the instruction set. And that the general trend is decreasing memory prices so code bloat/code density problems tend to self-correct.

Occasionally a killer instruction comes along — like MAC — but that's rare.

Why change CPUs? The usual answers are performance, a different peripheral mix, lower power, or bigger address space. Of these four, only the first is related to an efficient instruction set. Yet the developers I talk to very often tackle performance issues by building smart logic that does the fast stuff outside of the CPU. I was at a place this week that pushes data around very, very fast… with an 8031 and a fast ASIC.

Jim Turley
About 60% of all processors sold are 8-bitters; 4-bitters and 16-bitters each make up about 15%, and 32-bit CPUs are the smallest category, at about 9%. (The portions are a bit smaller if you count DSPs.) Of the 32-bit processors, about 75% of them go into embedded systems. And these are new processors. The installed base is even more skewed toward low-end chips.

Embedded processors outsell PC processors by orders of magnitude, and always have. Remember, microprocessors were invented for embedded systems, not for computers.

This information is in my January column in Embedded Systems Programming. You can verify the data from the usual sources, such as WSTS or SIA.

Jim Turley
Jack opined:

Also recognize that most compilers use a tiny subset of the instruction set.

That used to be true, but RISC “fixed” that. The point of RISC was to eliminate CPU instructions the compiler didn't use. Thus, we have a number of similar RISC architectures today. ARM, MIPS, SPARC, and so on are all pretty similar because they're all supposed to be minimalist implementations of a C compiler's output. Compilers should be using *all* of a RISC processor's instruction set, by definition.

And that the general trend is decreasing memory prices so code bloat/code density problems tend to self-correct.

I would tend to agree, but there seems to be real customer demand for code-compression anyway. Maybe nobody really uses it, or they just like having it around. It can save, oh, maybe 15% on memory size so it's not a big deal. Certainly no one from Redmond seems to care…

Jack wrote:

Yet the developers I talk to very often tackle performance issues by building smart logic that does the fast stuff outside of the CPU. I was at a place this week that pushes data around very, very fast… with an 8031 and a fast ASIC.

Yes, the CPU-plus-accelerator approach is very popular. Use hardware where performance is important; use processors where flexibility is important. That works well when the performance-sensitive features are fixed.

Aha! Now we're back to the original question. If you can put the high-performance stuff in a hardware accelerator, why not put it in the processor instead and get all the advantages of programmability? Now you've got a CPU with one (or more) unusual but very useful instructions. Naturally, those instructions won't map onto a C compiler. But with a little assembly-language work, you've got the best of both worlds. So instruction sets do matter! 😉
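One common way to get that best of both worlds without scattering assembly everywhere is to hide the special instruction behind a tiny wrapper with a portable C fallback. In the sketch below, HAVE_CUSTOM_MAC and __builtin_custom_mac() are purely hypothetical stand-ins; substitute whatever intrinsic or inline-asm form your toolchain actually documents.

    #include <stdint.h>

    /* Wrapper pattern: exactly one place in the code knows about the special
       opcode; everything else just calls mac32(). HAVE_CUSTOM_MAC and
       __builtin_custom_mac() are hypothetical placeholders, not real
       compiler features. */
    static inline int32_t mac32(int32_t acc, int16_t a, int16_t b)
    {
    #if defined(HAVE_CUSTOM_MAC)
        return __builtin_custom_mac(acc, a, b);   /* hypothetical intrinsic */
    #else
        return acc + (int32_t)a * (int32_t)b;     /* portable C fallback */
    #endif
    }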
