Some assembly required - Embedded.com

Some assembly required

In a recent ONLamp article Randall Hyde argues that Great Programmers are assembly language experts. His paper addresses Windows/Linux developers, of course, since no one has heard of our industry. Hyde worries that applications like word processors haven't gotten much faster since the early days of WordStar running on a 4.77 MHz 8088, and attributes at least some of the sluggishness to inefficient, bloated code.

No doubt there's some truth to that, but surely the computational cost of an elaborate GUI supporting vast filesystems bears much of the blame. I wonder, too, if some of the vaunted gain in raw processor speed is lost by cache-busting programs like Word. Sure, a 3-GHz machine screams on small apps running entirely from the fastest RAM. But wander out of cache and the system has to issue hundreds of wait states for each main memory load or store. With no cache, even with 50-nanosecond DRAMs a fast Pentium crawls along at 20 MHz.
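The arithmetic behind those figures can be sketched in a few lines of C. This is a deliberately crude model, and both function names are made up for illustration: it assumes every instruction stalls on exactly one main-memory access and that nothing overlaps, which no real machine quite does.

```c
#include <assert.h>

/* Effective instruction rate in MHz if every instruction waits on one
   main-memory access of dram_ns nanoseconds (no cache, no overlap).
   Hypothetical helper for illustration only. */
double effective_mhz(double dram_ns)
{
    assert(dram_ns > 0.0);
    return 1000.0 / dram_ns;    /* one access per dram_ns ns -> 1000/ns MHz */
}

/* CPU clock cycles that tick by during a single DRAM access. */
double wait_cycles(double cpu_ghz, double dram_ns)
{
    return cpu_ghz * dram_ns;   /* GHz * ns = cycles */
}
```

With a 50 ns DRAM, effective_mhz(50.0) comes out at 20 MHz, and wait_cycles(3.0, 50.0) suggests a 3-GHz core idles for roughly 150 clocks on each uncached access.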

Hyde claims knowledge of assembly gives developers the ability to craft faster high-level code. Of that I'm not so sure. If we patterned C++ code after assembly language we'd always use pointers (a very assembly-like construct) instead of arrays even when an array is the best solution, and eschew automatic variables in favor of globals. And surely the code would be polluted with an excess of GOTOs; conditional GOTOs most likely, since if(variable) goto x; looks just like jnz x.

It's awfully hard to correlate high-level structures with the handful of assembly instructions most compilers use. Creating an object invokes a constructor — which generates what instructions? And what can I do to optimize it? Search a string and the compiler will almost certainly invoke a runtime routine, which is hidden in object form in some inscrutable library. We rely on the compiler to abstract us from these low level details, and expect it to generate an efficient translation.

And yet there is value in knowing what your tools do. In the embedded space we're faced with real-time constraints that are unheard-of in the PC/workstation world. Certain sections of our code are always performance bound.

Consider interrupt service routines. Except for the most demanding applications I'd never advocate writing ISRs in assembly, yet using C has its own set of problems. Does that a= b*c; statement execute in a microsecond — or a week? Do we dare use floating point when the routine must complete in less than 100 microseconds?

A friend once told me of working on a Navy job using the CMS-2 language. The compiler was so awful they learned to write a source function, compile it, and then examine the resulting assembly before doing any testing. If the code looked wrong they'd change something (maybe even spacing) in the source and recompile, hoping the change would trick the tool into generating correct code. I laughed, thinking that's like programming into a black hole of uncertainty. Yet unless we know what our tools do, what sorts of code they're likely to generate, writing real-time code is also coding into a black hole. If the function isn't fast enough, we change something, nearly at random, hoping to get better performance.

And so I go a step further than Mr. Hyde. Don't structure your high level source code like assembly language, and never think in assembly when cranking C/C++ code. But for time-critical sections do examine the generated code. Look for simple optimizations, be wary of calls to runtime routines. Always instrument ISRs and other performance-bound functions to measure their performance.
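One way to instrument an ISR is to bracket its body with reads of a free-running counter and keep the worst case seen. The sketch below is only that, a sketch: read_cycles(), fake_cycle_counter and the simulated_cost argument are all hypothetical stand-ins so the code runs on a host. On real hardware the counter would be a timer or debug register, and the handler would take no argument.

```c
#include <stdint.h>

/* Hypothetical free-running cycle counter. On a target this would be a
   hardware timer register; here it is a plain variable (assumption). */
static volatile uint32_t fake_cycle_counter;

static uint32_t read_cycles(void) { return fake_cycle_counter; }

/* Worst-case ISR duration observed so far, in counter ticks. */
static uint32_t isr_max_cycles;

/* Stand-in for the real work; it advances the fake counter by its "cost". */
static void isr_body(uint32_t cost) { fake_cycle_counter += cost; }

/* The instrumented handler: time the body and remember the maximum. */
void timer_isr(uint32_t simulated_cost)
{
    uint32_t start = read_cycles();
    isr_body(simulated_cost);
    uint32_t elapsed = read_cycles() - start;  /* unsigned math survives wraparound */
    if (elapsed > isr_max_cycles)
        isr_max_cycles = elapsed;
}

/* Read back the worst case, e.g. from a debugger or a diagnostic command. */
uint32_t isr_worst_case(void) { return isr_max_cycles; }
```

Toggling a spare output pin at entry and exit and watching it on a scope gets the same answer with even less code; the counter approach has the advantage of catching the rare long execution you weren't watching for.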

Great firmware programmers do know assembly. They embrace it. For these low level routines are where the C meets the assembly and the hardware. Lights are flashing, motors spinning and analog zings in and out. For me, working on the boundary of the system where the firmware meets the real world is the best part of any project.

Jack G. Ganssle is a lecturer and consultant on embedded development issues. He conducts seminars on embedded systems and helps companies with their embedded challenges. Contact him at . His website is .

Reader Response


Jack, you make an excellent point. In the often cost-constrained commercial world, where an 8-bitter must do the work of a 16-bitter and so on, the use of assembly becomes more critical. I've often seen coding standards with portions based on the relative assembly language efficiency of different constructs, with inefficient constructs limited to emergency use only. I've also seen whole “C” compiler libraries replaced with more efficient assembly language based replacements optimized for size and speed. I've even seen special fixed point math libraries and routines done in assembler. Even today, it is essential for an embedded project to have a few assembly language gurus if speed or code size matter.

– Wally Murray


I agree with Jack on the point that being able to traverse the entire depth of a software development project, from designing (some) GUIs to writing your own device drivers, and then debugging it using a scope or a DVM is the best part of all. For me, this multitude of challenges is what keeps me going, but in the back of my mind I am always wary of the saying “Jack of all trades… master of none.”

Now, on the more important point (i.e. know your tools down to the assembly): I am rarely given the luxury to spend enough time in one environment (CPU/tool set combination) to learn enough of the details to manually surpass the optimization capabilities of the compiler. However, looking at the resulting assembly is always a viable option; at least one can estimate the number of machine cycles a few lines of C/C++ can generate. But nothing should substitute for instrumenting/measuring the actual runtime performance, if at all possible.

As for the completely nondeterministic compiler, my digital firmware colleagues could fill volumes with stories of changing one line in the PLD source code of an unnamed product line, only to find that three unrelated functions stopped working in another part of the same device. It's a nightmare to change anything in this supposedly cutting edge tool environment, so most of the workarounds are relegated back to the software…

Knowing your tools in this case means knowing to avoid them in the next project! 😉

– Csaba


I used to do everything in assembly, mainly because the tools were cheap (free!). I finally convinced the powers-that-be to buy a C compiler for my Motorola (Freescale? who came up with THAT?!) HC05 parts, and I have never looked back! Actually I did do a couple of minor functions in assembler, but now, using Atmel AVR devices & the IAR embedded workbench, assembler is a thing of the past. To be fair, none of my projects are bleeding edge technology, so those who have to eke every last “bit” of performance out of a particular chip may need to custom write assembly language functions, but processors are coming out so quickly, with such capability, that this process seems more an exercise in machismo than a functional necessity.

– Dave Telling


Assembly language marks the boundary between “hardware” and “software” designs. Hardware has performance requirements, both absolute and relative to its competition. Hardware designers strive for high performance. Software does not have performance requirements; it has functionality requirements. Software designers strive for high functionality.

Assembly language is symbolic machine language. Each assembly language instruction takes N clocks at K MHz to execute. If you are designing for performance, you have to know the instruction stream to count the clocks. You can either design hardware performance critical code in assembly language or do an assembly language dump of what the compiler spits out to understand what is going on.

If the code for a function has a lot of performance margin, you can program it in C or whatever, measure the execution time and check it for corner cases. If the code is central to the system hardware performance, you will probably be driven to use assembly. For example, I think that ~50% of DSP code is still in assembly language.

Computers are blindingly fast relative to humans, but not relative to machines. Even a 20 MIPS CPU will execute 16 million instructions in a heartbeat, the clock rate of human interfaces. (Cursors once blinked at 800 ms.) For an automobile engine at 6000 RPM, you have 10 milliseconds to figure out what the 8 cylinders are going to do next, 80 times faster. If your hardware is timing sensitive, you may have microseconds instead of milliseconds to sort it out. And unlike human interfaces, you can't just take a little longer if you need it. Time is the critical resource. Functionality can be increased only as long as performance can be met.
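Those numbers are easy to check. Here is a minimal sketch of the budget arithmetic; the two helper names are invented for illustration, and the model assumes a flat one-instruction-per-tick MIPS rating.

```c
#include <assert.h>
#include <stdint.h>

/* Time for one crankshaft revolution, in microseconds. */
uint32_t rev_period_us(uint32_t rpm)
{
    assert(rpm > 0);
    return 60000000u / rpm;             /* 60 s = 60,000,000 us */
}

/* Instructions a CPU can execute within a time budget. */
uint64_t instruction_budget(uint32_t mips, uint32_t budget_us)
{
    return (uint64_t)mips * budget_us;  /* MIPS * us = instructions */
}
```

At 6000 RPM, rev_period_us() returns 10000, the 10 milliseconds quoted above. A 20 MIPS part gets 16 million instructions in the 800 ms heartbeat but only 200,000 in one revolution, and that budget shrinks again if it has to be shared among cylinders.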

For complementary reasons, software is written in high level languages. In software, you are striving for high function, which usually means high complexity. High level languages are designed to support high complexity designs by using sophisticated structures of abstraction and error checking. The result is to maximize the functionality achieved per hour of programming time while keeping the program below the complexity limit of the programmer. The assumption in software is that function is what you are selling. If you need more performance to make the function acceptable, buy a faster CPU, more memory and more disk.

Assembly language is where the two worlds collide. The two cases are where the hardware performance is adequate but more function is needed, or where the software function is adequate but more performance is needed. The typical solution is to divide the problem into small, high performance, low functionality modules (“critical loops”) written in assembly and high functionality, low performance modules (such as the user interface) written in C, Java, etc. Embedded design usually has a foot in both camps, or a foot on the dock and a foot on the boat, as I usually envision it. You may need to supply high performance and high functionality with an 8051 budget. To do this, you balance the performance and functionality requirements by balancing the mix of high performance assembly code and high functionality C code. To make it work, you had better know both pretty well.

– David Wyland


Both this and the associated articles were very interesting. Basically I'm in agreement with the idea that knowing assembly language programming can be an asset in the high-level language arena. However, thinking about these articles immediately brought up the following conundrum.

As an engineer with over 20 years either programming in or teaching assembly language, I clearly agree with this side of the issue. However, I conjecture that programmers who don't know assembly might immediately deny the need for this knowledge. Basically we're saying that they are somehow deficient, and few of us enjoy being told that, much less being told that correcting said deficiency will require a lot of self-directed hard work.

– Mark Watson


It's been over three years since I last caught a C compiler generating wrong code… I've seen it happen maybe three times in my career. (All different compilers.) But I agree, being able to look at and make sense of the generated code can be essential.

One example: on one embedded system we were using an increment or decrement of a byte (unsigned char) as a sort of minimalist semaphore. Every once in a rare while the mechanism didn't work. Lots of head scratching followed until I looked at the generated assembly code and discovered that a byte increment or decrement was not an atomic instruction! I was accustomed to increment and decrement always being uninterruptible, but when I checked, sure enough, the ColdFire (our processor for this project) doesn't have such instructions for byte variables. Changing the variable to an int (32 bits) allowed the compiler to use the ADDQ instruction, which IS atomic, and the problem was solved.
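Where the toolchain supports C11, one way to stop depending on which instruction the compiler happens to pick is to say "atomic" in the source. The following is a sketch under that assumption, not the fix used on the project described above; free_slots, try_take() and give_back() are invented names, and many embedded compilers would need an interrupt-disable wrapper instead.

```c
#include <stdatomic.h>

/* Count of free slots, shared between main-line code and an ISR
   (hypothetical name; starts with one slot available). */
static atomic_uint free_slots = 1;

/* Try to take a slot; returns 1 on success, 0 if none were free. */
int try_take(void)
{
    unsigned cur = atomic_load(&free_slots);
    while (cur > 0) {
        /* On failure, cur is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(&free_slots, &cur, cur - 1))
            return 1;
    }
    return 0;
}

/* Return a slot; atomic_fetch_add is one indivisible read-modify-write. */
void give_back(void)
{
    atomic_fetch_add(&free_slots, 1);
}
```

Even with this in place it is still worth reading the listing: on a small core the "atomic" operation may compile to a library call or an interrupt lock, and that cost matters inside an ISR.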

(Hint to programmers NOT very experienced in assembly language: looking at generated code in comparison to source code will give you new insights into the assembly language and tricks to get the most from it!)

– Mark Bereit


I think that it's more important to realize what the *compiler* does when you use things such as passing parameters by value, or floating-point emulation, for example. This may be even more significant for higher level languages like C++. But IMHO, you don't need to be an assembler guru to realize such things…

– ph


If writing code in C is walking, and writing code in C++ is running (arguable, at least), coding in assembly is crawling. Yet even in the real world, there are times one must crawl, lest they bang their head on the overhang. In highly time critical situations, where every instruction cycle must be counted, or memory critical situations, assembly is king. Even if you don't write the code in assembly, knowing the opcodes and how the processor works can save time and memory.

For example, consider the need to do something N times. One might write:

int i;
for (i = 0; i < N; i++)
{
/* Do the task. */
}

But on the 8051 processor, it is more economical in both memory and instruction cycles to do it as:

for (i = N; i > 0; i--)
{
/* Do the task. */
}

Given that the task does not need to know what i is during processing, this allows the use of DJNZ (Decrement and Jump if Not Zero), which saves memory and instructions. While other optimizations for the processor can be done by an optimizing compiler, I wouldn't assume (nor would I like) the compiler to rewrite the loop from an incrementing one to a decrementing one.

As another example, let's look at the Microchip PICmicro. In their instruction set, there are two opcodes, BTFSC and BTFSS, which are Bit Test File and Skip on Clear/Set, respectively. One might normally write something such as:

if (flag == 0)
value = 2;
else
value = 4;

Depending on the level of optimization of the compiler, this might end up as:

MOVF flag,F ; Test flag
BTFSS STATUS,Z ; Check the Z bit and skip if set
GOTO j4 ; Not set, so jump to j4
MOVLW d'2' ; Move 2 into W
GOTO cont
j4 MOVLW d'4' ; Move 4 into W
cont MOVWF value ; Move W to value

Of course, it could be worse. But rewriting the C to:

value = 4;
if (flag == 0)
value = 2;

implicitly changes the compiled code to:

MOVLW d'4' ; Move 4 into W
MOVF flag,F ; Test flag
BTFSC STATUS,Z ; Check Z bit, skip if clear
MOVLW d'2' ; Z set: Move 2 into W
MOVWF value ; Move W to value

I think you'd find that 99% of C programmers would write the code as the former example listing, while ones familiar with the PICmicro architecture would write it as the latter example listing. Of course, a perfect optimizer would make it a moot point, but how many “perfect” optimizers are out there, anyway? And how many developers use the optimizer? Many turn them off to prevent the phantom code problem during debugging.

It's always been my belief that knowing assembly on the processor you're using (or at the very least having an understanding of the opcodes and its architecture) can benefit the programmer in memory and cycle utilization. Assembly isn't required, but it is recommended.

– John Patrick


I agree that a good knowledge of the underlying assembly can help you write better code, but not because one is thinking in assembly or using assembly coding patterns in C. Instead, by knowing the instructions that your micro has to work with (and, by extension, the architecture) it lets you steer your choice when there is more than one way to do something “right”. For example, a simple statement like

if( (x == 1) || (y == 1) )

would evaluate x first and then y. But if that line were preceded by

y |= z;

it's possible that y is already present in the accumulator, prompting the micro to load x to check and then reload y. If the order of the two comparisons doesn't matter to your algorithm, swapping them, like so

if( (y == 1) || (x == 1) )

could save you a few cycles. (A simple example, but I've seen cases where rearranging comparisons could save dozens of instructions.)

Doesn't seem like much, I know, but on a cramped 4k micro it can make a big difference to find a break or two like that.

Other examples could be given, but the point is that one need not write C that is structured like assembly or is filled with globals. Knowing what your C code ultimately becomes can help you to mold it a little to take better advantage of the environment you're working in.

– Jason Reene


Take the example:

unsigned int a,b,c,d;

a=300;
b=500;
c=100;

d = (unsigned long)( a * b ) / c;

Depending on your compiler and the level of optimization set, you may get 1500, or you may get 189.

This has happened to me on a processor with a built-in 16×16->32 bit multiply and a 32/16->16 bit divide in hardware. The compiler correctly did the multiply to a 32 bit intermediate value, then zeroed the upper 16 bits before doing the divide. It took several test cases with different casting to get the compiler to give the correct result. If I hadn't known the assembly code, I never would have figured out why the compiler didn't properly use the hardware.
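For what it's worth, on a target where unsigned int is 16 bits, standard C evaluates a * b in 16 bits before a cast applied to the parenthesized result can widen anything, so 189 is a permitted answer there. A sketch of the usual portable workaround follows: cast one operand before the multiply. The function names are invented, and the second function only emulates the 16-bit wrap so the difference shows up on a host with a wider int.

```c
#include <assert.h>
#include <stdint.h>

/* (a * b) / c with the product carried in 32 bits. Casting one operand
   BEFORE the multiply is what forces the wide product. */
uint16_t muldiv_wide(uint16_t a, uint16_t b, uint16_t c)
{
    assert(c != 0);
    uint32_t product = (uint32_t)a * b;
    return (uint16_t)(product / c);
}

/* Emulates a 16-bit-int target evaluating (unsigned long)(a * b) / c:
   the product is cut to 16 bits before the cast can widen it. */
uint16_t muldiv_truncated(uint16_t a, uint16_t b, uint16_t c)
{
    assert(c != 0);
    uint16_t product = (uint16_t)((uint32_t)a * b);  /* keep the low 16 bits only */
    return (uint16_t)(product / c);
}
```

With a = 300, b = 500, c = 100 the first returns 1500 and the second 189 (150000 wraps to 18928). Whether the compiler then maps the wide form onto a 16×16->32 hardware multiply is exactly the sort of thing the listing will tell you.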

– Bruce Casner


I think someone has got it all wrong if assembly is a necessity. Keep in mind that the second you insert the first line of assembly into your project, it's not portable any more. Granted, one must access peripherals and their configuration registers in a non-portable way, but at least all the MCUs I've come across have some sort of zero-overhead binding to hardware through inline assembly or macro definitions. A counterargument could be, of course, that because C/C++ has no standard way of handling signals/interrupts, your code will be implementation dependent anyway.

I personally prefer to keep those non-standard parts of code in separate modules. This results in the use of very primitive drivers. One positive aspect is the ability to use different optimisation flags for low level driver modules, where everything should be as you wrote it, while the higher level code can be left to the mercy of the optimiser. I would say that we can do without the assembly, but not without the drivers…

– Jussi Vnsk


Understanding assembly language IS a benefit, especially in the embedded world (that's where we live people, right?). In the world of real-time, time is the killer, and having an understanding of what your processor is doing is crucial. The only way to really know is to read the assembly output. Granted, in this day of highly optimized compilers, it can be a bit hairy, especially with the RISC architectures (PPC and ARM I've used).

Having cut my teeth on assembly code, I do very little of it these days. I do, however, tend to go to the assembly listing when the problem is not obvious, to see what's really going on.

The new dean of the department where I graduated 20+ years ago (Computer Engineering @ RIT) asked me, last time I visited, if they should keep assembly language as part of the program. I told him yes: even if the students never really do any programs in assembly, they will at least have the necessary insight when the time comes.

– Stephen Beckwith


I do agree that in our business, knowing what's going on behind the scenes of the compiler is important. Our tools aren't as widespread as the Microsoft or Borland compilers, and do contain bugs.

Furthermore, when speed and memory requirements are important, it is very good to have a knowledge of when the compiler will emit 20 rather than 2 instructions.

However, blaming bloated Windows/Unix applications on lack of assembler knowledge is wrong. When processing power and memory are virtually unlimited, a good programmer ought to take advantage of that to save development time, rather than squeeze 90% extra out of a non-time-critical function that will execute in no time anyway.

A good high-level programmer needs even more to focus on the algorithm, and to look out for spots where unnecessary objects are created/destroyed instead. Failing to do so is probably the main cause of bloated, slow and memory-hungry applications.

Aside from that, I have started to suspect that the design of the Windows GUI is another large cause of problems. Not because of bugs or a bloated implementation, but because its design puts special requirements on the applications. Failure to meet those demands, like releasing the CPU during time-consuming events, immediately leads to a computer that really feels sluggish.

– Fredrik L
