Is assembly language best?

Editor's note: In this article Colin Walls questions the belief that assembly language coding is a time consuming process requiring significant skill and patience. Using some specific examples, he provides some tips on how to make the process of writing assembly code less laborious and error prone.

It is generally believed that hand-written assembly language code, though time consuming and requiring significant skill, will always yield the best result. This article questions that belief and illustrates common coding situations where using a high level language, like C, with a modern compiler can consistently do better. It is partly a matter of understanding compiler optimization and code generation and partly a matter of appreciating human nature.

Getting Optimal Code
Everyone knows that the best way to get fast, tight code for an embedded application is to employ the skills of an expert assembly language programmer. Of course, the programmer must be familiar with the specific device’s architecture, instruction set, and assembly language, and they must also be comprehensively briefed in the application. Obviously, a compiler cannot compete, as there is no way that it can be aware of the exact requirements of the application.

This may seem like common sense, but one factor is not considered: human nature. I'll come back to this later.

The Compiler Approach
Modern embedded compilers are extremely good. Typically, a compiler will have an extensive array of optimizations that it can perform and offers the developer fine grain control over code generation. Compilers also look for patterns in the code, which can lead to an optimal result.
To investigate how compilers tackle typical embedded code, I will consider C language switch statements. A switch statement can broadly take one of four forms:

  • A small number of case values
  • A larger number of case values that are a contiguous sequence
  • A larger number of case values that are almost a contiguous sequence
  • A larger number of case values that are not a contiguous sequence

I will look at each of these in turn. The examples are built using the Sourcery CodeBench product from Mentor Embedded for a Coldfire target, with minimal optimization to render readable code. I chose this target because the assembly language is quite readable. I would expect other vendors’ compilers to produce similar results.

Few case values
It is common to use a switch when the logic gets a little too convoluted for if/else constructs. In this situation, there are normally only a small number of case values, like this:

switch (x)
{
case 1:
  a();
case 22:
  b();
case 83:
  c();
}

For most CPU architectures, the efficient way to deal with this logic is with a few discrete compares and jumps, resulting in code like this, where %d0 is used for x:

  moveq #22,%d1
  cmp.l %d0,%d1
  jeq .L4
  moveq #83,%d1
  cmp.l %d0,%d1
  jeq .L5
  moveq #1,%d1
  cmp.l %d0,%d1
  jne .L6
.L3:
  jsr a
.L4:
  jsr b
.L5:
  jsr c
.L6:

A Contiguous Sequence
In other situations, much larger numbers of case values may be used. Most compilers do not set a limit on how many you can have, but there are strong arguments against having too many, as code readability suffers. Sometimes, the case values are contiguous, thus:

switch (x)
{
case 1:
  a();
case 2:
  b();
case 3:
  c();
case 4:
  d();
case 5:
  e();
}

A compiler will spot this pattern and generate code something like this:

  add.l %d0,%d0
  move.w .L8(%pc,%d0.l),%d0
  ext.l %d0
  jmp %pc@(2,%d0:l)
  .balignw 2,0x284c
  .swbeg &6
  .L8:
  .word .L9-.L8
  .word .L3-.L8
  .word .L4-.L8
  .word .L5-.L8
  .word .L6-.L8
  .word .L7-.L8
.L3:
  jsr a
.L4:
  jsr b
.L5:
  jsr c
.L6:
  jsr d
.L7:
  jsr e
.L9:

The code uses the value of x to index into a jump table. This is much faster than performing a long sequence of compares. Clearly the number of case values at which this approach becomes most viable will vary from one instruction set to another.

Almost a Contiguous Sequence
Another possibility is when the case value sequence is very close to being contiguous, like this:

switch (x)
{
case 1:
  a();
case 2:
  b();
case 3:
  c();
case 4:
  d();
case 6:
  e();
}

Once again, the compiler recognizes the sequence and creates a jump table:

  add.l %d0,%d0
  move.w .L8(%pc,%d0.l),%d0
  ext.l %d0
  jmp %pc@(2,%d0:l)
  .balignw 2,0x284c
  .swbeg &7
.L8:
  .word .L9-.L8
  .word .L3-.L8
  .word .L4-.L8
  .word .L5-.L8
  .word .L6-.L8
  .word .L9-.L8
  .word .L7-.L8
.L3:
  jsr a
.L4:
  jsr b
.L5:
  jsr c
.L6:
  jsr d
.L7:
  jsr e
.L9:

This time the table includes a dummy entry to accommodate the “missing” case value.

More Random case Values
The last possibility is when there are a fair number of case values with no real pattern, like this:

switch (x)
{
case 11:
  a();
case 23:
  b();
case 113:
  c();
case 40:
  d();
case 5:
  e();
}

In my tests, the compiler reverts to a sequence of compares, thus:

  moveq #23,%d1
  cmp.l %d0,%d1
  jeq .L5
  moveq #23,%d1
  cmp.l %d0,%d1
  jlt .L8
  moveq #5,%d1
  cmp.l %d0,%d1
  jeq .L3
  moveq #11,%d1
  cmp.l %d0,%d1
  jeq .L4
  jra .L9
.L8:
  moveq #40,%d1
  cmp.l %d0,%d1
  jeq .L6
  moveq #113,%d1
  cmp.l %d0,%d1
  jeq .L7
  jra .L9
.L4:
  jsr a
.L5:
  jsr b
.L7:
  jsr c
.L6:
  jsr d
.L3:
  jsr e
.L9:

It is interesting to note that the compiler realizes that with a larger number of values it can reduce the number of compares by splitting them and doing the “low” and “high” values separately.

Although it did not occur in my tests (which were not rigorous, but I also did them for ARM and X86), I am sure that, for some architectures, another code generation strategy might be better for this case: a look up table of values and addresses.

The Human Approach
So, how would the human programmer tackle coding of these four kinds of structure? As I already said that we have an expert on the team, it would be natural to expect that their approach would be at least equal to the compiler – probably better.

However, this is unlikely. A well trained programmer would know that it is good practice to write maintainable code, even if this is at the expense of some performance/efficiency. So, taking advantage of a contiguous (or nearly contiguous) sequence of case values to index a jump table would be bad practice, as a future change to the software might not be accommodated by that structure. No human developer would want to contemplate the necessity to rewrite a chunk of code if the specification changes later in the project, so they will code defensively.

A compiler does not mind rewriting the code every time, so it will most likely always produce an optimal result. The use of a switch statement in C code is very maintainable, however, so it makes sense to have human developers work in a high level language whenever possible.

Postscript: When writing this article, I was reminded of some of my early embedded software experience, which was programming in Forth. At that time, it was claimed that such threaded interpretive languages could yield a code size, for a given functionality, smaller than assembly language. I was skeptical, but we did some tests and could show that it was, indeed, true for a reasonably large application. Runtime performance, or, rather, lack thereof, was the major cost for such compact code. Although with modern devices, we might find the execution speed acceptable, the “read only” nature of Forth code and, hence, the challenges in its maintenance, make me pessimistic about its revival in the near future.

Colin Walls has over thirty years experience in the electronics industry, largely dedicated to embedded software. A frequent presenter at conferences and seminars and author of numerous technical articles and two books on embedded software, Colin is an embedded software technologist for the Mentor Graphics Embedded Software Division, and is based in the UK. His regular blog is located at blogs.mentor.com/colinwalls . He may be reached by email at

11 thoughts on “Is assembly language best?

  1. I am not sure about the premise of the article – who are these people that think assembler is always better!?

    Even if the use of assembler did result in tighter code (an argument that I suggest is seldom true in practice), it has so may other arguments ag

    Log in to Reply
  2. “One swallow doth not a summer make…” (Aristotle, I believe).

    Likewise, one example of something which compilers do better than humans doesn't make the case, in my view, for writing off assembler. It is certainly true that compilers, these days, are cap

    Log in to Reply
  3. Chris. I agree with you entirely. I think that embedded programmers should, at a bare minimum be able to understand assembly, if only to make sure that the compiler is doing a good job. Resorting to writing assembly is also an option that I would not wa

    Log in to Reply
  4. Not exactly apples to apples in my opinion. You present the compiler creating the most efficient code it can, then present a 'defensive' assembler programmer. Not the same thing.

    What about the case of the programmer attempting to create the most efficien

    Log in to Reply
  5. I am struggling to see the opportunity for “a reduced length development cycle” by using assembler? A number of modern compilers support vectorisation optimisations that take advantage of SIMD instructions. I'll concede the last time I *needed* to write

    Log in to Reply
  6. Yes, I really don't know what point the author is trying to make. The main argument against assembler is that it's non-portable. Processing power and memory is so ridiculously cheap today that there is no longer any argument for using assembly for perfor

    Log in to Reply
  7. Assembly language is the best for certain operations. For example there are many instructions in most CPU architectures that do not have any C mappings (eg. the rotate instructions in x86). Assembly lets you exploit instructions that C can't.

    Make sure t

    Log in to Reply
  8. I'll throw my (first ever!) comment in here and join the fray. I grew up on 6502, z80 and forth so I'm perhaps somewhat biased. That said, I've worked in the games industry now for almost 25 years and while C++ is the language of choice, assembly language

    Log in to Reply
  9. I too 'grew' up writing for the 6502, and the main reason we used assembler is that C was not a language in wide use yet, and when your cpu is only running at 1MHz,( yes, there really was a time when things ran that slow) every instruction counts. And even

    Log in to Reply
  10. In the early days of modems – before DSPs were reasonably priced,
    we built modems using general purpose processors like 8051, 8088 and 8086
    to build V22 (1200 bps) and V22bis (2400 bps) modems at Concord Data Systems (now defunct). With hard real time li

    Log in to Reply

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.