Is assembly language best?

Colin Walls, Mentor Embedded

December 14, 2012

Colin Walls, Mentor EmbeddedDecember 14, 2012

Editor's note: In this article Colin Walls questions the belief that assembly language coding is a time consuming process requiring significant skill and patience. Using some specific examples, he provides some tips on how to make the process of writing assembly code less laborious and error prone.

It is generally believed that hand-written assembly language code, though time consuming and requiring significant skill, will always yield the best result. This article questions that belief and illustrates common coding situations where using a high level language, like C, with a modern compiler can consistently do better. It is partly a matter of understanding compiler optimization and code generation and partly a matter of appreciating human nature.

Getting Optimal Code
Everyone knows that the best way to get fast, tight code for an embedded application is to employ the skills of an expert assembly language programmer. Of course, the programmer must be familiar with the specific device’s architecture, instruction set, and assembly language, and they must also be comprehensively briefed in the application. Obviously, a compiler cannot compete, as there is no way that it can be aware of the exact requirements of the application.

This may seem like common sense, but one factor is not considered: human nature. I'll come back to this later.

The Compiler Approach
Modern embedded compilers are extremely good. Typically, a compiler will have an extensive array of optimizations that it can perform and offers the developer fine grain control over code generation. Compilers also look for patterns in the code, which can lead to an optimal result.
To investigate how compilers tackle typical embedded code, I will consider C language switch statements. A switch statement can broadly take one of four forms:

  • A small number of case values
  • A larger number of case values that are a contiguous sequence
  • A larger number of case values that are almost a contiguous sequence
  • A larger number of case values that are not a contiguous sequence

I will look at each of these in turn. The examples are built using the Sourcery CodeBench product from Mentor Embedded for a Coldfire target, with minimal optimization to render readable code. I chose this target because the assembly language is quite readable. I would expect other vendors’ compilers to produce similar results.

Few case values
It is common to use a switch when the logic gets a little too convoluted for if/else constructs. In this situation, there are normally only a small number of case values, like this:

switch (x)
{
case 1:
  a();
case 22:
  b();
case 83:
  c();
}

For most CPU architectures, the efficient way to deal with this logic is with a few discrete compares and jumps, resulting in code like this, where %d0 is used for x:

  moveq #22,%d1
  cmp.l %d0,%d1
  jeq .L4
  moveq #83,%d1
  cmp.l %d0,%d1
  jeq .L5
  moveq #1,%d1
  cmp.l %d0,%d1
  jne .L6
.L3:
  jsr a
.L4:
  jsr b
.L5:
  jsr c
.L6:


A Contiguous Sequence
In other situations, much larger numbers of case values may be used. Most compilers do not set a limit on how many you can have, but there are strong arguments against having too many, as code readability suffers. Sometimes, the case values are contiguous, thus:

switch (x)
{
case 1:
  a();
case 2:
  b();
case 3:
  c();
case 4:
  d();
case 5:
  e();
}


A compiler will spot this pattern and generate code something like this:

  add.l %d0,%d0
  move.w .L8(%pc,%d0.l),%d0
  ext.l %d0
  jmp %pc@(2,%d0:l)
  .balignw 2,0x284c
  .swbeg &6
  .L8:
  .word .L9-.L8
  .word .L3-.L8
  .word .L4-.L8
  .word .L5-.L8
  .word .L6-.L8
  .word .L7-.L8
.L3:
  jsr a
.L4:
  jsr b
.L5:
  jsr c
.L6:
  jsr d
.L7:
  jsr e
.L9:


The code uses the value of x to index into a jump table. This is much faster than performing a long sequence of compares. Clearly the number of case values at which this approach becomes most viable will vary from one instruction set to another.

Almost a Contiguous Sequence
Another possibility is when the case value sequence is very close to being contiguous, like this:

switch (x)
{
case 1:
  a();
case 2:
  b();
case 3:
  c();
case 4:
  d();
case 6:
  e();
}

Once again, the compiler recognizes the sequence and creates a jump table:

  add.l %d0,%d0
  move.w .L8(%pc,%d0.l),%d0
  ext.l %d0
  jmp %pc@(2,%d0:l)
  .balignw 2,0x284c
  .swbeg &7
.L8:
  .word .L9-.L8
  .word .L3-.L8
  .word .L4-.L8
  .word .L5-.L8
  .word .L6-.L8
  .word .L9-.L8
  .word .L7-.L8
.L3:
  jsr a
.L4:
  jsr b
.L5:
  jsr c
.L6:
  jsr d
.L7:
  jsr e
.L9:


This time the table includes a dummy entry to accommodate the “missing” case value.

More Random case Values
The last possibility is when there are a fair number of case values with no real pattern, like this:

switch (x)
{
case 11:
  a();
case 23:
  b();
case 113:
  c();
case 40:
  d();
case 5:
  e();
}


In my tests, the compiler reverts to a sequence of compares, thus:

  moveq #23,%d1
  cmp.l %d0,%d1
  jeq .L5
  moveq #23,%d1
  cmp.l %d0,%d1
  jlt .L8
  moveq #5,%d1
  cmp.l %d0,%d1
  jeq .L3
  moveq #11,%d1
  cmp.l %d0,%d1
  jeq .L4
  jra .L9
.L8:
  moveq #40,%d1
  cmp.l %d0,%d1
  jeq .L6
  moveq #113,%d1
  cmp.l %d0,%d1
  jeq .L7
  jra .L9
.L4:
  jsr a
.L5:
  jsr b
.L7:
  jsr c
.L6:
  jsr d
.L3:
  jsr e
.L9:


It is interesting to note that the compiler realizes that with a larger number of values it can reduce the number of compares by splitting them and doing the “low” and “high” values separately.

Although it did not occur in my tests (which were not rigorous, but I also did them for ARM and X86), I am sure that, for some architectures, another code generation strategy might be better for this case: a look up table of values and addresses.

The Human Approach
So, how would the human programmer tackle coding of these four kinds of structure? As I already said that we have an expert on the team, it would be natural to expect that their approach would be at least equal to the compiler – probably better.

However, this is unlikely. A well trained programmer would know that it is good practice to write maintainable code, even if this is at the expense of some performance/efficiency. So, taking advantage of a contiguous (or nearly contiguous) sequence of case values to index a jump table would be bad practice, as a future change to the software might not be accommodated by that structure. No human developer would want to contemplate the necessity to rewrite a chunk of code if the specification changes later in the project, so they will code defensively.

A compiler does not mind rewriting the code every time, so it will most likely always produce an optimal result. The use of a switch statement in C code is very maintainable, however, so it makes sense to have human developers work in a high level language whenever possible.

Postscript: When writing this article, I was reminded of some of my early embedded software experience, which was programming in Forth. At that time, it was claimed that such threaded interpretive languages could yield a code size, for a given functionality, smaller than assembly language. I was skeptical, but we did some tests and could show that it was, indeed, true for a reasonably large application. Runtime performance, or, rather, lack thereof, was the major cost for such compact code. Although with modern devices, we might find the execution speed acceptable, the “read only” nature of Forth code and, hence, the challenges in its maintenance, make me pessimistic about its revival in the near future.

Colin Walls has over thirty years experience in the electronics industry, largely dedicated to embedded software. A frequent presenter at conferences and seminars and author of numerous technical articles and two books on embedded software, Colin is an embedded software technologist for the Mentor Graphics Embedded Software Division, and is based in the UK. His regular blog is located at blogs.mentor.com/colinwalls. He may be reached by email at colin_walls@mentor.com



Loading comments...

Most Commented

Parts Search Datasheets.com

KNOWLEDGE CENTER