Reliable programming in ARM assembly language

Sometimes it's necessary to use both assembly and high-level programming languages when working in the ARM architecture. This paper from ARM TechCon explains why and how.

This article is from class ATC-150 at the ARM Technology Conference.

The ARM architecture, like most 32-bit architectures, is well-suited to using a C or C++ compiler. The majority of control code is written in high-level programming languages like C and C++ instead of assembly language. There are good reasons for this. High-level programming languages are inherently safer and less error-prone than assembly. Code written in high-level programming languages can also be written to be portable across different architectures.

Some people use assembly language for writing device drivers, but this is usually unnecessary. Most device driver code can be written by mapping a C structure or a C++ class onto the hardware device. However, it is sometimes necessary to use a little bit of assembly code. This paper will describe how to best do this.
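
As a rough illustration of mapping a C structure onto a device, the sketch below describes an imaginary UART. The register names, their order, the status flag, and the function name are all assumptions made for this example; a real device's data sheet defines the actual layout.

```c
#include <stdint.h>

/* Hypothetical register block for an imaginary UART. */
struct uart_regs {
    volatile uint32_t data;    /* write: byte to transmit */
    volatile uint32_t status;  /* read: status flags */
    volatile uint32_t control; /* read/write: configuration */
};

/* Assumed "transmitter ready" flag in the status register. */
#define UART_STATUS_TX_READY 0x1u

/* Wait until the transmitter is ready, then send one byte. */
void uart_put_byte(struct uart_regs *uart, uint8_t byte)
{
    while ((uart->status & UART_STATUS_TX_READY) == 0) {
        /* Busy-wait; volatile forces a fresh read on each pass. */
    }
    uart->data = byte;
}
```

On real hardware the pointer would be the device's base address cast to `struct uart_regs *`. In a host-side test, an ordinary structure in memory can stand in for the device.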

What is assembly
Assembly, or assembly code, is a term used loosely to refer to the instruction set that runs on the target processor. In reality, processors read a sequence of binary words that encode the instructions. Assembly is a step up from binary in that the instructions can be expressed in a human-readable form.

For example, consider a simple C function:

int add2(int a, int b)
{
    return a + b;
}

When compiled, this code will likely turn into something like the following assembly code:

.global add2
add2:
    add r0, r0, r1
    bx lr
.type add2, function
.size add2, 8

When assembled, this assembly turns into the following two binary words (shown here in hexadecimal):

0xe0800001    (add r0, r0, r1)
0xe12fff1e    (bx lr)

An assembler is the tool that converts assembly code into binary. However, the assembler doesn't output binary directly. Instead, it encapsulates the binary into an ELF object file that is usable by a linker.

Unfortunately, on the ARM architecture there is no standard format for assembly language. The ARM tools use a unique syntax that, although expressive, does not resemble the format used by most assemblers. Most assemblers use a format similar to the UNIX and GNU assemblers. The remainder of this paper will use examples in the UNIX style of assembly for ARM.

Types of assembly
There are three primary ways to write assembly: intrinsic functions, inline assembly, and assembly files.

Intrinsic functions are functions that have a special meaning to the implementation. Many intrinsic functions are used to provide assembly functionality to the user.

Consider an operation that is not efficiently expressible in C. We will use the ARM CLZ instruction as an example. This instruction returns the number of consecutive zero bits starting from the most significant bit of the source.

To code this operation in C, it might look something like this:

static int count_leading_zeros(uint32_t src)
{
    for (int i = 31; i >= 0; i--) {
        if ((src & ((uint32_t)1 << i)) != 0) {
            return 31-i;
        }
    }
    return 32;
}

Or it could be coded like this:

static int count_leading_zeros(uint32_t src)
{
    uint32_t bit = 0x80000000;
    int i = 0;
    while (bit != 0) {
        if ((src & bit) != 0) {
            break;
        }
        bit >>= 1;
        i++;
    }
    return i;
}

This could be coded any number of different ways, and it's not clear which way is the most natural; it is probably a matter of personal preference. It is cumbersome for a compiler to recognize even one of these forms, and nearly impossible for it to recognize every way that a human might express the operation. As a result, the compiler cannot be expected to substitute the CLZ instruction for functionally equivalent loops unless a standard way of writing the loop is established.

Rather than documenting a standard way to write this loop, an elegant solution is to document an intrinsic function that the compiler will recognize and convert to the CLZ instruction. For example, a compiler might recognize:

int __CLZ32(uint32_t src);  

as an intrinsic function corresponding to the ARM CLZ instruction. So, whenever the user calls __CLZ32(), rather than actually making a function call to some routine, the compiler will inline a CLZ instruction in place of the call.

An intrinsic function is known to the compiler, and the compiler can understand the form of the resulting assembly code. Semantically, this behaves as one of the above functions would, but the implementation is streamlined. An intrinsic function provides a natural and efficient interface to assembly instructions.
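
The name __CLZ32 is only an example. As one concrete case, GCC and Clang provide the builtin __builtin_clz, which on ARM targets that have CLZ typically compiles to that instruction. The sketch below wraps the builtin and, for comparison, repeats a portable loop. The function names are made up for this illustration, and the code assumes that unsigned int is 32 bits wide.

```c
#include <stdint.h>

/* Portable reference loop: scan from bit 31 down to bit 0. */
int clz_loop(uint32_t src)
{
    for (int i = 31; i >= 0; i--) {
        if ((src & ((uint32_t)1 << i)) != 0) {
            return 31 - i;
        }
    }
    return 32;
}

/* The same operation through the GCC/Clang builtin.
 * Assumes unsigned int is 32 bits wide. */
int clz_builtin(uint32_t src)
{
    /* __builtin_clz is undefined for 0, so handle that case here. */
    if (src == 0) {
        return 32;
    }
    return __builtin_clz(src);
}
```

Both functions return the same value for every input, so the loop can serve as a fallback on compilers that lack the builtin.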

The second approach is inline assembly. One might write something like:

static int count_leading_zeros(uint32_t src)
{
    int ret;
    asm("CLZ ret, src");
    return ret;
}

To properly handle this, the compiler must understand a few things. For example, it must understand that this represents an instruction that has two operands. The first operand is a destination that is written by the instruction; it must be allocated to a register, and it should correspond to the “ret” variable. The second operand is a source that is read by the instruction; it must be allocated to a register, and it should correspond to the “src” parameter. The instruction does not read or write any objects other than its operands. Such built-in knowledge cannot be complete. There must be limits to the types of assembly that the compiler can safely recognize, but these limitations are hard to describe. May the user define a label in one instance of inline assembly and branch to it from another? Which temporary registers may the user modify? Must all input operands be in unique registers even if the values of these operands are the same?

Another form of inline assembly is the GNU assembly statement. Using GNU asm, the CLZ instruction might be provided as:

static int count_leading_zeros(uint32_t src)
{
    int ret;
    asm("clz %0, %1" : "=r" (ret) : "r" (src));
    return ret;
}

These statements have their own problems. These assembly statements are basically exposing the inner workings of the GNU compiler. While a valiant attempt is made to document their semantics, the documentation is not always clear. While GNU assembly statements may be an acceptable mechanism for somebody familiar with the inner workings of the GNU compiler to provide assembly capabilities, they are very dangerous in the hands of a novice.

Other attempts at providing inline assembly functionality have their own problems. AT&T-style asm procedures are inefficient, difficult to write due to the exponential explosion of cases as the number of parameters grows, and they interfere with optimizations since all parameters can be both read and written. Simple non-AT&T asm statements that do not do variable substitution are impossible to safely interface with local variables.

Some vendors have deprecated inline assembly in favor of intrinsic functions. I believe this is the right approach when it comes to providing assembly capabilities in high-level programming languages.

The final approach is to write assembly as functions written in assembly files. Within the C/C++ code, this looks like a declaration of an external function:

extern int count_leading_zeros(uint32_t src);   

The definition of the function is then written in a separate assembly file. This looks like:

.global count_leading_zeros
count_leading_zeros:
    clz r0, r0
    bx lr
.type count_leading_zeros, function
.size count_leading_zeros, .-count_leading_zeros

To the C/C++ code, a call to count_leading_zeros() looks just like any other call to a function. Therefore, count_leading_zeros() must behave like a function.

The remainder of this paper will focus on how to safely write code in assembly files. It should be noted, however, that intrinsic functions are preferable when an intrinsic function corresponding to the desired functionality is made available to the user.

ARM calling convention
Since assembly functions must behave as functions do, the user needs to have a basic knowledge of the ARM calling convention. The calling convention is well documented in the ARM ABI. However, I shall summarize the rules here.

  1. The first four integer, float, and pointer arguments are passed in r0, r1, r2, and r3. Additional arguments are passed at consecutive stack locations [sp,0], [sp,4], [sp,8], etc.
  2. Shorts and chars are passed appropriately zero- or sign-extended.
  3. Doubles and long longs are passed in aligned locations (r0-r1, r2-r3, or 8-byte aligned stack locations). Doubles are normally passed this way even on systems that support floating point.

For the return values, 32-bit scalar return values are returned in r0. Scalar values less than 32 bits in size are returned properly sign- or zero-extended in r0. 64-bit scalar return values are returned in r0 and r1.

Here are some examples:

extern int func1(int a, double b);
// a is in r0. b is in r2-r3. The return value is in r0.

extern int func2(int size, int *ptr);
// size is in r0. ptr is in r1. The return value is in r0.

extern unsigned short func3(void);
// There are no parameters. The return value is in the low 16
// bits of r0. Since the return value is unsigned, the high
// bits are all 0.

extern signed char get_next_char(int device);
// Device is in r0. The return value is in the low 8 bits of
// r0. The high 24 bits of r0 are a sign extension of the
// high bit of the return value since the return value is
// signed.
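
The extension rules for narrow return values can be modelled in plain C. The sketch below is not part of the calling convention itself; the function names are invented for this illustration, and each one simply computes the 32-bit pattern a caller would expect to find in r0.

```c
#include <stdint.h>

/* Pattern expected in r0 for a `signed char` return value:
 * the 8-bit value sign-extended to 32 bits. */
uint32_t r0_for_signed_char(signed char value)
{
    return (uint32_t)(int32_t)value;
}

/* Pattern expected in r0 for an `unsigned short` return value:
 * the 16-bit value zero-extended to 32 bits. */
uint32_t r0_for_unsigned_short(unsigned short value)
{
    return (uint32_t)value;
}
```

An assembly function that returns a narrow type must produce these patterns itself, since the caller is entitled to use all 32 bits of r0 without extending the value again.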

You must also respect the register save and stack convention of ordinary functions.

R0, R1, R2, R3, R12, and R14 are caller-saved (“temporary”) registers. A function may modify them freely, and the caller has no expectation of their value upon return other than what was described above for the return value.

R4, R5, R6, R7, R8, R9, R10, and R11 are callee-saved (“permanent”) registers. A function must ensure that they hold the same value upon exit as they did upon entry. In practice, these registers are either not used by a function, or if they are used they are saved on the stack in the prologue and restored from the stack in the epilogue.

R13 is the stack pointer, and it must be kept 8-byte aligned when a new frame is created.

R15 (PC) is the program counter, and it is not usable for general purposes.

Since most functions return to their caller, a callee must keep around the value of the link register (R14 or LR) so that it can return to its caller. In this sense, the callee treats R14 as a permanent register, although it is not necessary to restore its value back into R14. Often, the link register's value is popped back into the PC rather than the LR. On ARMv5 and later architectures, this acts the same as restoring the value to the LR and then executing BX LR would have.

Here's an example of an assembly function that saves and restores registers.

.global func
@ int func(int a, int b, int c, int d, int e)
func:
    stmfd sp!, {r4, r5, lr}   @ Saves 3 registers, taking
                              @ 12 bytes from the stack
    sub   sp, sp, 4           @ Maintain 8 byte stack alignment
    mov   r4, r0              @ Save "a" in r4
    add   r5, r2, r3          @ r5 = c + d
    str   r1, [sp, 0]         @ Save "b" on the stack
    bl    some_other_func     @ Make a function call
    ldr   r0, [sp, 0]         @ Load "b" into r0
    eor   r0, r0, r4          @ r0 = a ^ b
    and   r0, r0, r5          @ r0 = (a ^ b) & (c + d)
    ldr   r1, [sp, 16]        @ Load "e" into r1
                              @ "e" was at [sp, 0] upon entry
                              @ to the function; after the
                              @ adjustments it is at [sp, 16]
    mul   r0, r0, r1          @ r0 = ((a ^ b) & (c + d)) * e
    add   sp, sp, 4           @ Prepare to pop registers
    ldmfd sp!, {r4, r5, pc}   @ Restore r4 and r5 and return
                              @ to caller with the return
                              @ value in r0
.type func, function
.size func, .-func

The .type and .size directives set the size and type fields for the symbol in the ELF file corresponding to “func”. Smart linkers that are capable of optimizations may need this information to properly optimize the code, and it helps debuggers understand the code, too.
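
For reference, the arithmetic performed by the assembly function above can be written in C. This is only a sketch of the computed result: called_routine is a hypothetical stand-in for the routine that the listing calls with bl, whose real behavior is not given in the text.

```c
/* Hypothetical stand-in for the routine the listing calls with `bl`. */
void called_routine(void)
{
}

/* C sketch of the value that the assembly `func` returns. */
int func(int a, int b, int c, int d, int e)
{
    /* The call may overwrite the temporary registers, which is why
     * the assembly keeps a, b, and c + d in r4, r5, and the stack. */
    called_routine();
    return ((a ^ b) & (c + d)) * e;
}
```

Comparing the two versions shows what the assembly has to manage by hand: every value that must survive the call lives either in a permanent register or on the stack.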

Preprocessed assembly
Preprocessed assembly code is assembly code that is run through the C preprocessor before being assembled. This allows the conditional inclusion and macro expansion capabilities of the C preprocessor to be exploited in the assembly file.

To instruct the GNU toolchain to preprocess an assembly file, give the file an .sx extension, and use the driver to invoke the assembler. Defines may be passed along as they would to a C/C++ file. For example:

gcc-arm-eabi -DCHECKED -DSIZE=400 -c

With the Green Hills toolchain, the assembly file may either be given an extension of .arm, or the -preprocess_assembly_files option may be used to preprocess an assembly file whose extension is .s. For example:

ccarm -DCHECKED -DSIZE=400 asmfile.arm -c

ccarm -DCHECKED -DSIZE=400 -preprocess_assembly_files asmfile.s -c

Assertions are a key ingredient for reliability. They can be used to verify assumptions in your code.

Static assertions are assertions that must be evaluated at compile time by the compiler or assembler. Static assertions may be implemented in C or C++ as:

#define STATIC_ASSERT(x) extern char __TEMP_ARRAY[((x) != 0) ? 1 : -1]

When this macro is invoked with an argument that is a nonzero compile-time constant, the macro expands to a declaration of an array with size 1. The compiler accepts this without complaint. When this macro is invoked with an argument that is a compile-time zero, the macro expands to a declaration of an array with size -1. The compiler rejects this. Similarly, when the macro is invoked with an argument that is not a compile-time constant, the array is declared with a non-constant size and the compiler rejects this. The net result is that invoking the STATIC_ASSERT macro will result in an error unless it is invoked with a nonzero (true) constant value.
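
A short usage sketch may make the behavior concrete. It uses the macro from the text, written on one line; the structure and the function name are assumptions introduced only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* The macro from the text, written on one line. */
#define STATIC_ASSERT(x) extern char __TEMP_ARRAY[((x) != 0) ? 1 : -1]

/* Hypothetical structure used only for this illustration. */
struct sample {
    uint32_t id;
    uint32_t length;
};

/* True constants: each expands to `extern char __TEMP_ARRAY[1];`. */
STATIC_ASSERT(sizeof(uint32_t) == 4);
STATIC_ASSERT(offsetof(struct sample, length) == 4);

/* A false constant would expand to an array of size -1 and be
 * rejected at compile time:
 * STATIC_ASSERT(sizeof(uint32_t) == 8);
 */

/* Returns the offset that the assertion above has already checked. */
size_t sample_length_offset(void)
{
    return offsetof(struct sample, length);
}
```

Because the array is only declared and never defined or used, the assertions add nothing to the generated code.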

A static assertion may be effected in assembly as:

.macro static_assert value
.if value == 0
Assertion_failed @ Will not assemble
.endif
.endm

This may be used to test the value of parameters and assembler variables. For example:

static_assert VAL < 15  

One key use for a static assertion is to synchronize a structure layout between C and assembly code. For example, if you needed to access a structure from assembly and load the value of a particular field, you might do something like this:

The shared header, layout.h:

#ifndef __LAYOUT_H
#define __LAYOUT_H
#define PAYLOAD_OFF 12
#define CHECKSUM_OFF 28
#endif /* __LAYOUT_H */

The C source:

#include <stddef.h>
#include <stdint.h>
#include "layout.h"
struct packet {
    struct header hdr;
    char payload[16];
    uint32_t checksum;
};
STATIC_ASSERT(offsetof(struct packet, payload) == PAYLOAD_OFF);
STATIC_ASSERT(offsetof(struct packet, checksum) == CHECKSUM_OFF);

The assembly source:

#include "layout.h"
@ void do_checksum(struct packet *p)
@ Compute the checksum for the given packet
.global do_checksum
do_checksum:
    @ Compute a hash of the first 28 bytes of the packet using
    @ the ARM's powerful DSP instructions. Put this value
    @ into r3.
    @ This is not shown for brevity.
    str r3, [r0, CHECKSUM_OFF] @ put checksum in p->checksum
    bx lr
.type do_checksum, function
.size do_checksum, .-do_checksum

Run-time assertions may also be implemented in assembly. For example, let's say we are implementing a convolution of an input signal and an impulse response. For efficiency reasons, we want the two data arrays to both be 16-byte aligned. The caller should guarantee that this requirement is met before calling the function, but mistakes can be made. We can ensure that this requirement is upheld by quickly checking it at the beginning of the convolution routine.

@ convolve(uint32_t *signal, uint32_t *fir, size_t slen,
@          size_t fir_len)
.text
.global convolve
convolve:
#ifdef CHECKED
    orr  r12, r0, r1     @ r0-r3 are arguments, so use r12 as a
                         @ scratch register
    ands r12, r12, 15
    bne  .               @ hang if the pointers are not both
                         @ 16-byte aligned
#endif


The #ifdef allows the assertion to be performed optionally. A user desiring safety can assemble with -DCHECKED, while a user who wishes to avoid the slight overhead of the check can assemble without it.
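
The alignment test in the listing relies on a small trick: ORing the two addresses together lets a single mask operation check both. A C rendering of the same test, with a function name invented for this sketch, might look like this:

```c
#include <stdint.h>

/* Returns nonzero when both pointers are 16-byte aligned.
 * ORing the addresses first means any low bit set in either
 * pointer survives into the masked result, mirroring the
 * orr/ands pair in the assembly. */
int both_aligned_16(const void *a, const void *b)
{
    uintptr_t bits = (uintptr_t)a | (uintptr_t)b;
    return (bits & 15u) == 0;
}
```

The function only inspects the address values and never dereferences the pointers, so it is safe to call before any other validation has been done.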

Proper technique
When assembly language is necessary, it is best to use either intrinsic functions or functions in assembly files. Proper technique can minimize the risk of writing in assembly.

Greg Davis is the director of engineering, compiler development at Green Hills Software.
