Decompiling the ARM architecture code

At UBM TechInsights, we're often tasked with proving patent infringement of a software algorithm as part of our IP Management Services. An embedded algorithm can range from a sensoring technique in an appliance, to motor control, to power management scheme, to navigation algorithm, to UI control or file system on a higher end embedded device; to name a few examples. Investigating a possible patent infringement is one of the few cases where reverse engineering software is legal in spite of any license agreement to the contrary.

An issue for projects of this nature is that most modern machine code is produced from C or C++, and the process of generating machine code by an optimizing compiler is very sophisticated. Therefore, looking at low-level (machine or assembly language) instructions is a cumbersome and error-prone way of ascertaining infringement.

Decompilation is the process of taking machine language instructions and translating them into a higher-level language representation. Decompilation is more typically used for analysis of computer viruses and malware, and, sometimes to recover lost source code or make a compatible product. One popular example of a decompiler is from Hex-Rays, who sells a very good decompiler for the i386 platform as a plug-in for its IDAPro dissassembler.

Our example for this article is based on one of the most popular assembly languages for high-volume high-value consumer electronics and many other embedded devices-the ARM architecture. We found that available decompilers for ARM produce poor quality code, so we adapted and expanded the open source “Desquirr” decompiler for our needs.

Example

First, let's consider a simple C program:

main (int argc, char **argv)

{

inti;

intsum;

sum = 0;

for (i = 0; i < 1000; i++) {

sum += i;

}

printf (“The sum of 0..999 is %dn”, sum);

}

The program demonstrates assignments, mathematical operations, variables, a comparison operation and conditional flow.

Compilation and output using stock Desquirr

Compiling this program with gcc from arm32 target (with maximum optimization, -O3) yields the following:

main:

MOV R1, #0

MOV R0, #0x3E4

MOV R3, R1

ADD R2, R0, #3

loc_8478:

ADD R1, R1, R3

ADD R3, R3, #1

CMP R3, R2

BLE loc_8478

LDR R0, =aThe SumOf0__999

B .printf

You can see that the for() loop has been implemented as a simple comparison against the terminating condition and a conditional branch, and the variables i and sum have been moved into registers.

Desquirr yields the following:

main:

R1 = 0;

R0 = 0x3e4;

R3 = R1;

R2 = R0 + 3;

loc_8478:

R1 = R1 + R3;

R3 = R3 + 1;

Cond = R3 – R2;

if (Cond <= 0) goto loc_8478;

R0 = “The sum of 0..999 is %dn”;

goto .printf;

While this output is sufficient for a simple case like the above, there are some problems:

• various bugs exist in the interpretation of ARM instructions

• insufficient modelling of the ARM architecture exists (flags, in particular)

• no control flow transformations are done (i.e., output is goto-laden spaghetti code)

The first step in decompilation is to convert machine-dependent assembly language to a machine-independent representation. This is accomplished by the front end processor. We upgraded the original Desquirr front end by fixing opcode interpretation buglets and adding an extensive machine-modeling section (that dealt with flags, operand sizes, etc.).

This is the internal, intermediate output after all machine language statements have been converted to a C-like representation:

#define sgn(__X) ((1 << 31) & (__X))

main:

R1 = 0;

R0 = 0x3e4;

R3 = R1;

R2 = R0 + 3;

loc_8478:

R1 = R1 + R3;

R3 = R3 + 1;

Flag_Z = !(R3 – R2);

Flag_N = !!sgn(R3 – R2);

Flag_C = !(R3 < R2);

Flag_V = sgn(R3) != sgn(R2) && sgn(R3-R2) == sgn(R2);

CCEQ = R3 == R2;

CCCS = R3 >= R2;

CCMI = R3 < R2;

CCVS = Flag_V;

CCHI = R3 > R2;

CCGE = R3 >= R2;

CCGT = R3 > R2;

if (CCGT) goto loc_8488;

goto loc_8478;

loc_8488:

R0 = * (PC + 0);

goto .printf;

Notice that lots of intermediate flag assignments are eliminated via a dead-code elimination pass (aka “define/use expansion” in the literature), yielding:

main:

R1 = 0;

R0 = 0x3e4;

R3 = R1;

R2 = R0 + 3;

loc_8478:

R1 = R1 + R3;

R3 = R3 + 1;

CCGT = R3 > R2;

if (CCGT) goto loc_8488;

goto loc_8478;

loc_8488:

R0 = * (PC + 0);

goto .printf;

How this works is that the control flow analysis (CFA) takes the control flow graph (CFG) of the program as input. The CFG represents the program's control flow-that is to say, it consists of a list of “runs” of instructions (instructions that don't have a change of control flow and which aren't themselves targets of control flow changes) that are connected together into a graph. The edges of the graph represent control flow changes (as a results of control transfer instructions, like procedure calls, jumps and conditional jumps). The output of the CFA stage is a number of graphs matching standard control patterns (loops, if/then statements, switches etc.). From that, we can reconstruct the program's high-level representation.

To process the CFA graphs, we use interval-based analysis. We start with an iterative procedure of discovery to find the graph intervals. A graph interval is a subgraph, containing a special vertex (interval header) and in which all loops (a sequence of control transfers from vertex to vertex starting and ending with the same vertex) contain the header. Collapsing each interval into a vertex and considering two vertices connected if some vertices of the original graph taken from the intervals are connected, we obtain a derived graph. Repeating the process of producing derived graphs recursively, we can then obtain a sequence of derived graphs. In this way we can easily identify nested loops in the output program-they correspond to loops located in sequentially produced derived graphs of different levels.

CFA includes a number of algorithms for detailed control flow structuring. For example, it's not enough to find a loop location-we have to identify the loop bounds, body, the type of the loop (while, for, do-while, etc.), follow vertices and so on. This is typical output at this stage:

main:

R1 = 0;

R0 = 0x3e4;

R3 = R1;

R2 = R0 + 3;

{//BEGIN loop 0

loc_8478:

R1 = R1 + R3;

R3 = R3 + 1;

CCGT = R3 > R2;

if (CCGT) goto loc_8488;

goto loc_8478;

}//END loop 0

loc_8488:

R0 = * (PC + 0);

goto _printf;

Notice that the main loop has been identified with comments.

At the last step, code generation uses information collected during the CFA stage and translates the control structures discovered in the CFG into high-level constructs of the output program. Most of the CFA implementation is based on well-known algorithmsfalse true 1,2 using the Lengauer-Tarjan dominator tree algorithm from the Boost graph library.

Eventually, one can get:

main:

R3 = R1 = 0;

while (1) {

R1 += R3;

if (++R3 > 999) break;

}

return (printf (*PC));

which is quite close to the original.

Why isn't it the same?

A decompiler isn't a silver bullet. Beyond the technical limits of our current decompiler, such as detection of for() loops or handling of functions with variable arguments, there are fundamental issues:

• variable names (like i and sum) are lost-they are a convenience used by the programmer but removed by the compiler. It's easy to see in the above code that R1 is sum and R3 is i.

• Macro, template, and inline function expansions are mostly lost before the actual code-generation phase of compilation begins, so they aren't visible in the assembly language anyway,

• Most facets of object-oriented code suffer a similar fate. For example, decompilation is hampered by the use of multiple levels of indirection for member functions.

References

1. C Cifuentes, Reverse Compilation Techniques, PhD thesis, Faculty of Information Technology, Queensland University of Technology, July 1994.

2. C Cifuentes, D Simon and A Fraboulet, Assembly to High-Level Language Translation.† Proceedings of the International Conference on Software Maintenance, Washington DC, 18-20 Nov 1998, IEEE Press, pp 228-237.


Dr. Sourjko (left) is a senior software developer of technology and product development at UBM TechInsights. He holds doctoral degrees in applied math and computer science. Robert Krten is a senior analyst with UBM TechInsights where he reverse engineers software and maps the derived architecture against claim elements of customers' patents.

1 thought on “Decompiling the ARM architecture code

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.