Endianness

October 16, 2015

Colin Walls, October 16, 2015

For most programmers, details of computer architecture are of no interest or importance. Even embedded developers, who normally do concern themselves with such details, tend to ignore matters like cache configuration when programming in a high-level language. Factors like memory location and size do matter when looking at the project as a whole, but even these parameters do not influence day-to-day coding. The order in which bytes are stored in a word, the endianness of the CPU in use, can often be ignored as well. However, once in a while, an appreciation of this matter is critical. The same consideration applies to the transmission of data: in what order are bytes sent down a serial line or over a network? This article reviews exactly what endianness means and how it affects embedded software.

What is Endianness?
In almost all modern embedded systems, memory is organized into bytes. CPUs, however, process data as 8-, 16- or 32-bit words. As soon as this word size is larger than a byte, a decision needs to be made with regard to how the bytes in a word are stored in memory. There are two obvious options and a number of other variations. The property that describes this byte ordering is called "endianness" (or, sometimes, "endianity").

Broadly speaking, the endianness in use is determined by the CPU. Because there are a number of options, it is unsurprising that different semiconductor vendors have chosen different endianness for their CPUs. The questions, from an embedded software engineer’s perspective, are "Does endianness matter?" and "If so, how much?".

There are broadly two circumstances when a software developer needs to think about endianness:

  • data transmitted over a communications link or network
  • data handled in multiple representations in software

 

The former situation is quite straightforward: it is simply a matter of following or defining a protocol. TCP/IP, for example, specifies big-endian ("network byte order") for the multi-byte fields in its headers; the byte order of application data carried in the payload is left to the application protocol. Under the big-endian convention, the 32-bit value 0x12345678 is sent as the sequence 0x12, 0x34, 0x56, 0x78.
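As a minimal sketch of what following a big-endian protocol might look like in C, the helper below packs a 32-bit value into a byte buffer most significant byte first. The name put_be32 is hypothetical and not part of any standard API. Because the shifts operate on the value and not on its representation in memory, the result should be the same whatever the endianness of the host.

```c
#include <stdint.h>

/* Hypothetical helper (not from the article or any standard library):
   store a 32-bit value into buf in big-endian order, i.e. most
   significant byte first. buf must have room for 4 bytes. */
static void put_be32(uint8_t *buf, uint32_t value)
{
    buf[0] = (uint8_t)(value >> 24);  /* most significant byte */
    buf[1] = (uint8_t)(value >> 16);
    buf[2] = (uint8_t)(value >> 8);
    buf[3] = (uint8_t)value;          /* least significant byte */
}
```

Calling put_be32(buf, 0x12345678) fills buf with 0x12, 0x34, 0x56, 0x78, ready to be transmitted in that order.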

The latter is trickier and requires some thought.

Endianness in CPUs
First of all, we need to provide some boundaries for this discussion. Only 32-bit CPUs will be considered, but the same issues apply to 16- and 64-bit devices. Even 8-bit devices typically have instructions that deal with larger data units. The consideration is also limited to the obvious endianness options: least significant byte stored at lowest address ("little-endian") and most significant byte stored at lowest address ("big-endian"). These two options may be visualized quite easily. For the 32-bit value 0x0a0b0c0d stored at address A, a little-endian CPU places 0x0d at A, 0x0c at A+1, 0x0b at A+2 and 0x0a at A+3, while a big-endian CPU places 0x0a at A, 0x0b at A+1, 0x0c at A+2 and 0x0d at A+3.


There are also other possibilities, like using little-endian within 16-bit words, but storing the 16-bit words inside 32-bit words using big-endian. This is commonly called "middle-endian" or "mixed-endian", but is rarely encountered nowadays. The order of bits within a byte is also potentially arbitrary, but we will ignore that too.
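The byte layouts described above can be reproduced in software. The following is only an illustrative sketch: the enum byte_order and the function layout_bytes are invented for this example, and the "mixed" case follows the description given here (little-endian within each 16-bit word, with the high 16-bit word stored first).

```c
#include <stdint.h>

/* Hypothetical names, introduced only for this illustration. */
enum byte_order { ORDER_LITTLE, ORDER_BIG, ORDER_MIXED };

/* Fill out[0..3] with the bytes of value as they would appear at
   increasing memory addresses under the given byte order. */
static void layout_bytes(uint32_t value, enum byte_order order, uint8_t out[4])
{
    uint8_t b0 = (uint8_t)value;          /* least significant byte */
    uint8_t b1 = (uint8_t)(value >> 8);
    uint8_t b2 = (uint8_t)(value >> 16);
    uint8_t b3 = (uint8_t)(value >> 24);  /* most significant byte */

    switch (order) {
    case ORDER_LITTLE:  /* least significant byte at lowest address */
        out[0] = b0; out[1] = b1; out[2] = b2; out[3] = b3;
        break;
    case ORDER_BIG:     /* most significant byte at lowest address */
        out[0] = b3; out[1] = b2; out[2] = b1; out[3] = b0;
        break;
    case ORDER_MIXED:   /* high 16-bit word first, each word little-endian */
        out[0] = b2; out[1] = b3; out[2] = b0; out[3] = b1;
        break;
    }
}
```

For the value 0x0a0b0c0d this gives 0d 0c 0b 0a for little-endian, 0a 0b 0c 0d for big-endian and 0b 0a 0d 0c for the mixed arrangement.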

Examples of little-endian CPUs include Intel x86 and Altera Nios II. Big-endian CPUs include Freescale 68K and ColdFire, and Xilinx MicroBlaze. Many modern architectures support both modes and can be switched in software; such "bi-endian" devices include ARM, PowerPC and MIPS.

Consider this code:

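unsigned int n = 0x0a0b0c0d;    /* assumes a 32-bit unsigned int */
unsigned char c, d, *p;

c = (unsigned char) n;          /* conversion by value: keeps the low-order byte */
p = (unsigned char *) &n;       /* points at the lowest-addressed byte of n */
d = *p;                         /* reads whatever byte is stored there */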

What values would c and d contain at the end? Whatever the endianness, c will contain the value 0x0d, because the cast operates on the value of n and keeps its low-order byte. The value of d, however, depends on the endianness, because it is read from the lowest address that n occupies. On a little-endian system d will contain 0x0d; on big-endian it will have the value 0x0a. The same kind of effect would be observed if n were overlaid with, say, unsigned char a[4] in a union, thus:

union e
{
    unsigned int ui;
    unsigned char a[4];
} f;

f.ui = n;
printf("a[0] = 0x%02x\n", f.a[0]);
printf("a[1] = 0x%02x\n", f.a[1]);
printf("a[2] = 0x%02x\n", f.a[2]);
printf("a[3] = 0x%02x\n", f.a[3]);

This code produces the following output on a little-endian machine:

a[0] = 0x0d
a[1] = 0x0c
a[2] = 0x0b
a[3] = 0x0a

On a big-endian machine the order is reversed, with a[0] = 0x0a and a[3] = 0x0d.
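The same idea of inspecting the byte at the lowest address can be turned into a run-time check of the host's byte order. This is a sketch only: the function name host_is_little_endian is made up for this example, and it assumes that unsigned int is wider than one byte and that the host is either little- or big-endian.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the least significant byte of an
   unsigned int is stored at its lowest address (little-endian),
   otherwise 0. Assumes sizeof(unsigned int) > 1. */
static int host_is_little_endian(void)
{
    unsigned int n = 1U;
    unsigned char first;

    /* Copy the lowest-addressed byte of n, as *p does in the text. */
    memcpy(&first, &n, 1);
    return first == 1;
}
```

On most toolchains the compiler can evaluate such a check at build time, and many also provide predefined macros describing the target byte order, so a run-time test like this is mainly useful for illustration and diagnostics.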

Does Endianness Matter?
So, does this matter? With care, most code can be written to be independent of endianness, for example by extracting bytes with shifts and masks rather than by reinterpreting memory through pointers or unions, and it may be argued that almost all well-written code would be like this. However, if you do build in an endianness dependency, good documentation and commenting are, as usual, essential.
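As one possible illustration of endianness-independent code, the sketch below reassembles a 32-bit value from a byte buffer using shifts, so that the byte order of the data is stated explicitly in the code and does not depend on the CPU. The names get_be32 and get_le32 are hypothetical and chosen only for this example.

```c
#include <stdint.h>

/* Hypothetical helpers: build a 32-bit value from 4 bytes in a buffer.
   Each byte is widened to uint32_t before shifting so that no bits
   are lost, and the host's own byte order never enters into it. */

/* buf[0] is the most significant byte (big-endian data). */
static uint32_t get_be32(const uint8_t *buf)
{
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |
            (uint32_t)buf[3];
}

/* buf[0] is the least significant byte (little-endian data). */
static uint32_t get_le32(const uint8_t *buf)
{
    return ((uint32_t)buf[3] << 24) |
           ((uint32_t)buf[2] << 16) |
           ((uint32_t)buf[1] << 8)  |
            (uint32_t)buf[0];
}
```

Given the bytes 0x0a, 0x0b, 0x0c, 0x0d, get_be32 returns 0x0a0b0c0d and get_le32 returns 0x0d0c0b0a on any host.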

Colin Walls has over thirty years' experience in the electronics industry, largely dedicated to embedded software. A frequent presenter at conferences and seminars and author of numerous technical articles and two books on embedded software, Colin is an embedded software technologist with Mentor Embedded [the Mentor Graphics Embedded Software Division], and is based in the UK. His regular blog is located at: http://blogs.mentor.com/colinwalls. He may be reached by email at colin_walls@mentor.com.

 
