The hardware (and software) implications of endianness - Embedded.com

The hardware (and software) implications of endianness

Embedded software programmers are familiar with the endianness characteristic of a computing processor, insofar as it refers to how bytes of a data word are ordered within memory.

Taking their name from Jonathan Swift's book Gulliver's Travels, big-endian systems are systems in which the most significant byte of the word is stored in the smallest address given and the least significant byte is stored in the largest. In contrast, little endian systems are those in which the least significant byte is stored in the smallest address.

So, depending on endianness, data types are stored differently in memory, which means there are considerations when accessing individual byte locations of a multi-byte data element in memory. In this article, we will review the concept of endianness from a software standpoint and then look at the implications of endianness for hardware IP block designers and device driver developers when they work with a complex system such as a modern System-on-Chip (SoC).

Today’s SoCs integrate many hardware IP blocks; designers need to be aware of the order of bytes on the byte lanes of connecting buses when transferring data. In a system with several discrete hardware components – such as a host processor and external devices connected to it via a PCI bus, for example – the hardware components may support different endianness modes. Device driver developers need to make the data transfers among these hardware components endianness-proof.

Endianness effect in software
From a software standpoint, the 'endianness effect' comes into play when writing and reading data from memory. To fully understand the implications we need to explain the use of data types in embedded software.

All high-level programming languages support several data types. For example, C supports data types such as char, int, long, float, and so on, each with its own storage size. The sizes of C data types are set partly by the standard (such as ANSI C) and partly by the compiler implementation. A char is always one byte by definition (sizeof(char) is 1), but one compiler may implement an int as two bytes while another implements it as four bytes. On some compilers for DSPs, even a char is 16 bits wide rather than 8. To avoid confusion about sizes and to ensure portability across C compilers, embedded software programmers often define their own data types that state the width of the type explicitly.

For example, the following may be defined:
typedef unsigned int uint32_t;   /* unsigned type that is 32-bits long */
typedef int int32_t;             /* signed type that is 32-bits long */
typedef unsigned short uint16_t; /* unsigned type that is 16-bits long */
typedef short int16_t;           /* signed type that is 16-bits long */
typedef unsigned char uint8_t;   /* unsigned type that is 8-bits long */
typedef signed char int8_t;      /* signed type that is 8-bits long (plain char may be unsigned) */

The above definitions are specific to a C compiler/processor that implements int data type as four bytes long, short data type as two bytes long and char data type as one byte long. The software then uses these user defined types rather than C standard types. To make their software portable across different C compilers/processors, embedded programmers isolate these type definitions into a 'port' file. When the software needs to work on a new compiler/processor, only the port file needs to be redone, while the software written using the user defined data types remains unchanged.
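
A port file can also check its own assumptions at compile time, so that a move to a new compiler fails loudly rather than silently. The following is a minimal sketch, assuming a C11 compiler (for _Static_assert); the port_ prefixed names are hypothetical and chosen here only to avoid clashing with the standard <stdint.h> header, which provides equivalent fixed-width types on C99 and later compilers.

```c
/* port.h: hypothetical port file for one compiler/processor pair. */
#include <limits.h>

/* The port_ names are assumptions of this sketch, not names from the article. */
typedef unsigned int   port_u32;
typedef signed int     port_s32;
typedef unsigned short port_u16;
typedef signed short   port_s16;
typedef unsigned char  port_u8;
typedef signed char    port_s8;

/* Compilation stops here if the compiler does not match the assumptions. */
_Static_assert(CHAR_BIT == 8, "this port assumes 8-bit bytes");
_Static_assert(sizeof(port_u32) == 4, "port_u32 must be 32 bits long");
_Static_assert(sizeof(port_s32) == 4, "port_s32 must be 32 bits long");
_Static_assert(sizeof(port_u16) == 2, "port_u16 must be 16 bits long");
_Static_assert(sizeof(port_s16) == 2, "port_s16 must be 16 bits long");
_Static_assert(sizeof(port_u8) == 1, "port_u8 must be 8 bits long");
_Static_assert(sizeof(port_s8) == 1, "port_s8 must be 8 bits long");
```

When the software moves to a compiler where int is two bytes long, only the typedef lines need to change, and the assertions confirm that the change was made correctly.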

To illustrate the 'endianness effect', consider the following section of code:

{
   uint32_t * ptr32;
   uint8_t var8;
   ptr32 = (uint32_t *)malloc(sizeof(uint32_t));

/* Bookmark A */
   *ptr32 = 0x44332211;
/* Bookmark B */

/* Bookmark C */
   var8 = *(uint8_t *)ptr32;
/* Bookmark D */

}

Guess what the value of var8 would be at Bookmark D? If you said “it depends on whether the system is little endian or big endian”, you could jump ahead to the next section of this article. If you guessed 0x44 or any other value, read on.

Remember that in a little endian system, when a multi-byte data element is written to memory, the least significant byte is stored at the lowest address. In a big endian system, the most significant byte is stored at the lowest address.

Let us say that in the above code, ptr32 receives the address 0x80000000 from malloc. The byte-addressable locations starting at address 0x80000000 would then hold the following at Bookmark B:

Address       Little endian   Big endian
0x80000000    0x11            0x44
0x80000001    0x22            0x33
0x80000002    0x33            0x22
0x80000003    0x44            0x11

From the above layout in memory, it is clear that var8 gets the value 0x11 in a little endian system and value 0x44 in a big endian system.

An easy rule of thumb for remembering the little endian/big endian difference is LLL: Little endian, Least significant byte, Lowest address.
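
The memory layout can also be inspected at run time. The following is a small sketch, assuming the standard <stdint.h> types are used in place of the port-file typedefs; the function names are hypothetical. It copies bytes out with memcpy rather than casting pointers, which is a common way to look at the individual bytes of an object.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little endian host and 0 on a big endian host. */
int host_is_little_endian(void)
{
    const uint32_t probe = 0x44332211u;
    uint8_t first;

    memcpy(&first, &probe, 1);   /* byte stored at the lowest address */
    return first == 0x11;
}

/* Copies the four bytes of value into out in increasing address order. */
void bytes_in_memory_order(uint32_t value, uint8_t out[4])
{
    memcpy(out, &value, 4);
}
```

On a little endian host, bytes_in_memory_order(0x44332211, out) fills out with 0x11, 0x22, 0x33, 0x44; on a big endian host the order is 0x44, 0x33, 0x22, 0x11.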

Now consider the following section of code:

{
   uint32_t * ptr32;
   uint16_t var16;
   ptr32 = (uint32_t *)malloc(sizeof(uint32_t));

/* Bookmark E */
   *ptr32 = 0x44332211;
/* Bookmark F */

/* Bookmark G */
   var16 = *(uint16_t *)ptr32;
/* Bookmark H */

}

At Bookmark H, var16 will have the value 0x2211 in a little endian system and 0x4433 in a big endian system. When var16 gets its value from address 0x80000000 using a 16-bit (two byte) access, the two bytes starting at location 0x80000000 are read out. These are 0x11 and 0x22 in a little endian system and 0x44 and 0x33 in a big endian system. Further, in a little endian system, the byte stored at the lower address is interpreted as the least significant byte. That is, 0x11 is interpreted as the least significant byte and 0x22 as the most significant byte, so var16 gets the value 0x2211. Likewise, in a big endian system, the byte stored at the lower address is interpreted as the most significant byte, so var16 gets the value 0x4433.

How do programmers avoid surprises when accessing selected bytes of multi-byte elements? One option is to write a variant of the code for each endianness mode and select the proper variant with a compile-time flag and #ifdefs, but this is a cumbersome approach. A better way around the problem is to keep in mind that software sees the endianness effect only when mixing data types: specifically, when storing an element of one byte length into memory and reading the same memory as an element of a different byte length. So, if a 32-bit element was stored at a memory address, the content at that address should be read out only as a 32-bit element. Once it is read out of memory and is in a CPU register, the required bytes can be extracted from it.

For example, the endianness-proof software variant of the above code section is as follows:

{
   uint32_t * ptr32;
   uint32_t var32;
   uint16_t var16;

   ptr32 = (uint32_t *)malloc(sizeof(uint32_t));

/* Bookmark E */
   *ptr32 = 0x44332211;
/* Bookmark F */

/* Bookmark G */
   var32 = *ptr32; /* Read from memory as same byte length as what was used during writing */
                   /* var32 would now have 0x44332211 in both endianness modes */
   var16 = (var32 & 0xFFFFu); /* var16 would get 0x2211 */
/* Bookmark H */

}
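
The same idea can be packaged into small helpers that work on values rather than on memory. This is a sketch with hypothetical function names, assuming the standard <stdint.h> types; because shifts and masks operate on the value in a register, the results do not depend on the endianness of the system.

```c
#include <stdint.h>

/* Returns byte number index of value, where index 0 is the least
 * significant byte. The caller must pass an index in the range 0..3. */
uint8_t byte_of_u32(uint32_t value, unsigned index)
{
    return (uint8_t)((value >> (8u * index)) & 0xFFu);
}

/* Returns the least significant 16 bits of value. */
uint16_t low_half_of_u32(uint32_t value)
{
    return (uint16_t)(value & 0xFFFFu);
}

/* Returns the most significant 16 bits of value. */
uint16_t high_half_of_u32(uint32_t value)
{
    return (uint16_t)(value >> 16);
}
```

For the value 0x44332211, byte_of_u32 with index 0 gives 0x11 and low_half_of_u32 gives 0x2211 in both endianness modes.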

Hardware implications of endianness
Today's complex SoCs have many hardware IP blocks integrated with the CPU core, which all communicate via interconnected buses. Each bus may serve a specific purpose for the hardware IP block, such as obtaining its configuration parameters, obtaining input data for processing, or giving out the output data after processing. A bus may be designed in different widths such as 32-, 64- or 128-bit lines, depending on the transfer bandwidth requirements of its hardware end points. Data transfer is done over a bus between the end points in units called transactions. A transaction encompasses the actual data transferred, as well as the address of data and any clocking/control signalling for synchronizing transmission and reception. A transaction can be a read type or a write type.

The size of data transferred in one transaction is called transaction size. A hardware IP block can use different transaction sizes for different purposes on the same bus. This is a design-time decision, based on different uses of the bus. For example, a hardware IP block may do number crunching operations on 32-bit data units and may obtain four such units in one 128-bit transaction. Apart from data, it may need a periodic but less frequent update of a 32-bit parameter value, which it may obtain using a 32-bit transaction. Of course, transaction size is limited to the width of the bus. Finally, by design, one of the hardware end points of a bus is a master (initiator) of transactions on the bus while the other end point is a slave (target). Regardless of whether a transaction is a read type or a write type, the master initiates the transactions on the bus and the slave just responds to transactions from a master. The above is generally true even with regard to discrete hardware components in a non-SoC system.

So, what are the implications of endianness for hardware IP block designers and device driver developers when transferring data over the buses? Let us assume data is being transferred from memory to an IP block in 64-bit transactions over a 64-bit wide bus. The endianness of the bus determines how the individual bytes of data coming from a specific source address are placed on the physical byte lanes of the bus.

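On a little endian bus, the data byte from the lowest source address is placed on the lowest numbered physical byte lane (bit lines 0-7). On a big endian bus, the data byte from the highest source address is placed on the lowest numbered physical byte lane (bit lines 0-7). For a 64-bit transaction from source address A, the placement is as follows:

Byte lane (bit lines)   Little endian bus   Big endian bus
0 (0-7)                 byte at A+0         byte at A+7
1 (8-15)                byte at A+1         byte at A+6
2 (16-23)               byte at A+2         byte at A+5
3 (24-31)               byte at A+3         byte at A+4
4 (32-39)               byte at A+4         byte at A+3
5 (40-47)               byte at A+5         byte at A+2
6 (48-55)               byte at A+6         byte at A+1
7 (56-63)               byte at A+7         byte at A+0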

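Let us say that a hardware IP block does some number crunching operations one byte at a time. Say its input data is in memory at address 0x80000000 as follows:

Address       Data
0x80000000    0x11
0x80000001    0x22
0x80000002    0x33
0x80000003    0x44
0x80000004    0x55
0x80000005    0x66
0x80000006    0x77
0x80000007    0x88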

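When the eight bytes of data are transferred from source address 0x80000000 in a transaction of size 64 bits, the data is placed differently on little endian and big endian buses:

Byte lane   Little endian bus   Big endian bus
0           0x11                0x88
1           0x22                0x77
2           0x33                0x66
3           0x44                0x55
4           0x55                0x44
5           0x66                0x33
6           0x77                0x22
7           0x88                0x11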

Since the hardware IP block needs to number crunch the bytes in order (that is, 0x11 followed by 0x22, 0x22 followed by 0x33 and so on), it needs to be aware of the endianness of the bus and extract the bytes from the appropriate byte lanes for processing. On a little endian bus, it will need to extract the bytes from byte lane 0 upwards, while on a big endian bus, it will need to extract the bytes from byte lane 7 downwards.
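
The lane placement rule can be modelled in software to check one's understanding. The following is an illustrative sketch only, with hypothetical function names: it models a single 64-bit transaction as an array of eight lanes, where lanes[n] stands for the byte on bit lines 8n to 8n+7, and it does not describe any particular bus protocol.

```c
#include <stdint.h>

#define BUS_BYTES 8u   /* 64-bit wide bus */

/* Models one 64-bit transaction. mem[i] is the byte at source address + i;
 * lanes[n] receives the byte that appears on byte lane n. */
void place_on_bus(const uint8_t mem[BUS_BYTES], uint8_t lanes[BUS_BYTES],
                  int bus_is_big_endian)
{
    for (unsigned i = 0; i < BUS_BYTES; i++) {
        unsigned lane = bus_is_big_endian ? (BUS_BYTES - 1u - i) : i;
        lanes[lane] = mem[i];
    }
}

/* Models a byte-oriented IP block that knows the bus endianness and
 * recovers the bytes in source address order. */
void extract_in_address_order(const uint8_t lanes[BUS_BYTES],
                              uint8_t out[BUS_BYTES], int bus_is_big_endian)
{
    for (unsigned i = 0; i < BUS_BYTES; i++) {
        unsigned lane = bus_is_big_endian ? (BUS_BYTES - 1u - i) : i;
        out[i] = lanes[lane];
    }
}
```

With input bytes 0x11 through 0x88, lane 0 carries 0x11 on the little endian model and 0x88 on the big endian model, yet extract_in_address_order returns 0x11 first in both cases.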

Now, instead of processing one byte at a time, let us say that the hardware IP block was designed to number crunch 32 bits as a unit. So, it needs to obtain the first 32-bit data unit to process first, followed by the second 32-bit data unit. In this case, it is reasonable to assume that input data was also prepared in 32-bit data units (e.g. by software) and written into memory as 32-bit data units. Let us say the software writes input data 0x44332211 and 0x88776655, in that order, into memory at address 0x80000000. The memory snapshot looks as follows for little endian and big endian systems:

Address       Little endian   Big endian
0x80000000    0x11            0x44
0x80000001    0x22            0x33
0x80000002    0x33            0x22
0x80000003    0x44            0x11
0x80000004    0x55            0x88
0x80000005    0x66            0x77
0x80000006    0x77            0x66
0x80000007    0x88            0x55

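When the eight bytes of data are placed on the bus as part of a 64-bit transaction, the data appears as follows for little endian and big endian buses:

Byte lane   Little endian bus   Big endian bus
0           0x11                0x55
1           0x22                0x66
2           0x33                0x77
3           0x44                0x88
4           0x55                0x11
5           0x66                0x22
6           0x77                0x33
7           0x88                0x44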

When the hardware IP block extracts the two 32-bit data units on a little endian bus, it will need to extract the first data unit (0x44332211) from the lower four byte lanes (0-3) and the second data unit (0x88776655) from the higher four byte lanes (4-7). On a big endian bus, it is the other way around. Note that, because the software stored input data in memory in 32-bit data units, we did not have to worry about the byte order within each 32-bit data unit. Between the two endianness modes, the byte order within a 32-bit data unit is reversed when storing in memory and is reversed again when placing on the bus.
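
The extraction of 32-bit units can be sketched in the same style. This is an illustrative model with hypothetical function names, assuming that lanes[n] stands for the byte on bit lines 8n to 8n+7, so that within a group of four lanes the lowest numbered lane carries the least significant byte.

```c
#include <stdint.h>

/* Assembles one 32-bit unit from four adjacent byte lanes. The byte on
 * lane first_lane is taken as the least significant byte. */
uint32_t unit_from_lanes(const uint8_t lanes[8], unsigned first_lane)
{
    return (uint32_t)lanes[first_lane] |
           ((uint32_t)lanes[first_lane + 1u] << 8) |
           ((uint32_t)lanes[first_lane + 2u] << 16) |
           ((uint32_t)lanes[first_lane + 3u] << 24);
}

/* Recovers the two units of a 64-bit transaction in the order in which
 * they were written to memory. Only the choice of lane group depends on
 * the bus endianness; the byte order inside a group does not. */
void extract_units(const uint8_t lanes[8], int bus_is_big_endian,
                   uint32_t units[2])
{
    units[0] = unit_from_lanes(lanes, bus_is_big_endian ? 4u : 0u);
    units[1] = unit_from_lanes(lanes, bus_is_big_endian ? 0u : 4u);
}
```

Feeding in the lane contents for either bus from the example above yields 0x44332211 followed by 0x88776655.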

Thus, when a hardware block obtains input data or gives out output data on a bus, there are several considerations that hardware IP block designers need to be aware of:

  • What is the unit size of data being processed in the hardware IP block?
  • In case of input data, is the source of data another hardware IP block or memory? In what order and what unit sizes is the input data prepared? Similar considerations apply in case of output data.
  • What is the bus transfer transaction size?
  • What is the endianness of the bus?

Let us switch gears and think for a moment that a hardware IP block was designed for a specific endianness, irrespective of the endianness of the system. Examples of this situation are discrete hardware devices connected on a PCI bus. The PCI specification defines the bus to be little endian, and devices connected to it therefore expect data to appear little endian on the bus. The host processor to which the PCI device is connected may be big endian or little endian. In this case, it is the responsibility of some software component (a device driver) or some hardware component (an intermediate bridge between the host and the PCI device) to take into account the endianness of the host processor and use a specific byte sequence on the bus (reversing the sequence of bytes if required) while writing data to or reading data from the PCI device.
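
When the conversion is done in software, a driver typically relies on a few small helpers. The following is a minimal sketch with hypothetical function names, not taken from any real driver framework: one helper reverses the bytes of a 32-bit value, and two others write and read a value through a byte buffer in little endian order on any host. Whether a given driver needs such helpers depends on what the bridge hardware already does.

```c
#include <stdint.h>

/* Reverses the byte order of a 32-bit value. */
uint32_t swap_bytes_u32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Stores value into dst in little endian byte order on any host. */
void store_le32(uint8_t dst[4], uint32_t value)
{
    dst[0] = (uint8_t)(value & 0xFFu);
    dst[1] = (uint8_t)((value >> 8) & 0xFFu);
    dst[2] = (uint8_t)((value >> 16) & 0xFFu);
    dst[3] = (uint8_t)((value >> 24) & 0xFFu);
}

/* Loads a value that is held in src in little endian byte order. */
uint32_t load_le32(const uint8_t src[4])
{
    return (uint32_t)src[0] |
           ((uint32_t)src[1] << 8) |
           ((uint32_t)src[2] << 16) |
           ((uint32_t)src[3] << 24);
}
```

The store and load helpers use only shifts and masks, so the same source code produces the little endian byte sequence on both big endian and little endian hosts without any #ifdef.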

This situation is similar to a SoC whose endianness (that of the CPU core and bus infrastructure) is configurable to be either little endian or big endian at boot time, but which contains a hardware IP block with a fixed endianness. Let us say the hardware IP block is designed with the assumption that the bus is little endian. If the SoC is configured as little endian, the sequence of bytes in the CPU’s memory will be preserved when transferred to the IP block’s memory. If the SoC is configured as big endian, for each transaction the sequence of bytes in the IP block’s memory will be reversed relative to the byte sequence in the CPU’s memory. To circumvent the reversal, the sequence of bytes needs to be reversed for each transaction over the bus. This may be done by software when preparing the data in the CPU’s memory or by an intermediate hardware bridge between memory and the IP block. The following example illustrates how software can organize the data in the CPU’s memory before the (little endian) IP block reads it. We assume the hardware IP block processes input data in 8-bit units and obtains the input data in 64-bit transactions.

Address       Little endian SoC   Big endian SoC
0x80000000    0x11                0x88
0x80000001    0x22                0x77
0x80000002    0x33                0x66
0x80000003    0x44                0x55
0x80000004    0x55                0x44
0x80000005    0x66                0x33
0x80000006    0x77                0x22
0x80000007    0x88                0x11

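The byte sequence would now be the same for each transaction on the bus regardless of the endianness of the bus:

Byte lane   Little endian bus   Big endian bus
0           0x11                0x11
1           0x22                0x22
2           0x33                0x33
3           0x44                0x44
4           0x55                0x55
5           0x66                0x66
6           0x77                0x77
7           0x88                0x88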

The same sequence of byte data units on the physical byte lanes of the bus serves the hardware IP block well to read out the bytes in the same fashion regardless of the SoC’s boot time endianness configuration.

Note that if the hardware IP block processes multi-byte data units (and if software accordingly prepares data in memory in the same multi-byte units), then the reversing of sequence must happen for each data unit rather than each byte. For example, if the hardware IP block processes 32-bit data units and obtains the data in 64-bit transactions, the data units must be organized in the CPU’s memory (or an intermediate bridge must create such a sequence of data units for each transaction) as follows:

Address       Little endian CPU memory   Big endian CPU memory
0x80000000    0x44332211                 0x88776655
0x80000004    0x88776655                 0x44332211

It can be seen that when the above data units are placed on the bus and obtained by the hardware IP block from the bus in each transaction, the sequence of data units will be the same regardless of the bus endianness.
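
The software side of this reordering can be sketched as a routine that reverses the order of data units within each transaction-sized chunk of a buffer while leaving the bytes inside each unit untouched. The function name and parameters below are hypothetical, and the sketch assumes units of at most 16 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverses the order of the data units inside every transaction-sized
 * chunk of buf. Bytes within a unit keep their order.
 * Returns 0 on success and -1 if the sizes do not divide evenly. */
int reverse_units(uint8_t *buf, size_t len, size_t txn_bytes,
                  size_t unit_bytes)
{
    uint8_t tmp[16];   /* largest unit supported by this sketch */

    if (unit_bytes == 0 || unit_bytes > sizeof tmp || txn_bytes == 0 ||
        txn_bytes % unit_bytes != 0 || len % txn_bytes != 0) {
        return -1;
    }

    size_t units = txn_bytes / unit_bytes;
    for (size_t base = 0; base < len; base += txn_bytes) {
        for (size_t i = 0; i < units / 2; i++) {
            uint8_t *a = buf + base + i * unit_bytes;
            uint8_t *b = buf + base + (units - 1 - i) * unit_bytes;
            memcpy(tmp, a, unit_bytes);   /* swap unit i with its mirror */
            memcpy(a, b, unit_bytes);
            memcpy(b, tmp, unit_bytes);
        }
    }
    return 0;
}
```

With unit_bytes set to 1 and txn_bytes set to 8, the routine reverses the eight bytes of each transaction, which matches the 8-bit example. With unit_bytes set to 4, it swaps the two 32-bit units of each transaction, which matches the 32-bit example.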

Thus, the swapping operation for data during transfer between hardware components of different endianness modes needs to take into account the units of data processed. This situation can get complex when transferring data units of different lengths, such as data structures containing fields of different data types. For example, consider the following code section:

{
   typedef struct
   {
      uint8_t elem8_1;
      uint8_t elem8_2;
      uint8_t elem8_3;
      uint8_t elem8_4;
      uint16_t elem16_1;
      uint16_t elem16_2;
      uint32_t elem32;
   } structDef;
   structDef * structVar;
   structVar = (structDef *)malloc(sizeof(structDef));
/* Bookmark A */
   structVar->elem8_1 = 0x11;
   structVar->elem8_2 = 0x22;
   structVar->elem8_3 = 0x33;
   structVar->elem8_4 = 0x44;
   structVar->elem16_1 = 0x6655;
   structVar->elem16_2 = 0x8877;
   structVar->elem32 = 0xCCBBAA99;
/* Bookmark B */

}

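At Bookmark A, let us say structVar gets allocated at address 0x80000000 in memory. At Bookmark B, the content of memory at address 0x80000000 would look as follows in little endian and big endian systems:

Address       Field      Little endian   Big endian
0x80000000    elem8_1    0x11            0x11
0x80000001    elem8_2    0x22            0x22
0x80000002    elem8_3    0x33            0x33
0x80000003    elem8_4    0x44            0x44
0x80000004    elem16_1   0x55            0x66
0x80000005               0x66            0x55
0x80000006    elem16_2   0x77            0x88
0x80000007               0x88            0x77
0x80000008    elem32     0x99            0xCC
0x80000009               0xAA            0xBB
0x8000000A               0xBB            0xAA
0x8000000B               0xCC            0x99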

Note that byte sequence for multi-byte fields is reversed between big endian and little endian systems.

When structVar memory is prepared on a big endian hardware component and transferred to a little endian hardware component or vice versa, the goal of the underlying transfer logic must be to ensure that structVar content is organized in the destination’s memory conforming to the endianness of that component, as shown above. If the destination is little endian, individual bytes of structVar content must be in the sequence shown above for a little endian memory regardless of the sequence in which they are organized in the source’s memory. To support this, an intimate knowledge of hardware (transaction size) as well as application context (data structure) is required. One option is for the transfer logic to separate the raw byte sequence from the application context. An intermediate hardware bridge or a low level device driver may be used to ensure that the byte sequence in source and destination memories is the same. To interpret multi-byte data fields of a data structure correctly, software (application logic) can do any byte swapping within the data fields.
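
One way for application logic to handle the fields is to write each one into a byte buffer in an agreed byte order, field by field. The following is a sketch under stated assumptions: the byte order chosen is little endian, the helper names are hypothetical, and the structure is the one from the example above, redefined here with the standard <stdint.h> types.

```c
#include <stdint.h>

typedef struct
{
    uint8_t elem8_1;
    uint8_t elem8_2;
    uint8_t elem8_3;
    uint8_t elem8_4;
    uint16_t elem16_1;
    uint16_t elem16_2;
    uint32_t elem32;
} structDef;

/* Writes a 16-bit value to dst, least significant byte first. */
static void put_le16(uint8_t *dst, uint16_t v)
{
    dst[0] = (uint8_t)(v & 0xFFu);
    dst[1] = (uint8_t)((v >> 8) & 0xFFu);
}

/* Writes a 32-bit value to dst, least significant byte first. */
static void put_le32(uint8_t *dst, uint32_t v)
{
    put_le16(dst, (uint16_t)(v & 0xFFFFu));
    put_le16(dst + 2, (uint16_t)(v >> 16));
}

/* Lays out the structure in a 12-byte buffer exactly as it would appear
 * in a little endian memory, regardless of the host's endianness. */
void structDef_to_le_bytes(const structDef *s, uint8_t out[12])
{
    out[0] = s->elem8_1;
    out[1] = s->elem8_2;
    out[2] = s->elem8_3;
    out[3] = s->elem8_4;
    put_le16(out + 4, s->elem16_1);
    put_le16(out + 6, s->elem16_2);
    put_le32(out + 8, s->elem32);
}
```

Because each field is handled according to its own width, the swapping follows the data structure rather than the transaction size, which is the separation of raw byte sequence from application context described above.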

Sandeep Yaddula is an applications engineer in the communications infrastructure business of Texas Instruments (www.ti.com). Prior to his current role, Sandeep was a software design engineer in Voice over Packet technology. He holds a Master's Degree in computer science and engineering from the Indian Institute of Technology in Mumbai, India. In his free time, Sandeep pursues certain topics of mathematics to obtain deeper insights.

Acknowledgements
I thank Eldad Falik, a hardware IP design manager at Texas Instruments Inc, for his many clarifications on the topic.

References
On the endian holy wars and a plea for peace
PCI Local Bus Specification (Revision 2.2 – Dec 18, 1998)
Endianness and Addressing
Discussions at pcisig.com
