Hardware (and software) implications of Endianness in SoC design

Sandeep Yaddula, Texas Instruments

March 17, 2013


Embedded software programmers are familiar with the endianness of a processor. They generally know how, depending on endianness, multi-byte data types are stored in memory, and what happens when individual byte locations of a multi-byte data element are accessed in memory.

In this article, we will review the concept of endianness from a software standpoint and will then discuss the implications of endianness for designers of hardware IP blocks and developers of device drivers when they work with a complex system such as today’s SOC. 

Today’s SOCs integrate many hardware IP blocks, and designers need to be aware of the order of bytes on the byte lanes of the connecting buses when transferring data. In a system with several discrete hardware components, such as a host processor and external devices connected to it via, say, a PCI bus, the components may support different endianness modes, and device driver developers need to make the data transfers among them endianness-proof.

Endianness effect in software
From a software standpoint, the “endianness effect” comes into play when writing data to and reading data from memory. Here, we digress a little to explain the use of data types in embedded software. All high-level programming languages support several data types; C, for example, supports char, int, long, float and so on. Data types differ from one another in their meaning (float represents floating point numbers, while int represents integers) as well as in the number of bytes of memory needed to store an element of the type.

The sizes of C data types are partly fixed by the standard (ANSI C guarantees only minimum ranges for each type) and partly left to the compiler implementation. For example, one compiler may implement char as one byte and int as two bytes. Another compiler may implement char as two bytes (16 bits, as some DSP compilers do) and int as four bytes. To avoid confusion about sizes and to keep code portable across C compilers, embedded software programmers often define their own data types whose names state the size explicitly.

For example, the following may be defined:

typedef unsigned long uint32_t; /* unsigned type that is 32-bits long */
typedef long int32_t;           /* signed type that is 32-bits long */
typedef unsigned int uint16_t;  /* unsigned type that is 16-bits long */
typedef int int16_t;            /* signed type that is 16-bits long */
typedef unsigned char uint8_t;  /* unsigned type that is 8-bits long */
typedef signed char int8_t;     /* signed type that is 8-bits long (plain char may be unsigned) */

The above definitions are specific to a C compiler/processor that implements long as 4 bytes, int as 2 bytes and char as 1 byte. The software then uses these user-defined types rather than the standard C types. To make their software portable across C compilers/processors, embedded programmers isolate these type definitions in a "port" file.

When the software needs to run on a new compiler/processor, only the port file needs to be rewritten; the software written using the user-defined data types remains unchanged. For example, for a compiler that implements int as 4 bytes, short as 2 bytes and char as 1 byte, the "port" file would have the following type definitions:

typedef unsigned int uint32_t;   /* unsigned type that is 32-bits long */
typedef int int32_t;             /* signed type that is 32-bits long */
typedef unsigned short uint16_t; /* unsigned type that is 16-bits long */
typedef short int16_t;           /* signed type that is 16-bits long */
typedef unsigned char uint8_t;   /* unsigned type that is 8-bits long */
typedef signed char int8_t;      /* signed type that is 8-bits long (plain char may be unsigned) */
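On C99 and later toolchains, the standard header <stdint.h> already provides these exact-width names, so a port file often just includes it. Where the types are still written by hand, the compiler can be asked to check the port file. The following is a minimal sketch, assuming a C11 compiler (for _Static_assert) and the second port file above: if the file is reused with a compiler whose int or short has a different size, the build stops instead of the software quietly misbehaving.

```c
/* Second port file above, with build-time checks (requires C11).
 * Assumes a compiler with 32-bit int, 16-bit short and 8-bit char. */
#include <limits.h>  /* CHAR_BIT: number of bits in a char */

typedef unsigned int   uint32_t;  /* unsigned type that is 32-bits long */
typedef int            int32_t;   /* signed type that is 32-bits long */
typedef unsigned short uint16_t;  /* unsigned type that is 16-bits long */
typedef short          int16_t;   /* signed type that is 16-bits long */
typedef unsigned char  uint8_t;   /* unsigned type that is 8-bits long */
typedef signed char    int8_t;    /* signed type that is 8-bits long */

/* If this file is reused with a compiler whose type sizes differ,
 * compilation fails here with the given message. */
_Static_assert(sizeof(uint32_t) * CHAR_BIT == 32, "uint32_t is not 32 bits");
_Static_assert(sizeof(int32_t)  * CHAR_BIT == 32, "int32_t is not 32 bits");
_Static_assert(sizeof(uint16_t) * CHAR_BIT == 16, "uint16_t is not 16 bits");
_Static_assert(sizeof(int16_t)  * CHAR_BIT == 16, "int16_t is not 16 bits");
_Static_assert(sizeof(uint8_t)  * CHAR_BIT == 8,  "uint8_t is not 8 bits");
_Static_assert(sizeof(int8_t)   * CHAR_BIT == 8,  "int8_t is not 8 bits");
```

The checks cost nothing at run time; they only turn a silent porting mistake into a compile error.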



To illustrate "endianness effect", consider the following section of code:

{
  uint32_t * ptr32;
  uint8_t  var8;
  ptr32 = (uint32_t  *)malloc(sizeof(uint32_t));
...
/* Bookmark A */
  *ptr32 = 0x44332211;
/* Bookmark B */
...
/* Bookmark C */
  var8 = *(uint8_t  *)ptr32;
/* Bookmark D */
...
}


Can you guess what the value of var8 will be at Bookmark D? If you said "it depends on whether the system is little endian or big endian", you can jump ahead to the next section of this article. If you guessed 0x44 or any other value, read on.

var8 takes the value 0x44 in a big endian system and the value 0x11 in a little endian system.

Suppose that in the above code, malloc returns the address 0x80000000 for ptr32. At Bookmark B, the byte-addressable locations starting at 0x80000000 would then hold the following:



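Address       Little endian   Big endian
0x80000000    0x11            0x44
0x80000001    0x22            0x33
0x80000002    0x33            0x22
0x80000003    0x44            0x11

Figure 1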

That is, in a little endian system, when a multi-byte data element is written to memory, its least significant byte is stored at the lowest address. In a big endian system, its most significant byte is stored at the lowest address. (My rule of thumb for remembering the difference is LLL: Little endian, Least significant byte, Lowest address.)

From the above layout in memory, it is clear that var8 gets the value 0x11 in a little endian system and value 0x44 in a big endian system.
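The two layouts in Figure 1 can be modeled explicitly in C, which also gives a run-time way to tell which one the host CPU uses. This is a minimal sketch; the function names are hypothetical, and the array mem stands in for the four bytes starting at 0x80000000.

```c
#include <stdint.h>
#include <string.h>

/* Little endian layout: least significant byte at the lowest address. */
static void store_u32_little(uint32_t v, uint8_t mem[4])
{
    mem[0] = (uint8_t)(v & 0xFFu);
    mem[1] = (uint8_t)((v >> 8) & 0xFFu);
    mem[2] = (uint8_t)((v >> 16) & 0xFFu);
    mem[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Big endian layout: most significant byte at the lowest address. */
static void store_u32_big(uint32_t v, uint8_t mem[4])
{
    mem[0] = (uint8_t)((v >> 24) & 0xFFu);
    mem[1] = (uint8_t)((v >> 16) & 0xFFu);
    mem[2] = (uint8_t)((v >> 8) & 0xFFu);
    mem[3] = (uint8_t)(v & 0xFFu);
}

/* What the host CPU actually does: copy the bytes of v as it stores them. */
static void store_u32_native(uint32_t v, uint8_t mem[4])
{
    memcpy(mem, &v, sizeof v);
}

/* Returns 1 if the host stores the least significant byte at the lowest address. */
static int host_is_little_endian(void)
{
    uint8_t mem[4];
    store_u32_native(0x44332211u, mem);
    return mem[0] == 0x11u;
}
```

memcpy is used in store_u32_native instead of a pointer cast so that the bytes are copied exactly as the CPU stored them; on any host, the result matches either store_u32_little or store_u32_big.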

Now consider the following section of code:

{
  uint32_t * ptr32;
  uint16_t  var16;
  ptr32 = (uint32_t  *)malloc(sizeof(uint32_t));
...
/* Bookmark E */
  *ptr32 = 0x44332211;
/* Bookmark F */
...
/* Bookmark G */
  var16 = *(uint16_t  *)ptr32;
/* Bookmark H */
...
}

At Bookmark H, var16 will have the value 0x2211 in a little endian system and 0x4433 in a big endian system. This may seem a little tricky at first, but it is consistent with the definitions of little endian and big endian memory layout. When var16 gets its value from address 0x80000000 using a 16-bit (2-byte) access, the two bytes starting at location 0x80000000 are read out. These are 0x11 and 0x22 in a little endian system and 0x44 and 0x33 in a big endian system.

Further, in a little endian system, the byte of the two that is stored at the lower address is interpreted as the least significant byte. That is, 0x11 is interpreted as the least significant byte and 0x22 as the most significant byte, so var16 gets the value 0x2211. Likewise, in a big endian system, the byte stored at the lower address is interpreted as the most significant byte, so 0x44 is the most significant byte and var16 gets the value 0x4433.
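The two steps in the paragraph above (which bytes are read out, and how they are weighted) can be modeled in portable C. This is a sketch; load_u16_little and load_u16_big are hypothetical helper names, and b[0] stands for the byte at the lower address, 0x80000000.

```c
#include <stdint.h>

/* Little endian interpretation: the lower address holds the least significant byte. */
static uint16_t load_u16_little(const uint8_t b[2])
{
    return (uint16_t)(b[0] | (b[1] << 8));
}

/* Big endian interpretation: the lower address holds the most significant byte. */
static uint16_t load_u16_big(const uint8_t b[2])
{
    return (uint16_t)((b[0] << 8) | b[1]);
}
```

Applied to the bytes of Figure 1, load_u16_little on {0x11, 0x22} gives 0x2211 and load_u16_big on {0x44, 0x33} gives 0x4433. Applying the wrong interpretation to a layout (for example, load_u16_big on {0x11, 0x22}) gives 0x1122 instead.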

Dealing with endianness 
How do programmers avoid surprises when accessing selected bytes of multi-byte elements? Do they have to write a variant of the code for each endianness mode and select the proper variant with a compile-time flag and #ifdefs? That would be cumbersome.

The way around this is to keep in mind that software sees the endianness effect only when mixing data types, specifically when it stores an element of one byte length in memory and reads the same memory back as an element of a different byte length. So, if a 32-bit element was stored at a memory address, the content at that address should be read out only as a 32-bit element. Once it has been read out of memory and is in a CPU register, the required bytes can be extracted from it with masks and shifts.

The endianness-proof software variant of the above code section is:

{
  uint32_t * ptr32;
  uint32_t  var32;
  uint16_t  var16;

  ptr32 = (uint32_t  *)malloc(sizeof(uint32_t));
...
/* Bookmark E */
  *ptr32 = 0x44332211;
/* Bookmark F */
...
/* Bookmark G */
  var32 = *ptr32; /* Read from memory with the same byte length that was used for writing */
                  /* var32 would now have 0x44332211 in both endianness modes */
  var16 = (uint16_t)(var32 & 0xFFFFu); /* var16 would get 0x2211 in both endianness modes */
/* Bookmark H */
...
}
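The extraction step can be wrapped in small helpers. This is a sketch with hypothetical names: because the masks and shifts act on the value held in a register rather than on its bytes in memory, the results are the same on little endian and big endian CPUs.

```c
#include <stdint.h>

/* Byte n of v, where n = 0 is the least significant byte.
 * n must be 0..3; shifting a 32-bit value by 32 or more is undefined. */
static uint8_t byte_of_u32(uint32_t v, unsigned n)
{
    return (uint8_t)((v >> (8u * n)) & 0xFFu);
}

/* Low 16 bits of v, equivalent to var16 = (uint16_t)(var32 & 0xFFFFu) above. */
static uint16_t low_u16_of_u32(uint32_t v)
{
    return (uint16_t)(v & 0xFFFFu);
}
```

For var32 = 0x44332211, byte_of_u32(var32, 0) is 0x11 and byte_of_u32(var32, 3) is 0x44 regardless of the endianness mode.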

Hardware implications of endianness
Today's complex SOCs have many hardware IP blocks integrated with the CPU core. The IP blocks communicate with one another and with the CPU core via the buses connected to them. Each hardware IP block has one or more buses connected to it.

Each bus may serve a specific purpose for the hardware IP block, such as obtaining its configuration parameters, obtaining input data for processing, or delivering output data after processing. A bus may be designed in different widths, such as 32, 64 or 128 bit lines, depending on the transfer bandwidth requirements of its hardware end points. Data is transferred over a bus between the end points in units called transactions.

A transaction comprises the data actually transferred, the address of the data, and any clocking/control signaling needed to synchronize transmission and reception. A transaction is either a read or a write. The size of the data transferred in one transaction is called the transaction size.

A hardware IP block can use different transaction sizes for different purposes on a single bus; this is a design-time decision based on the different uses of the bus. For example, a hardware IP block may do number crunching on 32-bit data units and obtain four such units in one 128-bit transaction. Apart from data, it may need a periodic but less frequent update of a 32-bit parameter value, which it may obtain using a 32-bit transaction. A transaction size can be at most the width of the bus.

For example, on a 128-bit wide bus, transaction sizes of 32, 64 or 128 bits could be designed.

Finally, by design, one hardware end point of a bus is the master (initiator) of transactions on the bus, while the other end point is the slave (target). If a hardware IP block is the master on a bus, it initiates transactions on the bus to read data from or write data to a target. If it is the slave, it just responds to transactions from a master. The above generally holds for discrete hardware components in a non-SOC system as well.
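The transaction attributes described above can be captured in a small data structure. This is a sketch only: the type and field names are hypothetical and not taken from any particular bus protocol, and it assumes power-of-two transaction sizes of at least one byte, which the text does not state.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { TXN_READ, TXN_WRITE } txn_type;   /* read or write transaction */

typedef struct {
    txn_type type;       /* initiated by the master, answered by the slave */
    uint64_t address;    /* address of the data at the target */
    unsigned size_bits;  /* transaction size in bits */
} bus_txn;

/* A transaction size can be at most the width of the bus. */
static bool txn_size_ok(const bus_txn *t, unsigned bus_width_bits)
{
    unsigned s = t->size_bits;
    bool power_of_two = (s != 0u) && ((s & (s - 1u)) == 0u);  /* assumption */
    return power_of_two && s >= 8u && s <= bus_width_bits;
}
```

On a 128-bit bus, 32-, 64- and 128-bit transactions pass this check, while a 256-bit transaction is rejected.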

What are the endianness considerations for the hardware IP block designers and device driver developers when transferring data over the buses? Let us assume data is being transferred from memory to an IP block in transactions of size 64 bits over a 64-bit wide bus. The endianness of the bus determines how the individual bytes of data starting from a specific source address are placed on the physical byte lanes of the bus.

On a little endian bus, the data byte from the lowest source address is placed on the lowest numbered physical byte lane (bit lines 0-7):



Figure 2

On a big endian bus, the data byte from the highest source address is placed on the lowest numbered physical byte lane (bit lines 0-7):



Figure 3 
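The rules in Figures 2 and 3 amount to a simple mapping from source address offset to byte lane: on a little endian bus, offset k goes to lane k; on a big endian bus, it goes to lane 7 - k, so the byte from the highest address lands on lane 0. This is a minimal sketch, assuming a 64-bit bus, a transaction size equal to the bus width and a source address aligned to 8 bytes. The bus bit lines are represented by a uint64_t whose bits 0-7 are byte lane 0, and the names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define BUS_BYTES 8u  /* 64-bit wide bus: byte lanes 0..7 */

/* Byte lane that carries the byte at (source address + offset). */
static unsigned lane_for_offset(unsigned offset, bool big_endian_bus)
{
    return big_endian_bus ? (BUS_BYTES - 1u - offset) : offset;
}

/* Place BUS_BYTES bytes from memory onto the bus in one transaction.
 * Lane n occupies bit lines 8n .. 8n+7 of the returned value. */
static uint64_t place_on_bus(const uint8_t mem[BUS_BYTES], bool big_endian_bus)
{
    uint64_t lines = 0;
    for (unsigned off = 0; off < BUS_BYTES; off++) {
        unsigned lane = lane_for_offset(off, big_endian_bus);
        lines |= (uint64_t)mem[off] << (8u * lane);
    }
    return lines;
}
```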

Suppose a hardware IP block does its number crunching one byte at a time, and its input data is in memory at address 0x80000000 as follows:



Figure 4 

When the 8 bytes of data are transferred from source address 0x80000000 in a transaction of size 64 bits, the data is placed differently on little endian and big endian buses:




Figure 5 

Since the hardware IP block needs to process the bytes in order (0x11 first, then 0x22, then 0x33 and so on), it needs to be aware of the endianness of the bus and extract the bytes from the appropriate byte lanes for processing. On a little endian bus, it needs to extract the first byte from byte lane 0, the second byte from byte lane 1, the third byte from byte lane 2 and so on. On a big endian bus, it needs to extract the first byte from byte lane 7, the second byte from byte lane 6, the third byte from byte lane 5 and so on.
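The IP block's byte selection can be sketched in the same kind of model: a 64-bit bus whose bit lines are held in a uint64_t, with byte lane n on bits 8n to 8n+7. The names are hypothetical, and copying into out[] merely stands in for whatever processing the block actually does.

```c
#include <stdbool.h>
#include <stdint.h>

#define BUS_BYTES 8u  /* 64-bit bus: byte lanes 0..7 */

/* k-th byte in source-address order (k = 0 is the byte from the lowest
 * address) within one 64-bit transaction.
 * Little endian bus: byte k is on lane k. Big endian bus: on lane 7 - k. */
static uint8_t ip_byte_in_order(uint64_t lines, unsigned k, bool big_endian_bus)
{
    unsigned lane = big_endian_bus ? (BUS_BYTES - 1u - k) : k;
    return (uint8_t)((lines >> (8u * lane)) & 0xFFu);
}

/* Recover all bytes of a transaction in address order. */
static void ip_unpack(uint64_t lines, bool big_endian_bus, uint8_t out[BUS_BYTES])
{
    for (unsigned k = 0; k < BUS_BYTES; k++)
        out[k] = ip_byte_in_order(lines, k, big_endian_bus);
}
```

If the block assumed the wrong bus endianness, it would process the bytes in reverse order: with lanes carrying 0x1122334455667788 from a big endian bus, ip_byte_in_order(lines, 0, false) returns 0x88 instead of 0x11.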

