Dealing with memory access ordering in complex embedded designs
Things used to be so simple in the embedded world. For most of us, the systems for which we develop these days are orders of magnitude more complex than the ones we were using even five years ago.
As embedded systems chase ever higher performance, processor designers reach deeper and deeper into the toolbox for microarchitectural innovations. Many of these, mercifully, are transparent to the programmer. The challenge for us is that many are not transparent; we to be aware of what is going on and write our software in different ways. In some cases, we are missing out on improved performance, but in many cases, existing software techniques simply won’t work properly unless we take into account some of the new ways in which modern embedded systems function.
The area I address in this article is memory accesses, specifically the order in which they happen. The simple act of loading, storing, and transferring data between processor and memory is much more complex than it used to be. Consider the following simple example:
LDR r0, [r1] ; read UART data input register
STR r4, [r1, #4] ; write control register
STR r0, [r1, #8] ; write to UART data output register
In some imaginary peripheral, we are reading input data, carrying out a control operation, and then writing the data out again through a different port. Presumably, the data sheet tells us that the control operation must be complete before the data is written out. On a simple system, we can guarantee that the memory operations will complete in the order they appear in the program and that each will complete before the next is started.
On today’s systems, that isn’t true any more.
Caches could mean that the LDR is satisfied by a cache access and never accesses the real hardware. Write buffers could mean that the STRs don’t happen in the order in which they are written.
Both effects, coupled with the compiler re-ordering instructions, could mean that the first STR is
executed before the LDR or after the second.
Clearly, we wouldn’t be so dumb as to access memory-mapped peripheral registers via a cache (at least, I hope so!) but write buffers can catch us unawares.
Figure 1 shows what a simple system might look like.
Figure 1: In a single processor system connected to a number of memory devices, none are able to act independently on the system bus, nor talk directly to each other.
A single processor connects via a single system bus to a variety of memory devices. None of these devices are capable of acting independently on the bus (i.e. they do not initiate bus accesses of their own), nor do they talk directly to each other.
Such a system is relatively simple to manage. In the absence of effects within the processor, memory coherency is easy to manage, memory accesses occur in order, and so on. It obeys the “Sequential Execution Model” (SEM). In short, things happen in the order in which you write them.
A more complex system, on the other hand, might look like what is shown in Figure 2.
Figure 2: In a complex system with multiple CPUs linked to multiple component over a multilayer bus matrix, several are able to act autonomously and independently, take control of the bus, and talk to each other.
Here we have multiple processors talking to multiple system components via a multi- layer bus matrix. Amongst the system components are several which are capable of acting autonomously and independently of the processor. They can access data independently and take control of the bus without intervention of the processor. They can also talk directly to each other.
There are many opportunities in such a system for the SEM to break down. There are multiple software threads executing on multiple processors, there are caches and buffers and autonomous memory access units such as DMA controllers.
Analyzing everything going on in such a system is beyond the scope of this article but we will look at the most common behaviors and what you need to do about them in software.
There are two distinct effects which we need to consider:
- Compiler behavior
- System behavior
We need to write our software so that the compiler produces what we expect. But we also need to know how the system behaves so we can ensure that output code actually has the desired effect.