Getting down to basics: Running Linux on a 32-/64-bit RISC architecture - Part 5
MIPS Specific Issues in the Linux Kernel
By Dominic Sweetman
Embedded.com
(06/21/08, 12:05:00 AM EDT)
Much of the Linux kernel is written in portable C, and a great deal of it runs on a clean architecture like MIPS with no further trouble. In Part 4 we looked at the obvious machine-dependent code around exceptions and memory management.

This and the next part in this series will look at the other places where MIPS-specific code is needed. We will deal first with cases where most MIPS CPUs have traded off programming convenience for hardware simplicity: first, that MIPS caches often require software management and, second, that the MIPS CP0 (CPU control) operations sometimes require explicit care with pipeline effects.

We'll also take a quick look at what you need to know about MIPS for a symmetric multiprocessor (SMP) Linux system. Finally, we'll glance at some heroic assembly code used to speed up a heavily used kernel routine.

Explicit Cache Management
In x86 CPUs, where Linux was born and grew up, the caches are mostly invisible, with hardware keeping everything just as if you were talking directly to memory.

Not so on MIPS systems: many MIPS cores have caches with no extra "coherence" hardware of any kind, and Linux must deal with the resulting trouble in several areas.

DMA Device Accesses
DMA controllers write memory (leaving cache contents out of date) or read it (perhaps missing cached data not yet written back). On some systems, particularly x86 PCs, the DMA controllers find some way to tell the hardware cache controller about their transfers, and the cache controller automatically invalidates or writes back cache contents as required to make the whole process transparent, just as though the CPU were reading and writing raw memory.

Such a system is called "I/O-cache coherent" or more often just "I/O coherent." Few MIPS systems are I/O-cache coherent. In most cases, a DMA transfer will take place without any notification to the cache logic, and the device driver software must manage the caches to make sure that no stale data in cache or memory is used.

Linux has a DMA API that exports to device drivers a set of routines for managing DMA data flow (many of them become null operations on an I/O-coherent system). It is described in the documentation shipped with the Linux kernel sources, in Documentation/DMA-API.txt.

If you're writing or porting a device driver, you should read it. When a driver asks to allocate a buffer, it can choose:

"Consistent" memory: Linux guarantees that "consistent" memory is I/O coherent, possibly at some cost to performance. On a MIPS CPU this is likely to be uncached, and the cost to performance is considerable.

But consistent buffers are the best way to handle small memory-resident control structures for complex device controllers.

Using nonconsistent memory for buffers: Since consistent memory is uncached on many MIPS systems, using it for large DMA buffers can lead to very poor performance.

So for most regular DMA, the API offers calls with names like dma_map_xx(). They provide buffers suitable for DMA, but the buffers won't be I/O coherent unless the system makes universal coherence cheap.

The kernel memory allocator makes sure the buffer is in a memory region that DMA can reach, segregates different buffers so they don't share cache lines, and provides you with an address in a form usable by the DMA controller.
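Segregating buffers matters because cache maintenance works on whole lines: invalidating a line that a DMA buffer shares with some other data would throw away the CPU's writes to that data. The C sketch below shows only the arithmetic, assuming a hypothetical 32-byte D-cache line; `align_up()`, `share_cache_line()` and `next_dma_buffer()` are illustrative helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next multiple of align (a power of two, such as the
 * D-cache line size). */
static size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Nonzero if [a, a + alen) and [b, b + blen) touch at least one common
 * cache line of `line` bytes. */
static int share_cache_line(uintptr_t a, size_t alen,
                            uintptr_t b, size_t blen, size_t line)
{
    assert(alen > 0 && blen > 0 && line > 0);
    uintptr_t a_first = a / line, a_last = (a + alen - 1) / line;
    uintptr_t b_first = b / line, b_last = (b + blen - 1) / line;
    return a_first <= b_last && b_first <= a_last;
}

/* Hypothetical placement rule: start the next DMA buffer on the first
 * cache-line boundary after the previous buffer ends, so no line is shared. */
static uintptr_t next_dma_buffer(uintptr_t prev, size_t prev_len, size_t line)
{
    return (uintptr_t)align_up((size_t)prev + prev_len, line);
}
```

For example, with 32-byte lines a 40-byte buffer at 0x1000 ends inside the line that starts at 0x1020, so the next buffer goes at 0x1040 rather than 0x1028.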

Since these buffers are not coherent, there are calls that operate on a buffer and do the necessary cache invalidation or write-back operations before or after DMA. They are called dma_sync_xx(), and the API documentation explains when and how to call them.
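To see why the order of these operations matters, here is a toy C model of a single write-back cache line on a system with no I/O coherence. The struct and function names are hypothetical: `cache_writeback()` plays roughly the part that syncing a buffer for the device plays before an outbound transfer, and `cache_invalidate()` roughly the part that syncing for the CPU plays after an inbound one (in a real driver these come from calls such as `dma_sync_single_for_device()` and `dma_sync_single_for_cpu()`).

```c
#include <assert.h>

/* Hypothetical model of one write-back cache line on a system with no
 * I/O coherence: the DMA engine only ever sees `memory`. */
struct cache_line_model {
    int memory;  /* contents of DRAM */
    int cached;  /* contents of the CPU's cache line */
    int valid;   /* the cache holds a copy of this line */
    int dirty;   /* the cached copy is newer than DRAM */
};

/* CPU store: write-allocate, write-back; DRAM is not updated yet. */
static void cpu_write(struct cache_line_model *l, int v)
{
    l->cached = v;
    l->valid = 1;
    l->dirty = 1;
}

/* CPU load: hits in the cache if it can, otherwise refills from DRAM. */
static int cpu_read(struct cache_line_model *l)
{
    if (!l->valid) {
        l->cached = l->memory;
        l->valid = 1;
        l->dirty = 0;
    }
    return l->cached;
}

/* The device bypasses the cache entirely, and the cache is never told. */
static int dma_read(const struct cache_line_model *l) { return l->memory; }
static void dma_write(struct cache_line_model *l, int v) { l->memory = v; }

/* Before the device reads the buffer: push dirty data out to DRAM. */
static void cache_writeback(struct cache_line_model *l)
{
    if (l->valid && l->dirty) {
        l->memory = l->cached;
        l->dirty = 0;
    }
}

/* After the device has written the buffer: drop the stale cached copy.
 * Invalidating a dirty line would silently discard CPU writes, so this
 * model insists the line has already been written back. */
static void cache_invalidate(struct cache_line_model *l)
{
    assert(!l->dirty);
    l->valid = 0;
}
```

Without the write-back, the device reads the old DRAM value instead of the CPU's 42; without the invalidate, the CPU keeps seeing its cached 42 after the device has stored 7. The assertion in `cache_invalidate()` reflects why dirty lines have to be cleaned out of an inbound buffer before the transfer starts: a dirty line evicted mid-transfer could overwrite the device's data.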

For genuinely coherent hardware, the "sync" functions are null. The language of the API documentation is unfortunate here: there is a little-used extension to the API whose function names contain the word "noncoherent," but an ordinary non-I/O-coherent system does not need it, and you should not use it unless your system is really strange.

A regular MIPS system, even though it is not I/O coherent, can and should work fine with drivers using the standard API.

This is all moderately straightforward by OS standards. But many driver developers work on machines that manage this in hardware, where the "sync" functions are just stubs. If they forget to call the right sync function at the right moment, their software will still work, right up until you port it to a MIPS machine that requires explicit cache management.

So be cautious when taking driver code from elsewhere. The need to make porting more trouble-free is the most persuasive argument for adding some level of hardware cache management in future CPUs.

Writing Instructions for Later Execution
A program that writes instructions for itself can leave the instructions in the D-cache but not in memory, or can leave stale data in the I-cache where the instructions ought to be.

This is not a kernel-specific problem: In fact, it's more likely to be met in applications such as the "just-in-time" translators used to speed up language interpreters. It's beyond the scope of this book to discuss how you might fix this portably, but any fix for MIPS will be built on the synci instruction.

That's the ideal, but synci was defined only in 2003, with the second revision of the MIPS32/MIPS64 specifications, and many CPUs without the instruction are still in use.

On such CPUs there must be a special system call to do the necessary D-cache write-back and I-cache invalidation using privileged cache instructions.
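Portable C code can at least express the requirement. GCC and Clang provide `__builtin___clear_cache(begin, end)`, which performs whatever D-cache write-back and I-cache invalidation the target needs for that address range; on MIPS, GCC typically expands it to a `synci` loop when the ISA has `synci` and otherwise calls a cache-flush helper built on the system call just mentioned. The sketch below shows the pattern a JIT would follow. `install_code()` is a hypothetical helper, and the sketch deliberately does not allocate executable memory or jump to the code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical JIT helper: copy freshly generated machine code into buf,
 * then make it safe to execute. Ordinary stores may leave the new
 * instructions only in the D-cache, while the I-cache may still hold stale
 * contents for these addresses; __builtin___clear_cache asks the compiler
 * to emit whatever write-back and invalidation this target requires. */
static void install_code(unsigned char *buf, const unsigned char *code, size_t len)
{
    assert(buf != NULL && code != NULL);
    memcpy(buf, code, len);
    __builtin___clear_cache((char *)buf, (char *)buf + len);
}
```

On hardware with coherent instruction fetch (such as x86) the builtin compiles to nothing, which is exactly why code that omits it can appear to work until it reaches a MIPS CPU.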

Cache/Memory Mapping Problems
Virtual caches (true virtual caches, both indexed and tagged with the virtual address) seem a wonderful free ride, since the whole cache search can start earlier and run in parallel with page-based address translation.

A plain virtual cache must be emptied out whenever there's a memory map change, which is intolerable unless the cache is very small. But if you use the ASID to extend the virtual address, entries from different processes are disambiguated.

OS programmers know why virtual caches are a bad idea: data in the cache can survive a change to the page tables.

In general, the virtual cache ought to be checked after any mapping is rescinded. That's costly, so OS engineers try to minimize updates, miss some corner case, and end up with bugs.
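A toy model makes the problem concrete. The C sketch below, with hypothetical names and geometry (64 direct-mapped lines of 32 bytes), tags each line with the ASID and virtual line address, as described above. Because a lookup never consults the page tables, a line survives the unmapping of its page until the OS explicitly removes it, which is the job Linux gives to hooks such as `flush_cache_range()`. For simplicity the model holds only clean data; a write-back virtual cache would also have to write dirty lines to memory before dropping them.

```c
#include <assert.h>
#include <stdint.h>

#define VC_LINES     64   /* hypothetical: 64 lines, direct-mapped */
#define VC_LINE_SIZE 32   /* hypothetical: 32-byte lines */

/* One line of a virtually indexed, virtually tagged cache. The tag is the
 * virtual line address extended with the ASID. */
struct vc_entry {
    int valid;
    uint8_t asid;
    uintptr_t vline;  /* virtual address / VC_LINE_SIZE */
    int data;
};

struct vcache {
    struct vc_entry e[VC_LINES];
};

static struct vc_entry *vc_slot(struct vcache *c, uintptr_t va)
{
    return &c->e[(va / VC_LINE_SIZE) % VC_LINES];
}

/* Lookup uses only (ASID, virtual address): the page tables are never
 * consulted, which is why stale entries can outlive a mapping. */
static int vc_lookup(struct vcache *c, uint8_t asid, uintptr_t va, int *out)
{
    struct vc_entry *s = vc_slot(c, va);
    if (s->valid && s->asid == asid && s->vline == va / VC_LINE_SIZE) {
        *out = s->data;
        return 1;
    }
    return 0;
}

static void vc_fill(struct vcache *c, uint8_t asid, uintptr_t va, int data)
{
    struct vc_entry *s = vc_slot(c, va);
    s->valid = 1;
    s->asid = asid;
    s->vline = va / VC_LINE_SIZE;
    s->data = data;
}

/* What the OS must do when a mapping is rescinded: drop every line of
 * [start, end) that belongs to this ASID. */
static void vc_flush_range(struct vcache *c, uint8_t asid,
                           uintptr_t start, uintptr_t end)
{
    assert(start <= end);
    for (uintptr_t va = start - start % VC_LINE_SIZE; va < end; va += VC_LINE_SIZE) {
        struct vc_entry *s = vc_slot(c, va);
        if (s->valid && s->asid == asid && s->vline == va / VC_LINE_SIZE)
            s->valid = 0;
    }
}
```

A second process using the same virtual address misses, thanks to the ASID, but the owning process keeps hitting on the old line after its page is unmapped until `vc_flush_range()` runs.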

In a heroic attempt to make Linux work successfully even with virtual caches, the kernel provides a set of rules and function calls that should be provided as part of the port to an architecture with troublesome caches.

They're the functions with names like flush_cache_xxx(), described in the kernel documentation (Documentation/cachetlb.txt). I don't like the word "flush" to describe cache operations: It's been used to mean too many things. So note carefully that in the Linux kernel a "cache flush" is something you do to get rid of cache entries that relate to obsolete memory mappings.

In a system where all caches are physically indexed and tagged, none of these calls needs to do anything.

Fortunately, virtual D-caches are rare on MIPS CPUs. Some recent CPUs have virtual I-caches: Implement the "flush" functions as described in the documentation and you should be all right.

But L1 caches with physical tags but virtual indexes are common on MIPS CPUs. They solve the problems described in this section, but they lead to a different problem called a "cache alias" (read on).

Cache Aliases
We're now getting to something more pernicious. MIPS CPU designers were among the first to realize that the benefits of using the virtual address to index their cache could be combined with the benefit of using the physical address to tag it. This can lead to cache aliases.
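The root cause is that the index comes from the virtual address. If one way of the cache is bigger than a page, some index bits lie above the page offset, and two virtual addresses that translate to the same physical address can select different cache lines. The C sketch below, using a hypothetical 32 KB two-way cache with 32-byte lines and 4 KB pages, shows the index calculation and the test for whether aliases are possible at all; the names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical L1 D-cache geometry, in bytes. */
struct cache_geom {
    uint32_t size;  /* total capacity */
    uint32_t ways;  /* associativity */
    uint32_t line;  /* line size */
};

/* Bytes covered by one way of the cache. */
static uint32_t way_bytes(struct cache_geom g)
{
    assert(g.ways != 0 && g.size % g.ways == 0);
    return g.size / g.ways;
}

/* The index is taken from the *virtual* address, before translation. */
static uint32_t vipt_index(struct cache_geom g, uintptr_t va)
{
    return (uint32_t)((va % way_bytes(g)) / g.line);
}

/* If one way is no bigger than a page, every index bit lies inside the
 * page offset, which translation never changes, so aliases are impossible. */
static int can_alias(struct cache_geom g, uint32_t page_size)
{
    return way_bytes(g) > page_size;
}
```

With that geometry each way is 16 KB, so virtual addresses 0x1000 and 0x2000 that map the same physical page select lines 128 and 256: two copies of the same data. A 16 KB four-way cache has 4 KB ways and cannot alias with 4 KB pages.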

The R4000 CPU was the first to use virtually indexed caches. As originally conceived, the CPU always came with an L2 cache (the cache memory was off chip, but the L2 controller was included in the CPU), and it used the L2 cache to detect L1-cache aliases. If you loaded an alias to a line that was already present in the L1, the CPU generated an exception, which could be used to clean up.

But the temptation to produce a smaller, cheaper R4000 variant by omitting the L2 cache memory chips and the pins that wired them up proved too strong. Contemporary UNIX systems had a fairly stylized way of using virtual memory, which meant that you could control memory allocation to avoid ever loading an alias.

In retrospect we can see that generating aliases is a bug, and the careful memory management was a workaround for it. But it worked, and people forgot, and it became a feature.

There are basically two ways to deal with cache aliases. The first is to try to ensure that whenever a page is shared, all the virtual references to it have the same "page color" (that is, the virtual addresses may differ, but the difference between them is a multiple of the cache set size).
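Page color is easy to compute. The sketch below assumes page-aligned addresses and power-of-two sizes, and takes the "cache set size" to mean the number of bytes indexed by one way of the cache (for a direct-mapped cache, the whole cache); the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Number of page colors: how many pages fit in one cache way. */
static uint32_t num_colors(uint32_t way_bytes, uint32_t page_size)
{
    assert(page_size != 0);
    return way_bytes > page_size ? way_bytes / page_size : 1;
}

static uint32_t page_color(uintptr_t va, uint32_t way_bytes, uint32_t page_size)
{
    return (uint32_t)((va / page_size) % num_colors(way_bytes, page_size));
}

/* Two page-aligned mappings of one page are alias-safe if their colors
 * match, i.e. the virtual addresses differ by a multiple of the way size. */
static int same_color(uintptr_t a, uintptr_t b, uint32_t way_bytes, uint32_t page_size)
{
    return page_color(a, way_bytes, page_size) == page_color(b, way_bytes, page_size);
}
```

With a 16 KB way and 4 KB pages there are four colors: 0x1000 and 0x5000 differ by 16 KB and share color 1, while 0x2000 has color 2.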

Any data visible twice in same-color pages will be stored at the same cache index and handled correctly. It's possible to ensure that all user-space mappings of a page are of the same color.
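One way to do that is to choose the virtual address for each shared mapping so that its color matches the color implied by the page's offset in the shared object. The C sketch below is similar in spirit to the COLOUR_ALIGN() logic in the MIPS kernel's mmap code, but it is a standalone illustration: `color_align()` and `SKETCH_PAGE_SHIFT` are hypothetical, 4 KB pages are assumed, and `align_mask` is the way size minus one.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12  /* hypothetical: 4 KB pages */

/* Return an address at or above `hint` whose color matches page number
 * `pgoff` of the shared object, so every mapping of that page gets the
 * same color. align_mask = (way size - 1), way size a power of two. */
static uintptr_t color_align(uintptr_t hint, uintptr_t pgoff, uintptr_t align_mask)
{
    assert((align_mask & (align_mask + 1)) == 0);
    uintptr_t base = (hint + align_mask) & ~align_mask;  /* round up to a way boundary */
    return base + ((pgoff << SKETCH_PAGE_SHIFT) & align_mask);
}
```

Two mappings of page 3, requested near 0x10001000 and 0x20000000 with a 16 KB way, land at 0x10007000 and 0x20003000; both have 0x3000 in their low 14 bits, so they share a color.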

But unlike the old BSD systems, Linux provides features where correct page coloring is impossible. Those will be cases where you have both a user-space and kernel mapping to the same page (in many cases, on a MIPS kernel, the kernel "mapping" will be a kseg0 address). So the MIPS port has special code to detect those cases and clean out any old alias mappings.
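The kernel side of such a pair is often easy to find on 32-bit MIPS: kseg0 maps the first 512 MB of physical memory at 0x80000000, and since 0x80000000 is a multiple of any plausible way size, a page's kseg0 address has the same color as its physical address. The sketch below checks whether a user mapping and the kseg0 view of the same page would alias; the names are hypothetical and a 16 KB way is assumed in the example. When the check fires, the kernel must write back and invalidate the lines reached through one mapping before relying on the other.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit MIPS kseg0: the 512 MB window at 0x80000000 that maps physical
 * 0..512 MB, cached and without going through the TLB. */
#define KSEG0_BASE 0x80000000u

static uint32_t kseg0_addr(uint32_t phys)
{
    assert(phys < 0x20000000u);
    return KSEG0_BASE | phys;
}

/* Nonzero if the user mapping at uva and the kernel's kseg0 view of the
 * same physical byte would use different cache indexes, so data written
 * through one could sit unseen in a line reached through the other. */
static int kernel_user_alias(uint32_t uva, uint32_t phys, uint32_t way_bytes)
{
    assert(way_bytes != 0 && (way_bytes & (way_bytes - 1)) == 0);
    return ((kseg0_addr(phys) ^ uva) & (way_bytes - 1)) != 0;
}
```

With a 16 KB way, user address 0x00403000 and physical page 0x00123000 (kseg0 address 0x80123000) agree in their low 14 bits and do not alias; user address 0x00401000 for the same page does.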

The Cache/TLB documentation (that's Documentation/cachetlb.txt, as mentioned in the section above) makes a heroic attempt to deal with cache aliases as "just another symptom" of virtual caches in general. It provides some notes on how to configure the kernel to do what it can on page coloring and how to handle kernel/user-space aliases.

Next in Part 6: CP0 pipeline hazards, multiprocessors and coherent caches.
To read Part 4, go to "What we really want".
To read Part 3, go to "What Happens on a System Call"
To read Part 2, go to "How hardware and software work together."
To read Part 1, go to "GNU/Linux from eight miles high"

This series of articles is based on material from "See MIPS Run Linux," by Dominic Sweetman, used with the permission of the publisher, Morgan Kaufmann/Elsevier, which retains full copyrights. It can be purchased online.

Dominic Sweetman is a software/hardware boundary expert based in London, England, who previously served as managing director at Algorithmics Ltd.