Getting down to basics: Running Linux on a 32-/64-bit RISC architecture - Part 4

The mapping mechanism discussed in Part 3 must allow a program to use a particular address within its own process/address space and translate it efficiently into a real physical address to access memory.
A good way to do this would be to have a table (the page table) containing an entry for each page in the whole virtual address space, with that entry containing the correct physical address.
This is clearly a fairly large data structure and is going to have to be stored in main memory. But there are two big problems:
Problem #1: We now need two references to memory for every load or store, and that's obviously hopeless for performance. You may foresee the answer: we can use a high-speed cache memory to store translation entries and go to the memory-resident table only when we miss in the cache.
Since each cache entry covers 4 KB of memory space, it's plausible that we can get a satisfactorily low miss rate out of a reasonably small cache. (At the time this scheme was invented, memory caches were rare and were sometimes also called "lookaside buffers," so the memory translation cache became a translation lookaside buffer or TLB; the acronym survives.)
Problem #2: The size of the page table. For a 32-bit application address space split into 4-KB pages, there are a million entries, which will take at least 4 MB of memory. We really need to find some way to make the table smaller, or there'll be no memory left to run the programs.
We'll defer any discussion of the solution to this, beyond observing that few real programs use anything like the 4 GB addressable with 32 bits. More modest programs leave huge holes in their address space, and if we can invent some scheme that avoids storing all the "nothing here" translation entries corresponding to the holes, then things are likely to get better.
Figure 14.2: Desirable memory translation system.
We've now arrived, in essence, at the memory translation system DEC figured out for its VAX minicomputer, which has been extremely influential on most subsequent architectures. It's summarized in Figure 14.2 above. The hardware works through a sequence of steps something like this:
Step #1. A virtual address is split into two, with the least significant bits (usually 12 bits) passing through untranslated—so translation is always done in pages (usually 4 KB).
Step #2. The more significant bits, the virtual page number (VPN), are concatenated with the currently running thread's address space ID (ASID) to form a unique page address.
Step #3. We look in the TLB (translation cache) to see if we have a translation entry for the page. If we do, it gives us the high-order physical address bits and we've got the address to use.
The TLB is a special-purpose store and can match addresses in various useful ways. It may have a global flag bit that tells it to ignore the value of ASID for some entries, so that these TLB entries can be used to map some range of virtual addresses for every thread.
Similarly, the VPN may be stored with some mask bits that cause some parts of the VPN to be excluded from the match, allowing the TLB entry to map a larger range of virtual addresses.
Both of these features are available in MIPS MMUs (though some very old MIPS CPUs have no variable-size pages).
Step #4. There are usually extra bits (flags) stored with the physical frame number (PFN) that are used to control which kinds of access are allowed, most obviously to permit reads but not writes. We'll discuss the MIPS architecture's flags later in this series.
If there's no matching entry in the TLB, the system must locate or build an appropriate entry (using main-memory-resident page table information) and load it into the TLB and then run the translation process again.
In the VAX minicomputer, this process was controlled by microcode and seemed to the programmer to be completely automatic. If you build the right format of page table in memory and point the hardware at it, all memory translation just works.