The mapping mechanism discussed in Part 3 must allow a program to use
a particular address within its own process/address space and translate
that efficiently into a real physical address to access memory.
A good way to do this would be to have a table (the page table)
containing an entry for each page in the whole virtual address space,
with that entry containing the correct physical address.
This is clearly a fairly large data structure and is going to have
to be stored in main memory. But there are two big problems:
Problem #1: We now need two references to memory to do any load or
store (one to fetch the table entry, one for the data itself), and
that's obviously hopeless for performance. You may foresee
the answer to this: We can use a high-speed cache memory to store
translation entries and go to the memory-resident table only when we
miss in the cache.
Since each cache entry covers 4 KB of memory space, it's plausible
that we can get a satisfactorily low miss rate out of a reasonably
small cache. (At the time this scheme
was invented, memory caches were rare and were sometimes also called
"lookaside buffers," so the memory translation cache became a
translation lookaside buffer or TLB; the acronym survives.)
Problem #2: The size of the page table. For a 32-bit application
address space split into 4-KB pages, there are a million entries, which
will take at least 4 MB of memory. We really need to find some way to
make the table smaller, or there'll be no memory left to run the
programs.
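The arithmetic behind that figure is worth spelling out. The sketch below assumes one 4-byte word per table entry, which is the minimum implied by "at least 4 MB"; the constant names are only illustrative.

```python
# Sizing a flat page table (one entry per virtual page).
ADDRESS_BITS = 32        # from the text: 32-bit application address space
PAGE_SIZE = 4 * 1024     # from the text: 4-KB pages
ENTRY_BYTES = 4          # assumption: one 32-bit word per entry

num_pages = 2**ADDRESS_BITS // PAGE_SIZE
table_bytes = num_pages * ENTRY_BYTES

print(num_pages)                     # 1048576 entries, i.e. 2**20
print(table_bytes // (1024 * 1024))  # 4 (MB)
```

Any flags or extra fields that push an entry beyond one word only make the table bigger.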
We'll defer any discussion of the solution for this, beyond
observing that few real programs use anything like the 4 GB
addressable with 32 bits. More modest programs have huge holes in their
program address space, and if we can invent some scheme that avoids
storing all the "nothing here" translation entries corresponding to the
holes, then things are likely to get better.
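To make the "holes" idea concrete, here is a minimal sketch of a table that stores entries only for pages that are actually mapped. It is purely illustrative and is not the scheme this series goes on to describe; the class name `SparsePageTable` and the example page numbers are hypothetical.

```python
class SparsePageTable:
    """Hypothetical table holding entries only for mapped pages."""

    def __init__(self):
        self._entries = {}  # VPN -> PFN; a hole simply has no key

    def map_page(self, vpn, pfn):
        self._entries[vpn] = pfn

    def lookup(self, vpn):
        # None stands in for a "nothing here" entry.
        return self._entries.get(vpn)

    def __len__(self):
        return len(self._entries)


table = SparsePageTable()
# A made-up small program: 16 pages low in the address space...
for vpn in range(0x00400, 0x00410):
    table.map_page(vpn, 0x1000 + (vpn - 0x00400))
# ...and a single stack page much higher up.
table.map_page(0x7FFFF, 0x2000)

print(len(table))             # 17 entries stored, not 2**20
print(table.lookup(0x12345))  # None: this page lies in a hole
```

A hash-style lookup like this would be awkward for hardware to walk, which is one reason real designs tend to prefer other structures; the point here is only that storage can scale with the pages in use rather than with the size of the address space.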
Figure 14.2 Desirable memory translation system.
We've now arrived, in essence, at the memory translation system DEC
devised for its VAX minicomputer, a design that has been extremely
influential on most subsequent architectures. It's summarized in
Figure 14.2 above. The hardware works through a sequence of steps
something like this:
Step #1. A virtual address is split into two parts, with the least
significant bits (usually 12 bits) passing through untranslated, so
translation is always done in pages (usually 4 KB).
Step #2. The more significant bits, or virtual page number (VPN), are
concatenated with the currently running thread's address space
identifier (ASID) to form a unique page address.
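Steps 1 and 2 amount to a little bit manipulation, sketched below for a 32-bit address and 4-KB pages. The 8-bit ASID width is an assumption (it varies between CPUs), and the function names are hypothetical.

```python
PAGE_SHIFT = 12                      # 4-KB pages, as in the text
OFFSET_MASK = (1 << PAGE_SHIFT) - 1  # low 12 bits
VPN_BITS = 32 - PAGE_SHIFT           # 20-bit VPN for a 32-bit address
ASID_BITS = 8                        # assumption: width varies by CPU


def split_address(vaddr):
    """Step 1: return (vpn, offset); the offset is never translated."""
    return vaddr >> PAGE_SHIFT, vaddr & OFFSET_MASK


def page_key(asid, vpn):
    """Step 2: concatenate ASID and VPN into one lookup key."""
    assert 0 <= asid < (1 << ASID_BITS)
    assert 0 <= vpn < (1 << VPN_BITS)
    return (asid << VPN_BITS) | vpn


vpn, offset = split_address(0x12345678)
print(hex(vpn), hex(offset))   # 0x12345 0x678
print(hex(page_key(1, vpn)))   # 0x112345
print(hex(page_key(2, vpn)))   # 0x212345
```

The last two lines show why the ASID matters: the same virtual page in two different address spaces produces two different keys, so their translations can coexist.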
Step #3. We look in the TLB (translation cache) to see if we have a
translation entry for the page. If we do, it gives us the high-order
physical address bits (the page frame number, or PFN), and we've got
the address to use.
The TLB is a special-purpose store and can match addresses in
various useful ways. It may have a global flag bit that tells it to
ignore the value of ASID for some entries, so that these TLB entries
can be used to map some range of virtual addresses for every thread.
Similarly, the VPN may be stored with some mask bits that cause some
parts of the VPN to be excluded from the match, allowing the TLB entry
to map a larger range of virtual addresses.
Both of these features are available in MIPS MMUs (though some very
old MIPS CPUs lack variable-size pages).
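The two matching refinements can be sketched as follows. This is a simplified model, not the MIPS entry format: the `TLBEntry` fields and the `entry_matches` helper are hypothetical, and the mask is taken to be a set of VPN bits that the comparison ignores.

```python
from dataclasses import dataclass


@dataclass
class TLBEntry:
    vpn: int
    asid: int
    pfn: int
    is_global: bool = False  # if set, the ASID is ignored
    mask: int = 0            # VPN bits set here are left out of the match


def entry_matches(entry, asid, vpn):
    """Return True if this entry translates (asid, vpn)."""
    vpn_ok = (vpn & ~entry.mask) == (entry.vpn & ~entry.mask)
    asid_ok = entry.is_global or entry.asid == asid
    return vpn_ok and asid_ok


private = TLBEntry(vpn=0x00400, asid=3, pfn=0x1000)
shared = TLBEntry(vpn=0x80000, asid=0, pfn=0x2000, is_global=True)
big = TLBEntry(vpn=0x10000, asid=3, pfn=0x3000, mask=0xF)  # 16 pages

print(entry_matches(private, 3, 0x00400))  # True
print(entry_matches(private, 4, 0x00400))  # False: wrong ASID
print(entry_matches(shared, 7, 0x80000))   # True: global, any ASID
print(entry_matches(big, 3, 0x1000A))      # True: inside the 64-KB range
print(entry_matches(big, 3, 0x10010))      # False: just past the range
```

With four low VPN bits masked, the one entry `big` covers 16 consecutive 4-KB pages, which is how a single TLB slot can map a larger region.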
Step #4. There are usually
extra bits (flags) stored with the PFN that are used to control which
kind of access is allowed—most obviously, to permit reads but not
writes. We'll discuss the MIPS architecture's flags later in this
series.
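As a rough illustration of the read-but-not-write case, the check might look like the sketch below. The flag name `FLAG_WRITABLE`, its bit position, and the `AccessFault` exception are all hypothetical; the actual MIPS flags are covered later in the series.

```python
FLAG_WRITABLE = 0x1  # hypothetical flag bit stored alongside the PFN


class AccessFault(Exception):
    """Raised when the entry's flags forbid the requested access."""


def check_access(flags, is_write):
    """Allow reads always; allow writes only if the page is writable."""
    if is_write and not (flags & FLAG_WRITABLE):
        raise AccessFault("write to a read-only page")
    return True


read_only = 0
read_write = FLAG_WRITABLE

check_access(read_only, is_write=False)   # reading is fine
check_access(read_write, is_write=True)   # writing is fine
try:
    check_access(read_only, is_write=True)
except AccessFault as fault:
    print("fault:", fault)  # fault: write to a read-only page
```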
If there's no matching entry in the TLB, the system must locate or
build an appropriate entry (using main-memory-resident page table
information) and load it into the TLB and then run the translation
process again.
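The miss-and-retry flow can be modeled in a few lines. In this sketch both the TLB and the memory-resident table are plain dictionaries keyed by `(asid, vpn)`, which is an assumption made for brevity: a real TLB is small and fixed in size, so a refill normally has to evict some other entry. The names `translate` and `PageFault` are hypothetical.

```python
PAGE_SHIFT = 12                      # 4-KB pages, as in the text
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class PageFault(Exception):
    """Raised when the page table has no entry for the page either."""


def translate(vaddr, asid, tlb, page_table):
    """Translate vaddr, refilling the TLB from page_table on a miss."""
    vpn, offset = vaddr >> PAGE_SHIFT, vaddr & OFFSET_MASK
    key = (asid, vpn)
    while True:
        pfn = tlb.get(key)
        if pfn is not None:
            # Hit: physical address = PFN bits plus untranslated offset.
            return (pfn << PAGE_SHIFT) | offset
        if key not in page_table:
            raise PageFault(hex(vaddr))
        # Miss: load the entry, then run the translation again.
        tlb[key] = page_table[key]


page_table = {(1, 0x12345): 0x00ABC}
tlb = {}

print(hex(translate(0x12345678, 1, tlb, page_table)))  # 0xabc678
print((1, 0x12345) in tlb)                             # True
```

The first call misses, refills, and retries; any later access to the same page hits in the TLB and never touches `page_table`.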
In the VAX minicomputer, this process was controlled by microcode
and seemed to the programmer to be completely automatic. If you built
the right format of page table in memory and pointed the hardware at
it, all memory translation just worked.