CMP EMBEDDED.COM


Getting down to basics: Running Linux on a 32-/64-bit RISC architecture - Part 4
What We Really Want



Origins of the MIPS Design
The MIPS designers wanted to figure out a way to offer the same facilities as the VAX with as little hardware as possible. The microcoded TLB refill was not acceptable, so they took the brave step of consigning this part of the job to software.

That means that apart from a register to hold the current ASID, the MMU hardware is simply a high-speed, fixed-size table of translations. System software can (and usually does) use the hardware as a cache of entries from some kind of comprehensive memory-resident page table, so it makes sense to call the hardware table a TLB.

But there's nothing in the TLB hardware to make it a cache, except this: When presented with an address it can't translate, the TLB triggers a special exception (TLB refill) to invoke the software routine. Some care is taken with the details of the TLB design, the associated control registers, and the refill exception to help the software to be efficient.
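The division of labor described above can be sketched as a toy software model. This is only an illustration, not the MIPS refill handler: the class name `SoftTLB`, the four-entry size, the flat `VPN -> PFN` page table, and the random choice of victim are all assumptions made for the example.

```python
import random

PAGE_SHIFT = 12   # assumed 4-Kbyte pages
TLB_SIZE = 4      # tiny on purpose; the article cites 32 to 64 entries


class SoftTLB:
    """Toy model: a fixed-size translation table that software refills."""

    def __init__(self, page_table):
        self.page_table = page_table   # memory-resident table: VPN -> PFN
        self.entries = {}              # the "hardware" table: VPN -> PFN
        self.refills = 0               # how many refill "exceptions" ran

    def refill(self, vpn):
        """Stands in for the refill exception handler (hypothetical)."""
        if len(self.entries) >= TLB_SIZE:
            # Table is full: discard an arbitrary entry to make room.
            victim = random.choice(list(self.entries))
            del self.entries[victim]
        # A missing page-table entry raises KeyError here; a real OS
        # would treat that case as a page fault.
        self.entries[vpn] = self.page_table[vpn]
        self.refills += 1

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        if vpn not in self.entries:    # the table cannot translate: trap
            self.refill(vpn)
        pfn = self.entries[vpn]
        return (pfn << PAGE_SHIFT) | (vaddr & ((1 << PAGE_SHIFT) - 1))
```

Nothing in the table itself knows about the page table; it behaves as a cache only because the miss path calls back into software.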

The MIPS TLB has always been implemented on chip. The memory translation step is required even for cached references, so it's very much on the critical path of the machine. That meant it had to be small, particularly in the early days, so it makes up for its small size by being clever.

It's basically a genuine associative memory. Each entry in an associative memory consists of a key field and a data field; you present the key and the hardware returns the data of any entry the key matches. Associative memories are wonderful, but they are expensive in hardware. MIPS TLBs have had between 32 and 64 entries; a store of this size is manageable as a silicon design.

All contemporary MIPS CPUs use a TLB in which each entry is doubled up to map two consecutive VPNs to independently specified physical pages. The paired entries double the amount of memory that can be mapped by the TLB with only a little extra logic, without requiring any large-scale rethinking of TLB management.

You will see the TLB referred to as being fully associative; this emphasizes that all keys are really compared with the input value in parallel. (The common 32-entry paired TLB would be correctly, if pedantically, described as a 32-way set-associative store, with two entries per set.)

Figure 14.3 TLB entry fields.

The TLB entry is shown schematically in Figure 14.3 above. For the moment, we'll assume that pages are 4 Kbytes in size. The TLB's key - the input value - consists of three fields:

Field #1: VPN2 - The page number is just the high-order bits of the virtual address - the bits left when you take out the 12 low bits that address the byte within page. The "2" in VPN2 emphasizes that each virtual entry maps 8 Kbytes because of the doubled output field. Bit 12 of the virtual address selects either the first or the second physical-side entry of the pair.
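As a quick illustration of the address split for 4-Kbyte pages, the following sketch pulls a virtual address apart into VPN2, the even/odd select bit, and the byte offset. The function name `split_vaddr` is made up for the example.

```python
PAGE_SHIFT = 12   # 4-Kbyte pages, as assumed in the text


def split_vaddr(vaddr):
    """Return (vpn2, odd, offset) for a virtual address."""
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)   # low 12 bits: byte within page
    odd = (vaddr >> PAGE_SHIFT) & 1            # bit 12: second half of the pair?
    vpn2 = vaddr >> (PAGE_SHIFT + 1)           # bits 13 and up: the match key
    return vpn2, odd, offset
```

Two addresses in adjacent 4-Kbyte pages, such as 0x00402abc and 0x00403abc, share the same VPN2 and differ only in the select bit, which is why one entry covers 8 Kbytes.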

Field #2: PageMask - Controls how much of the virtual address is compared with VPN2 and how much is passed through to the physical address; a match on fewer bits maps a larger region. A "1" bit causes the corresponding address bit to be ignored. Some MIPS CPUs can be set up to map as much as 16 Mbytes with a single entry. The most significant ignored bit is used to select the even or odd entry.
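The effect of the mask can be sketched in a few lines. This sketch assumes the mask is held as address-aligned bits starting at bit 13 (so 0 means 4-Kbyte pages and 0x01FFE000 means 16-Mbyte pages) and that the ignored bits are contiguous; the function names are hypothetical.

```python
BASE_IGNORED = 0x1FFF   # bits 12..0: byte offset plus the 4-Kbyte select bit


def matches(vaddr, entry_vaddr, pagemask):
    """Compare only the address bits that the mask does not ignore."""
    ignored = pagemask | BASE_IGNORED
    return (vaddr & ~ignored) == (entry_vaddr & ~ignored)


def select_bit(pagemask):
    """Position of the most significant ignored bit (even/odd select)."""
    return (pagemask | BASE_IGNORED).bit_length() - 1


def page_size(pagemask):
    """Size in bytes of each of the two pages in the pair."""
    return 1 << select_bit(pagemask)
```

With a mask of 0 the select bit is bit 12 and each half of the pair is 4 Kbytes; setting more mask bits moves the select bit up and enlarges both halves together.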

Field #3: ASID - Marks the translation as belonging to a particular address space, so this entry will only be matched if the thread presenting the address has EntryHi(ASID) set equal to this value.

The G bit, if set, disables the ASID match, making the translation entry apply to all address spaces (so this part of the address map is shared between all spaces). The ASID is 8 bits: The OS-aware reader will appreciate that even 256 is too small an upper limit for the number of simultaneously active processes on a big UNIX system.

However, it's a reasonable limit so long as "active" in this context is given the special meaning of "may have translation entries in the TLB." OS software has to recycle ASIDs where necessary, which will involve purging the TLB of translation entries for any processes being downgraded from active.

It's a dirty business, but so is quite a lot of what OSs have to do; and 256 ASIDs should be enough to make sure recycling doesn't have to be done so often as to constitute a performance problem.
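One way ASID recycling might be organized is sketched below. It is an assumption for illustration, not a description of any particular kernel: the allocator hands out ASIDs in order, and when all 256 are used it purges the whole TLB and starts a new "generation", which is coarser than purging only the entries of downgraded processes. The class name and the `flush_tlb` callback are hypothetical.

```python
NUM_ASIDS = 256   # 8-bit ASID field


class AsidAllocator:
    def __init__(self, flush_tlb):
        self.flush_tlb = flush_tlb   # hypothetical callback that purges the TLB
        self.generation = 1          # bumped each time ASIDs are recycled
        self.next_asid = 0
        self.assigned = {}           # process id -> (generation, asid)

    def asid_for(self, pid):
        """Return an ASID that is valid for pid in the current generation."""
        held = self.assigned.get(pid)
        if held is not None and held[0] == self.generation:
            return held[1]           # still "active": reuse its ASID
        if self.next_asid == NUM_ASIDS:
            # Out of ASIDs: every old translation must go before reuse.
            self.flush_tlb()
            self.generation += 1
            self.next_asid = 0
        asid = self.next_asid
        self.next_asid += 1
        self.assigned[pid] = (self.generation, asid)
        return asid
```

In this scheme a flush happens only once per 256 newly activated processes, which matches the article's point that recycling need not be frequent.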

For programming purposes, it's easiest if the G bit is kept in the kernel's page tables with the output-side fields. But when you're translating, it belongs to the input side. On MIPS32/64 CPUs, the G bits from the two output-side values are AND-ed together to produce the single value that is used, so in practice you must make sure the G bit is set the same in both halves.
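The interplay of the ASID and the G bit can be shown with a small model. The dictionary layout and function names are invented for the example; only the AND-ing of the two G bits and the "G disables the ASID match" rule come from the text.

```python
def effective_g(g_even, g_odd):
    """The entry's single G bit is the AND of the two output-side G bits."""
    return g_even & g_odd


def entry_matches(entry, vpn2, current_asid):
    """An entry matches on VPN2, and on ASID unless it is global."""
    if entry["vpn2"] != vpn2:
        return False
    return bool(entry["g"]) or entry["asid"] == current_asid
```

If the two halves disagree, `effective_g(1, 0)` is 0 and the entry silently becomes ASID-specific, which is why both halves should carry the same G value.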

The TLB's output side gives you the physical frame number and a small but sufficient bunch of flags:

Physical frame number (PFN): This is the physical address with the low bits cut off (the low 12 bits if this is representing a 4-Kbyte page).

Flag #1 - Write control bit (D): Set 1 to allow stores to this page to happen. The "D" comes from this being called the dirty bit; see the next section for why.

Flag #2 - Valid bit (V): If this is 0, the entry is unusable. This seems pretty pointless: Why have a record loaded into the TLB if you don't want the translation to work? There are two reasons. The first is that the entry translates a pair of virtual pages, and maybe only one of them ought to be there.

The other is that the software routine that refills the TLB is optimized for speed and doesn't want to check for special cases. When some further processing is needed before a program can use a page referred to by the memory-held table, the memory-held entry can be left marked invalid.

After TLB refill, this will cause a different kind of trap, invoking special processing without having to put a test in every software refill event.
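The following sketch models why the refill path can stay free of special cases. The refill routine copies the memory-held entry blindly, and the V and D checks happen later, on the access itself. The outcome labels ("refill", "invalid", "modified", "ok") and the dictionary layout are illustrative names chosen for this example.

```python
def refill(tlb, page_table, vpn):
    """Fast path: copy the memory-held entry without inspecting it."""
    tlb[vpn] = page_table[vpn]


def access(tlb, vpn, is_store=False):
    """Classify an access the way the translation hardware would."""
    entry = tlb.get(vpn)
    if entry is None:
        return "refill"      # no matching entry: run the refill routine
    if not entry["v"]:
        return "invalid"     # entry present but V=0: a different trap
    if is_store and not entry["d"]:
        return "modified"    # store to a page whose D bit is 0
    return "ok"
```

A page that needs extra work before first use is simply left with V=0 in the page table; the slower trap that follows the refill is where the special processing goes.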

Flag #3 - Cache control (C): This 3-bit field's primary purpose is to distinguish cacheable (3) from uncached (2) regions.

But that leaves six other values, used for two somewhat incompatible purposes: In shared-memory multiprocessor systems, different values are used to hint whether the memory is shared (when hardware will have to work hard to keep any cached data consistent across the whole machine). In "embedded" CPUs, different values select different local cache management strategies: write-through versus write-back, for example.
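To make the output-side fields concrete, here is a sketch that packs and unpacks them into a single word. The bit positions are an assumption not given in the text: G in bit 0, V in bit 1, D in bit 2, C in bits 5..3, and the PFN from bit 6 upward, which is the arrangement used by the MIPS32 EntryLo registers. The function names are hypothetical.

```python
CACHEABLE = 3   # values taken from the text
UNCACHED = 2


def pack_entrylo(pfn, c, d, v, g):
    """Assumed layout: PFN from bit 6, C in bits 5..3, then D, V, G."""
    return (pfn << 6) | ((c & 0x7) << 3) | ((d & 1) << 2) | ((v & 1) << 1) | (g & 1)


def unpack_entrylo(value):
    return {
        "pfn": value >> 6,
        "c": (value >> 3) & 0x7,
        "d": (value >> 2) & 1,
        "v": (value >> 1) & 1,
        "g": value & 1,
    }
```

The six remaining values of the C field are left uninterpreted here, since their meaning depends on the CPU.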
