Helping underprivileged code
Privilege protection does in hardware what most operating systems and kernels do in software: it keeps errant and malicious code under control.
A few months ago we covered the exotic new style of memory segmentation that all modern x86 processors use ("Taming the x86 Beast," April 2004). This month, we'll dive even deeper into the murky realm of x86 programming to uncover more riches: the chip's built-in privilege-protection mechanism.
Privilege protection does in hardware what most operating systems and kernels do in software. It protects different tasks from each other and allows only the "correct" tasks to access sensitive data, disable interrupts, modify operating-system tables, and perform other sensitive operations. Like segmentation, the x86's privilege protection is tricky to set up but pays off handsomely in run-time reliability and security. I've never seen another processor that does so much to help the operating system and keep recalcitrant code in line.
Most processors have something like a user/supervisor mode. Supervisor mode can do anything, while user mode is more restrictive. For example, only code running in supervisor mode can execute a reset instruction or juggle status bits in the processor's internal registers. The '386, '486, and Pentium family of processors (including AMD's Athlon and Opteron equivalents) have a far more elaborate system of protection. Every scrap of code, data, and stack is assigned one of four privilege levels, and a sophisticated set of rules governs which parts can access which other parts. These chips do so much automatic privilege checking and protection that you can get rid of many simple software kernels and use just the processor's built-in functions.
When the going gets weird
The first concept to understand is that privilege levels aren't modes. They aren't something stored in the processor; they're an attribute of memory. All memory segments get assigned a privilege level, including code, data, and stack segments. For code segments, the privilege level of the segment determines the privilege level of all the code within that segment. A single code segment might contain several separate functions or entire programs; all run at the same privilege level. Conversely, you might want to segregate different programs or subroutines into different code segments to assign them different privilege levels.
Where's that privilege level come from? Bits 45 and 46 of every segment descriptor hold the privilege-level attribute for that segment. We'll call these two bits the descriptor privilege level (DPL) field.
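To make the bit positions concrete, here's a minimal sketch in C that pulls the DPL field out of a raw eight-byte descriptor held in a 64-bit integer. The function names are made up for illustration; the only thing taken from the text is that the field occupies bits 45 and 46. Keep in mind that the numbering runs backward from what you might expect: level 0 is the most privileged and level 3 the least.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions within the 64-bit descriptor, as given in the text. */
#define DPL_SHIFT 45u
#define DPL_MASK  0x3u

/* Return the descriptor privilege level (0..3) of a raw 8-byte descriptor. */
unsigned descriptor_dpl(uint64_t descriptor)
{
    return (unsigned)((descriptor >> DPL_SHIFT) & DPL_MASK);
}

/* Return a copy of the descriptor with its DPL field replaced.
 * (Hypothetical helper; real code would then write it back to the GDT.) */
uint64_t descriptor_with_dpl(uint64_t descriptor, unsigned dpl)
{
    assert(dpl <= 3u);
    descriptor &= ~((uint64_t)DPL_MASK << DPL_SHIFT);
    return descriptor | ((uint64_t)dpl << DPL_SHIFT);
}
```

For example, the flat code-segment descriptor 0x00CF9A000000FFFF decodes to level 0, and rewriting its DPL to 3 yields 0x00CFFA000000FFFF.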
Programs can access data from a segment with equal or lesser privilege than their own. Attempting to reference more-privileged data generates a General Protection Fault (exception 13), an event familiar to many Windows users and customarily accompanied by the fabled Blue Screen of Death.
Programs can jump or branch only to code with exactly the same privilege level; you're not allowed to call higher- or lower-level code without passing through x86 Customs and having the Intel Border Patrol check your authorization. More on that anon.
So two bits somewhere in RAM set the privilege level of the code that's running and have power over what other code that code can call and what data it can access. Protect these two bits carefully.
This setup has a few interesting implications. First, you can't change the privilege level of running code. Its privilege is determined entirely by the two DPL bits of the memory segment it lives in, not by anything tweakable inside the processor. You could change the privilege level of nonrunning code, but you'd have to have access to its segment descriptor to do that. This scheme also implies that code can't have different privileges for different users or at different times. Privilege is a static characteristic, fixed by the segment where the code lives.
Don't touch me there
The whole point of privilege protection is to keep errant or malicious code from touching memory or I/O it shouldn't. The x86 processor protects data by performing two checks: one when you change a segment register and one when you actually try to access the data. The first test checks your current privilege level (that is, the DPL bits of the code segment from which you're running) against the data's privilege level (the DPL bits of the code, data, or stack segment whose descriptor index you've just loaded). If your code's privilege level is lower than the data's, the processor generates a General Protection Fault and refuses to change the segment register. This test happens every time you change the DS, ES, FS, GS, or SS registers.
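The two data checks can be modeled in a few lines of C. Treat this as a simplified sketch, not a description of the silicon: the function names are invented, the first check ignores the requested-privilege bits carried in the selector itself (which this article doesn't cover), and the second check assumes the standard descriptor type bits, where bit 43 marks a code segment and bit 41 marks a data segment as writable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* First check, made when a segment register is loaded.
 * Numerically lower level = more privilege (PL0 highest, PL3 lowest). */
bool may_load_data_segment(unsigned cpl, unsigned segment_dpl)
{
    assert(cpl <= 3u && segment_dpl <= 3u);
    return cpl <= segment_dpl;   /* data must be equally or less privileged */
}

/* Second check, made on every write: the target must be a data
 * segment (bit 43 clear) that is marked writable (bit 41 set). */
bool may_write_segment(uint64_t descriptor)
{
    bool is_code  = (descriptor >> 43) & 1u;
    bool writable = (descriptor >> 41) & 1u;
    return !is_code && writable;
}
```

A false return from either function stands in for the General Protection Fault the processor would raise.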
The second test occurs when you access the data segment; in other words, on every single read and write operation. The processor checks to be sure you're not writing something where you shouldn't, such as into a read-only data segment or into any code segment.
The same procedure, more or less, prevents code from calling other code it shouldn't. The rules for code are more restrictive than the rules for data, though. You can only jump or branch to code that's exactly the same privilege level as the code that's running. For code in the same segment (a near call, jump, or branch) that's no problem. For code in another segment (a far jump, call, or return) it's another matter.
The obvious question now is, how do you change privilege levels? There are two ways: one is trivially easy, the other insanely complex. But you've come to expect that by now, haven't you?
The easy way is to define the target code segment as "privilege-less" by setting bit 42 of its descriptor. This subtle act identifies the segment as free and independent, no longer subject to privilege regulations. Pretty much any code can jump to code in such a segment. But that's cheating. Real programmers do it through call gates.
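Here's a rough C sketch of how that bit changes the rules for a direct (gateless) transfer. The function names are hypothetical. The one detail beyond the text is the condition inside the privilege-less branch: as Intel documents it, such a segment can be entered directly only by code at the same or a lower privilege level than the segment's DPL, which is what "pretty much any code" amounts to in practice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the descriptor is an ordinary code segment with bit 42 set. */
bool is_conforming_code(uint64_t d)
{
    bool is_segment = (d >> 44) & 1u;  /* 1 = code/data, 0 = gate or other system type */
    bool is_code    = (d >> 43) & 1u;
    bool conforming = (d >> 42) & 1u;  /* the "privilege-less" bit from the text */
    return is_segment && is_code && conforming;
}

/* Can code running at privilege level cpl transfer straight into
 * the target segment, with no call gate involved? */
bool may_transfer_without_gate(unsigned cpl, uint64_t target)
{
    unsigned dpl = (unsigned)((target >> 45) & 0x3u);
    assert(cpl <= 3u);
    if (is_conforming_code(target))
        return dpl <= cpl;   /* caller is no more privileged than the target */
    return dpl == cpl;       /* otherwise the levels must match exactly */
}
```

The running code keeps its own privilege level after such a transfer, which is why this shortcut doesn't really count as changing levels.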
Call me sometime
A call gate is an automated border check; a simple filter to strain out undesirable bits of code arriving from unfavorable locations. Call gates protect and preserve the processor's four-level privilege hierarchy and enable you to keep your code (and other people's code) from overstepping its limits.
Call gates aren't code, but they act almost as if they were. Instead, they're special descriptors, those eight-byte structures that define all memory segments. Unlike all the other descriptors we've seen, a call gate doesn't define a new segment of memory at all; it's just a convenient way to define the entry point into privileged code.
Figure 1: Call gates
Figure 1 shows what a call gate looks like. At first glance, it appears to be structured a lot like any other descriptor, but instead of base-address and length fields, the call gate has segment and offset fields. Together these define the exact address of the target instruction you can call. In other words, you can't jump to just any old address unless it's been identified by a call gate. Each call gate defines exactly one entry point in your code, and every entry point needs its own call gate. No call gate, no far (intersegment) calls.
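For readers without the figure handy, the following C sketch packs and unpacks a 32-bit call gate. The struct and function names are invented for illustration. The field positions follow Intel's documented layout: the offset is split across bits 0 to 15 and 48 to 63, the segment field sits in bits 16 to 31, the parameter count in bits 32 to 36, the type code in bits 40 to 43, the privilege level in bits 45 and 46, and a present flag in bit 47.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t selector;     /* segment field: picks the target code segment */
    uint32_t offset;       /* offset field: entry point within that segment */
    unsigned dpl;          /* privilege needed to use the gate (0..3) */
    unsigned param_count;  /* 32-bit words copied to the new stack (0..31) */
} call_gate;

/* Pack the fields into the raw 8-byte descriptor. */
uint64_t call_gate_encode(call_gate g)
{
    uint64_t d = 0;
    assert(g.dpl <= 3u && g.param_count <= 31u);
    d |= (uint64_t)(g.offset & 0xFFFFu);     /* offset 15..0  -> bits 0-15  */
    d |= (uint64_t)g.selector << 16;         /* segment       -> bits 16-31 */
    d |= (uint64_t)g.param_count << 32;      /* word count    -> bits 32-36 */
    d |= (uint64_t)0xCu << 40;               /* type 1100b: 32-bit call gate */
    d |= (uint64_t)g.dpl << 45;              /* privilege     -> bits 45-46 */
    d |= (uint64_t)1 << 47;                  /* present flag */
    d |= (uint64_t)(g.offset >> 16) << 48;   /* offset 31..16 -> bits 48-63 */
    return d;
}

/* Recover the fields from a raw descriptor. */
call_gate call_gate_decode(uint64_t d)
{
    call_gate g;
    g.selector    = (uint16_t)(d >> 16);
    g.offset      = (uint32_t)(d & 0xFFFFu) | ((uint32_t)(d >> 48) << 16);
    g.dpl         = (unsigned)((d >> 45) & 0x3u);
    g.param_count = (unsigned)((d >> 32) & 0x1Fu);
    return g;
}
```

A gate that leads to offset 0x12345678 in the segment selected by 0x0008, usable from level 3 and carrying two parameters, encodes as 0x1234EC0200085678.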
Odd as it seems, you can't have code from one segment jump to code in another segment, ever. You can only call code in another segment and even then only through a call gate. On early x86 chips you pretty much had to do far (intersegment) jumps and calls to get anything done beyond the 64KB boundary. Newer x86 processors don't allow this; you must define a call gate for each possible target instruction you might ever want to call.
This awkwardness pays off in both security and reliability. For example, call gates prevent you from accidentally (or maliciously) jumping to the wrong address in the target code segment. In pre-'386 days, it was easy to jump to any arbitrary address, which might land you in the middle of a subroutine or, worse yet, the middle of an instruction. Having the full segment and offset specified inside the call gate leaves no chance of transferring control to anywhere but the predefined location.
Another advantage is privilege protection. Two bits within the call gate (bits 45 and 46, the same DPL field found in every other descriptor) define the minimum privilege level needed to use the gate. If your current privilege level (that is, the privilege level of the code segment from which you're currently executing) isn't high enough, the processor will refuse to let you through the gate.
Finally, call gates encapsulate trusted functions. Because the caller never sees the called function, you can change the function it refers to without changing the gate interface. The gated function can be relocated in different versions of your code, for example, and the caller's code wouldn't have to be modified. All the caller sees is the call gate itself, not the code behind it.
Now, because the call gate looks like a segment descriptor and lives in the global descriptor table (GDT) with all the "real" segment descriptors, you treat it like just another code segment. To actually make your far call, you pretend to call the code segment of the call gate, not the code segment you really want. The processor checks your credentials and if you pass muster, you'll be transferred to the segment and offset addresses contained in the call gate. Voila! A perfectly secure and controlled transfer. What a load of bother.
By the way, it doesn't matter what address you ask for when you make the call. You'll be transferred to exactly the address the call gate dictates, so the value you load into your instruction pointer is totally irrelevant. It's also worth noting that you don't know what your destination is. Call gates obscure the address they're protecting. Unless you can locate, examine, and dissect all the segment descriptors in the GDT, you'll have no idea where the call gate is taking you.
Even if you did know the destination of your far call, you couldn't read the object code because it would be in a more privileged segment. Besides, the code segment probably wouldn't be defined with read permission. All this helps protect privileged code from hacking or idle curiosity.
Figure 2: How '386 and later processors handle far calls
Setting the machine in motion
Figure 2 illustrates the convoluted and baroque process by which the '386 and later processors handle far calls. A FAR CALL instruction loads a new value into CS and EIP; the latter will be ignored. The value of CS is really the index into the GDT for the call gate, not for a proper code segment. The call gate, in turn, holds the code segment index (16 bits) and offset address (32 bits) of the actual target of the subroutine call, as well as the privilege level required to use the gate. The segment in the call gate indexes into the GDT to locate the target's code segment descriptor, which in turn defines the base address of the target code segment. The gate's offset is then added to that base address and checked against the limit (highest legal address) for that segment. If it doesn't exceed the limit, the offset address gets loaded into EIP and the segment gets loaded into CS, and away you go.
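The sequence is easier to follow as a toy model. The C sketch below walks the same steps against a pretend GDT; it is a simplification under stated assumptions, not a description of the real hardware. All type and function names are invented, descriptors are plain structs instead of packed bits, CS holds a bare table index, the limit is treated as the highest legal offset, and the stack switch and parameter copy are left out. A false return stands in for a General Protection Fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { DESC_CODE, DESC_CALL_GATE } desc_kind;

typedef struct {
    desc_kind kind;
    unsigned  dpl;            /* privilege level, 0 (highest) to 3 */
    uint32_t  base;           /* DESC_CODE: start of the segment */
    uint32_t  limit;          /* DESC_CODE: highest legal offset */
    uint16_t  target_index;   /* DESC_CALL_GATE: GDT index of target code */
    uint32_t  target_offset;  /* DESC_CALL_GATE: entry point in that segment */
} descriptor;

typedef struct {
    uint16_t cs;   /* GDT index of the current code segment */
    uint32_t eip;
    unsigned cpl;  /* current privilege level */
} cpu_state;

bool far_call(cpu_state *cpu, const descriptor *gdt, unsigned gdt_len,
              uint16_t cs_value, uint32_t eip_value)
{
    (void)eip_value;  /* the offset named in the instruction is ignored */

    /* Step 1: the CS value must name a call gate. */
    if (cs_value >= gdt_len || gdt[cs_value].kind != DESC_CALL_GATE)
        return false;
    const descriptor *gate = &gdt[cs_value];

    /* Step 2: the caller must be privileged enough to use the gate. */
    if (cpu->cpl > gate->dpl)
        return false;

    /* Step 3: the gate's segment field must name a code segment. */
    if (gate->target_index >= gdt_len ||
        gdt[gate->target_index].kind != DESC_CODE)
        return false;
    const descriptor *target = &gdt[gate->target_index];

    /* Step 4: gates lead only to equally or more privileged code. */
    if (target->dpl > cpu->cpl)
        return false;

    /* Step 5: the entry point must lie inside the target segment. */
    if (gate->target_offset > target->limit)
        return false;

    /* Step 6: commit the transfer. */
    cpu->cs  = gate->target_index;
    cpu->eip = gate->target_offset;
    cpu->cpl = target->dpl;
    return true;
}
```

Note that the caller's requested offset never influences the result; only the gate decides where execution lands.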
It's amazing to me that all of this happens in hardware. There's not a lick of code involved in this process, just a lot of data tables. Granted, it takes the processor about 100 clock cycles (less than a microsecond at 500MHz) to do all this, but it's entirely automatic and mechanical. For some small real-time kernels, privilege protection and task switching are all they do; now that work can be done entirely in hardware.
Where code goes, so goes the stack
When you change privilege levels, you change the addressable domain of your program. For example, when your code is running at privilege level 2 (PL2), you can access PL2 and PL3 data segments and a PL2 stack segment. If you make a successful call through a call gate to a PL1 code segment, your privilege level increases to PL1 and you can access PL1, PL2, and PL3 data segments. But what about your stack?
When you change privilege levels, your stack changes automatically. Your old SS segment and stack pointer are abandoned and replaced with new ones that correspond to the new, higher privilege level. Where does this new stack come from? Hmmm, I feel a new data structure coming on.
Believe it or not, there's still one more magical data structure you need to create if you're going to use privilege protection on x86 processors. This new one is called the task state segment (TSS). We'll save the gory details for another day, but in brief, the TSS includes three different sets of stack pointers, one for each of privilege levels 0, 1, and 2. (Level 3 doesn't need one, because no call gate can ever take you from a higher level down to PL3.) And, of course, you get to (read: have to) define where each of these stacks will go. Obviously, the PL1 stack should be in a PL1 stack segment, and so forth. You might never need all three of these stacks; your code might never call a subroutine through a PL1 call gate. But leaving these stacks undefined is a really bad idea. You'll have an awfully tough time figuring out why your code suffered a sudden stack failure after a routine function call.
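As a rough illustration, the stack lookup can be modeled like this in C. The struct and function names are hypothetical, and the struct holds only the stack-pointer portion of the TSS. As Intel's manuals describe the 32-bit TSS, it stores a stack segment and stack pointer for levels 0 through 2 only; the sketch also treats a zero stack segment as "never set up," which is an assumption of this example, not a hardware rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t ss;   /* stack segment selector; 0 means "never set up" here */
    uint32_t esp;  /* initial stack pointer within that segment */
} stack_pointer;

typedef struct {
    stack_pointer inner[3];  /* stacks for PL0, PL1, and PL2 */
} tss_stacks;

/* Fetch the stack to switch to when a gate raises privilege to new_cpl.
 * Returns false where real hardware would fault. */
bool stack_for_level(const tss_stacks *tss, unsigned new_cpl,
                     stack_pointer *out)
{
    assert(tss != 0 && out != 0);
    if (new_cpl > 2u)
        return false;              /* no stored stack for PL3 */
    if (tss->inner[new_cpl].ss == 0)
        return false;              /* stack was left undefined */
    *out = tss->inner[new_cpl];
    return true;
}
```

The second failure case is the one that bites: a gate call into a level whose stack nobody bothered to define.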
Sharp-eyed readers will notice that we glossed over one of the bit fields in the call gate. Bits 32 through 36 define the number of 32-bit parameters that will be passed from the calling routine into the called routine. The chip will automatically copy that many 32-bit words (four times as many bytes) from your stack to the called routine's stack.
After every FAR CALL there should be a matching FAR RET (return) instruction that pops the correct number of parameters off the stack. Just to make things tricky, call gates define the number of parameters as 32-bit words, while the FAR RET instruction counts bytes. Be sure to multiply by four before punching your return ticket.
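The conversion is trivial, but it's worth writing down once so nobody does it in their head. A minimal C sketch, with made-up names, assuming the five-bit parameter field described above (so at most 31 words):

```c
#include <assert.h>

/* A five-bit field can count at most 31 parameters. */
#define CALL_GATE_MAX_PARAMS 31u

/* Convert a call gate's parameter count (32-bit words) into the
 * byte count that the matching FAR RET instruction must use. */
unsigned far_ret_byte_count(unsigned gate_param_words)
{
    assert(gate_param_words <= CALL_GATE_MAX_PARAMS);
    return gate_param_words * 4u;
}
```

A gate that passes three parameters therefore pairs with a FAR RET that discards 12 bytes, and the largest possible gate pairs with a FAR RET of 124 bytes.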
This automatic parameter passing makes it awkward to write a routine that accepts a variable number of arguments. The call gate will copy a fixed number of bytes onto the called routine's stack, and the FAR RET must remove exactly that many at the end. A single function will either need several call gates, each with a different parameter count (and with a matching exit point), or it will have to be coded for a worst-case payload.
If 31 words (124 bytes) isn't enough space, you might want to pass parameters by reference, rather than by value. In other words, push a pointer to a data structure rather than the data itself. After all, the called routine can access any data memory that the calling routine could possibly have used, by virtue of its higher privilege level.
Each FAR RET executes one final step to aid security. Just before control returns to the old, less-privileged code, the data segment registers DS, ES, FS, and GS are all checked to see if the called procedure might have left indexes to more privileged segments in them. If so, the offending segment registers are zeroed. This keeps highly privileged procedures that are sloppy with their segment registers from unwittingly giving less-privileged procedures access to memory that would otherwise be off-limits.
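In C, that final sweep might be modeled as follows. This is a sketch with invented names: each register is represented by its selector plus the DPL of the segment it points to, and the special treatment real hardware gives to certain segment types is ignored.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t selector;  /* 0 is the null selector */
    unsigned dpl;       /* DPL of the segment the selector refers to */
} segment_register;

enum { REG_DS, REG_ES, REG_FS, REG_GS, REG_COUNT };

/* Model of the check made on a FAR RET to less-privileged code.
 * Any register left pointing at a segment more privileged than the
 * code being returned to is zeroed. Returns how many were cleared. */
unsigned scrub_on_far_ret(segment_register regs[REG_COUNT],
                          unsigned return_cpl)
{
    unsigned cleared = 0;
    assert(return_cpl <= 3u);
    for (int i = 0; i < REG_COUNT; i++) {
        /* Numerically lower DPL = more privileged than the returnee. */
        if (regs[i].selector != 0 && regs[i].dpl < return_cpl) {
            regs[i].selector = 0;
            cleared++;
        }
    }
    return cleared;
}
```

Returning to PL3 code with ES pointing at a PL0 segment and GS at a PL1 segment clears both, while a DS that points at a PL3 segment survives untouched.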
Oh, and you can't return values on the stack. Both the caller's stack and the called routine's stack will shrink by the number of bytes specified in the call gate and the FAR RET instruction, respectively. When the caller regains control, it looks as though no parameters were ever on the stack. (Besides, the two routines use physically separate stacks.) You've got to return values in registers.
Finally, you should save and restore all segment registers in your called functions. This is more to protect the caller than the callee. If the called function changes any of the data segment registers to point to segments the caller doesn't have permission to access, the processor will zero those segment registers to prevent you from passing on ownership of a privileged segment. The caller might regain control with one or more of its segment registers cleared, which can be a miserable bug to track. If the caller doesn't use, say, the GS register frequently it can be hard to track down why it generates General Protection Faults later on.
So there you have it. Between memory segmentation and privilege protection, it's starting to become clear what those millions of transistors are doing. And we haven't even covered automatic task management, yet another feature of these processors that bears scrutiny. We'll save that for another day.
Jim Turley is the editor in chief of Embedded Systems Programming, a semiconductor industry veteran, and the author of seven books, including the Essential Guide to Semiconductors. You can reach him at email@example.com.