Getting down to basics: Running Linux on a 32-/64-bit RISC architecture - Part 1

This is the first part in a series all about how to run the Linux operating system on the MIPS32/MIPS64 architecture. Why? Well, this is what a CPU is for. A CPU “architecture” is the description of what a useful CPU does, and a useful CPU runs programs under the control of an operating system.

Although many operating systems run on the MIPS architecture, the great thing about Linux is that it's public. Anyone can download the source code, so anyone can see how it works.

Any operating system is just a bunch of programs. Ingenious programs, and – perhaps more than most software – built on a set of ideas that have been refined and figured out over the years. An operating system is supposed to be particularly reliable (doesn't crash) and secure (doesn't let some program do things the OS hasn't been told to let it do).

Correct usage sees “Linux” as the name of the operating system kernel originally written by Linus Torvalds, a kernel whose subsequent history is, well, history. Most of the (much larger) rest of the system came from projects organized under the “GNU” banner of the Free Software Foundation. Everybody sometimes forgets and calls the whole thing “Linux.”

Both sides of this process emerged as a reaction to the seminal work on the UNIX operating system developed by Bell Laboratories in the 1970s. Probably because Bell saw it as of no commercial value, it distributed the software widely to academic institutions under terms that were then unprecedentedly “open.”

But it wasn't “open source” – many programmers worked on UNIX at university, only to find that their contributions were either lost or were now owned by Bell Labs (and its many successors). Frustration with this process eventually drove people to write “really free” replacements.

The last key part was the kernel. Kernels are quite difficult programs, but the delay was cultural: OS kernels were seen as something for academic groups, and those groups wanted to go beyond UNIX, not to recreate it.

The post-UNIX fashion was for a small, modular operating system built of clearly separated components, but no OS built on that basis ever found a significant user base. Linux won out because it was a much more pragmatic project.

(Some claim that Windows NT – and therefore most modern versions of Microsoft Windows – has a microkernel. That may be true, but it certainly lost any claims to be small or modular on its way to world domination.)

Linus and his fellow developers wanted something that worked (on x86 desktops, in the first instance). When the Linux kernel was in competition with offshoots of the finally free BSD4.4 system, BSD protagonists insisted with some justification on their superior engineering. But the Linux community had arrived at an understanding of a far more “open” development style.

Linux evolved quickly. Sometimes, it evolved quickly because Linux people were perfectly happy to adapt BSD code. It wasn't long before Linux triumphed, and the engineering got better, too.

Basic Linux Building Blocks
To get to grips with any artifact you need to attach some good working meaning to the terms used by its experts, and you are particularly likely to be confused by terms you already know, but with not quite the same meaning. The UNIX/Linux heritage is long enough that there are lots of magic words: thread, file, user mode and system calls, interrupt context, interrupt service routine (ISR), scheduler, memory map/address space, thread group, high memory, libraries and applications.

Thread: The best general definition of “thread” I know is “a set of computer instructions being run in the order specified by the programmer.” The Linux kernel has an explicit notion of a thread (for each thread there's a struct thread_struct).

It's almost the same thing, but by the terms of my definition a low-level interrupt handler (for example) is a distinct thread that happens to have borrowed the environment of the interrupted thread to run with. Both definitions are valuable, and we'll say “Linux thread” when necessary.

Linux loves threads (there are currently 134 on the desktop machine I'm typing this on). Most of those threads correspond to an active application program – but there are quite a few special-purpose threads that run only in the kernel, and some applications have multiple threads. One of the kernel's basic jobs is scheduling – picking which Linux thread to run next, which will be discussed later.

File: A named chunk of data. In GNU/Linux, most of the interactions a program makes with the world beyond its process are done by reading and writing files. Files can just be things you write data to and get it back later.

But there are also special files that lead to device drivers: Read one of those and the data comes from a keyboard, write another and your data is interpreted as digital audio and sent out to a loudspeaker. The Linux kernel likes to avoid too many new system calls, so special /proc files are also used to allow applications to get information about the kernel.
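
As a minimal sketch of the "kernel information as a file" idea, the C fragment below reads a /proc entry with nothing but ordinary file calls. The helper name read_first_line is hypothetical, and /proc/version is used only as an example of a commonly available entry; it assumes a Linux system with /proc mounted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the first line of the file at 'path' into
 * 'buf', without the trailing newline. Returns 0 on success, or -1 if
 * the file cannot be opened or read. */
int read_first_line(const char *path, char *buf, size_t size)
{
    assert(buf != NULL && size > 0);

    FILE *f = fopen(path, "r");   /* a /proc file opens like any other */
    if (f == NULL)
        return -1;

    if (fgets(buf, (int)size, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);

    buf[strcspn(buf, "\n")] = '\0';   /* strip the newline, if present */
    return 0;
}
```

On a typical Linux system, read_first_line("/proc/version", buf, sizeof buf) should leave the kernel's version string in buf, with no special system call involved.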

User mode and system calls: Linux applications run in user mode, the lower-privilege state of MIPS CPUs. In user mode, the software can't directly access the parts of the address space where the kernel lives, and all the locations it can address are mapped to pages the kernel has agreed to let the application play with. In user mode, you can't run the coprocessor zero CPU control instructions.

(GNU/Linux application code that runs in user mode is frequently referred to as userland.)

To obtain any service from the kernel (most often, to read or write a file) the application makes a system call. A system call is a deliberately planted exception, interpreted by the kernel's exception handler. The exception switches to high-privilege mode.

Through the system call, Linux application threads run quite happily in the kernel in high-privilege mode (but of course they're running trusted code there).

When it's done, the return from exception code involves an eret, which makes sure that the change back to user mode and the return to user mode code are done simultaneously.
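
To make the round trip concrete, here is a small userland sketch that asks the kernel for the process ID through the C library's generic syscall() entry point rather than the usual getpid() wrapper. The function name raw_getpid is hypothetical; syscall() and SYS_getpid are the standard glibc names. The example is not MIPS-specific: on a MIPS system the library ends up executing the syscall instruction, which raises the exception described above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* makes sure syscall() is declared */
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: enter the kernel explicitly by system call
 * number. Between the call and its return, this thread runs kernel
 * code in high-privilege mode. */
long raw_getpid(void)
{
    long pid = syscall(SYS_getpid);
    assert(pid > 0);         /* getpid cannot fail */
    return pid;
}
```

The value returned should match what getpid() reports, since the wrapper makes the same system call.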

Interrupt context: Linux tries not to disable interrupts too much. When Linux is running, at any moment there's an active thread on a CPU: So an interrupt borrows what appears to be the context of that thread until it finishes its business and returns.

(Even if the kernel is waiting in a power-down mode, there's a thread that is executing the wait instruction.)

Code called from an interrupt handler is in interrupt context, and there are many things such code should not do. It can't do anything that might have to wait for some other software activity, for example.

If your keyboard input routine is going to log all keystrokes to a file, then you can't do that by calling the file output routine from the interrupt handler. (Perhaps not a good idea from a security point of view, but still . . . )

There are decent ways to do that: You can get the keyboard interrupt to arrange to wake some Linux thread that obtains and logs the input, for example.
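
The same division of labor can be sketched in userland, with a POSIX signal standing in for the interrupt. This is only an analogy, not kernel code: the handler does the bare minimum (it sets a flag), and ordinary code does the file output later. All function names here are hypothetical, and SIGUSR1 is an arbitrary choice of signal.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for sigaction() */
#endif
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Written by the handler, consumed by ordinary code. */
static volatile sig_atomic_t event_pending = 0;

/* Plays the role of the ISR: note that something happened and return.
 * It must not call anything that might block, such as file output. */
static void on_event(int signo)
{
    (void)signo;
    event_pending = 1;
}

/* Install on_event() for SIGUSR1. Returns 0 on success, -1 on error. */
int install_event_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_event;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Plays the role of the woken thread: do the slow logging outside the
 * handler. Returns 1 if an event was logged, 0 if nothing was pending.
 * Several signals arriving close together are logged as one event. */
int log_pending_event(FILE *log)
{
    assert(log != NULL);
    if (!event_pending)
        return 0;
    event_pending = 0;
    fprintf(log, "event received\n");
    return 1;
}
```

A real driver would wake a sleeping kernel thread rather than rely on polling a flag, but the rule is the same: the part that might wait runs somewhere other than the handler.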

Interrupt service routine (ISR): The lowest-level interrupt code in the device driver is generally called an ISR. In Linux you're encouraged to keep this code short: If there's lots of work to do, you can consider using some kind of “bottom half,” as described later in this series.

Scheduler: A kernel subroutine. The OS maintains a list of threads that are ready to run (they're not blocked on an incomplete I/O transfer, for example), and that list is in priority order.

The priority is dynamic, and is recalculated periodically – mostly to ensure that long-running computations don't hog the CPU and prevent it from responding to events. Applications can lower their own priority to volunteer for a life in the background but can't usually raise it.
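
From an application's side, "volunteering for the background" is usually done with the standard nice() call, where a larger nice value means a lower claim on the CPU. The sketch below wraps it in a hypothetical helper, lower_own_priority; how the kernel folds the nice value into its dynamic priority is not shown and is the scheduler's business.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: add 'increment' (zero or more) to this
 * process's nice value. Returns the nice value now in force. */
int lower_own_priority(int increment)
{
    assert(increment >= 0);

    /* nice() can legitimately return -1, so errno is the only
     * reliable failure indicator. */
    errno = 0;
    int result = nice(increment);
    if (result == -1 && errno != 0)
        return getpriority(PRIO_PROCESS, 0);   /* refused: unchanged */
    return result;
}
```

Calling the helper with a positive increment should normally succeed for any user; going the other way with a negative increment is what an unprivileged process is usually refused.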

After any interrupt is handled, the scheduler will be called. If the scheduler finds another thread is more worthy of running, it parks the current thread and runs the winner.
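
The decision itself can be illustrated with a toy model. This is a sketch, not the Linux scheduler: the struct toy_thread type and both function names are invented for illustration, priorities are fixed rather than dynamic, and the code scans an unsorted array instead of keeping a priority-ordered list.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a thread descriptor (not the real kernel type). */
struct toy_thread {
    int id;
    int priority;   /* larger means more deserving of the CPU */
    int ready;      /* nonzero if the thread is not blocked */
};

/* Return the index of the highest-priority ready thread, or -1 if no
 * thread is ready. Ties go to the earliest entry. */
int toy_pick_next(const struct toy_thread *threads, size_t count)
{
    assert(threads != NULL || count == 0);
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (!threads[i].ready)
            continue;
        if (best < 0 || threads[i].priority > threads[best].priority)
            best = (int)i;
    }
    return best;
}

/* Called "after an interrupt": keep the current thread unless some
 * ready thread strictly outranks it. Returns the index to run next. */
int toy_reschedule(const struct toy_thread *threads, size_t count,
                   int current)
{
    int winner = toy_pick_next(threads, count);
    if (winner < 0)
        return current;
    if (current >= 0 && threads[current].ready &&
        threads[winner].priority <= threads[current].priority)
        return current;
    return winner;
}
```

The interesting case is an interrupt that makes a blocked, high-priority thread ready: the next call to toy_reschedule then returns that thread instead of the one that was running.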

Older Linux kernels were not preemptive: once a thread was running in the kernel it was allowed to run until it either volunteered for rescheduling (by waiting on something) or until control was just about to pass back into userland – only then would the kernel contemplate a thread switch.

A nonpreemptive kernel is easier to program. Your kernel code sequence might have to worry about interrupt handlers running unexpectedly while it was in flight, but you knew it could never be unexpectedly caught halfway through something by some other mainstream kernel code. But it led to excessive delays and inadequate responsiveness.

The luxurious freedom from interference from parallel threads is lost when you have an SMP kernel (where two CPUs are simultaneously threading the same kernel).

To make the SMP kernel work properly, hundreds of possible interactions need to be tracked down and protected with appropriate locks. The SMP locks are (in almost all cases) exactly where you need them to be to permit the scheduler to stop a running kernel thread and run another: That's called kernel preemption.

It's now an important kernel programming discipline to recognize code sequences where preemption must be temporarily inhibited. The macros used to mark the start and end of that code have definitions that change according to kernel configuration to work correctly on uniprocessor or SMP systems.

Memory map/address space: The map of memory locations available to a particular Linux thread. The address space of a thread is defined through an mm_struct, pointed to by the thread.

For the Linux OS ported to the MIPS architecture (hereinafter, “Linux/MIPS”) on a 32-bit processor, the high half of the address space (addresses with bit 31 set) can be read and written only in kernel-privilege mode.

The kernel code/data is normally in the corner of this, known as kseg0, which means the kernel itself does not depend on addresses translated through the TLB.
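
The split can be written down as a few address comparisons. The sketch below uses the conventional MIPS32 segment boundaries, which the text above does not spell out, so treat the constants as background assumptions; the enum and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional MIPS32 virtual address segments. */
enum mips32_segment { KUSEG, KSEG0, KSEG1, KSEG2 };

/* Classify a 32-bit virtual address by segment. */
enum mips32_segment mips32_classify(uint32_t vaddr)
{
    if (vaddr < 0x80000000u)
        return KUSEG;    /* bit 31 clear: user space, TLB-mapped */
    if (vaddr < 0xA0000000u)
        return KSEG0;    /* kernel only, unmapped, cached */
    if (vaddr < 0xC0000000u)
        return KSEG1;    /* kernel only, unmapped, uncached */
    return KSEG2;        /* kernel only, TLB-mapped */
}

/* Nonzero if the address is reachable only in kernel-privilege mode,
 * that is, if bit 31 is set. */
int mips32_kernel_only(uint32_t vaddr)
{
    return (vaddr >> 31) != 0;
}

/* kseg0 is a fixed window onto low physical memory, so translating it
 * needs no TLB lookup: just clear the top three address bits. */
uint32_t mips32_kseg0_to_phys(uint32_t vaddr)
{
    assert(mips32_classify(vaddr) == KSEG0);
    return vaddr & 0x1FFFFFFFu;
}
```

For example, a kernel address such as 0x80100000 falls in kseg0 and corresponds to physical address 0x00100000 without any TLB entry being involved.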

The user part of the address space is mapped differently for each application – only threads that collaborate in an explicitly multithreaded application share the user address space (i.e., they point to the same mm_struct). But all Linux threads share the same kernel map.

A thread running a conventional single-threaded application runs in an address space that is distinct from all other threads and is exactly what older UNIX-like systems called a “process.”

At any given time, much of an application's address space may not in fact be mapped, or may not even be represented by any data present in physical memory at all.

An attempt to access such an address will cause a TLB exception, which will be handled by the OS, which will load any missing data and set up an appropriate mapping before it returns to the application. That is, of course, virtual memory.
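
You can watch this happen from userland on any Linux machine, MIPS or not. The sketch below maps some anonymous memory, touches each page for the first time, and uses getrusage() to count the page faults the kernel serviced along the way. The helper name faults_from_touching is hypothetical, and the exact count depends on the kernel and its configuration (large pages, for instance, can reduce it), so the code makes no promise about the number.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* for MAP_ANONYMOUS */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: map 'pages' pages, write to each one, and
 * return the number of minor page faults taken while doing so.
 * Returns -1 if the setup fails. */
long faults_from_touching(size_t pages)
{
    assert(pages > 0);

    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return -1;
    size_t length = pages * (size_t)page_size;

    /* The range now exists in the address space, but no physical
     * memory needs to be behind it yet. */
    void *base = mmap(NULL, length, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    struct rusage before, after;
    if (getrusage(RUSAGE_SELF, &before) != 0) {
        munmap(base, length);
        return -1;
    }

    /* The first write to each page is where the kernel steps in. */
    volatile unsigned char *mem = base;
    for (size_t i = 0; i < pages; i++)
        mem[i * (size_t)page_size] = 1;

    int rc = getrusage(RUSAGE_SELF, &after);
    munmap(base, length);
    if (rc != 0)
        return -1;
    return after.ru_minflt - before.ru_minflt;
}
```

The application code never asks for memory to be attached to the mapping; the writes simply succeed, and the fault count is the only visible trace of the kernel's work.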

Thread group: The collection of threads within the same memory map is called a thread group. Where a group has two or more members, those threads are cooperating to run the same program. The thread group is another good approximation in Linux to what is called a “process” in old UNIX systems.

High memory: Physical memory above 512 MB (whether real read/write memory or memory-mapped I/O locations) is not directly accessible through the kseg0 (cached) or kseg1 (uncached) windows.

On a 32-bit CPU physical addresses above the low 512 MB are “high memory” in the Linux sense and can only be accessed through TLB mappings. With a MIPS CPU, you can create a few permanent mappings by defining “wired” TLB entries, protected from replacement.

But Linux tries to avoid using resources that will quickly run out, so mainstream kernel code avoids wired entries completely. For Linux/MIPS, high-memory mappings are maintained dynamically by TLB entries created on demand.
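
The 512 MB limit falls straight out of the size of the two unmapped windows. In the sketch below, the window base addresses are the conventional MIPS32 values (an assumption not stated in the text), and the macro and function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define KSEG_WINDOW_SIZE 0x20000000u   /* 512 MB */
#define KSEG0_BASE       0x80000000u   /* cached window */
#define KSEG1_BASE       0xA0000000u   /* uncached window */

/* Nonzero if a physical address lies beyond both fixed windows, so it
 * can only be reached through a TLB mapping. */
int phys_is_high_memory(uint64_t phys)
{
    return phys >= KSEG_WINDOW_SIZE;
}

/* Cached kernel virtual address for a low physical address. */
uint32_t phys_to_kseg0(uint32_t phys)
{
    assert(!phys_is_high_memory(phys));
    return KSEG0_BASE + phys;
}

/* Uncached kernel virtual address for a low physical address. */
uint32_t phys_to_kseg1(uint32_t phys)
{
    assert(!phys_is_high_memory(phys));
    return KSEG1_BASE + phys;
}
```

Physical address 0x20000000 is the first one for which neither window has a corresponding virtual address, which is why everything from there up needs a TLB entry.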

Libraries and applications: Long ago, applications running on UNIX-like systems were monolithic pieces of code, which were loaded as required. You built them by compiling some source code and gluing in some library functions – prebuilt binaries provided with your tool chain.

But there are two things wrong with that. One is that the library code is often bigger than the application that attaches to it, bloating all the programs. The other is that if a supplier fixes a bug in a library function, you don't get full benefit from the fix until every software maintainer rebuilds his or her application.

Instead, the application is built without the library functions. The names of the missing libraries are built into the application, so the loader can find the required libraries and stitch them in when the application is loaded.

So long as the library continues to provide identical functions, everything should be fine (there's a library version-tracking system to allow libraries to evolve functionally, too, but that's beyond our scope).

That carries a penalty. When you link a program at load time out of pieces (each of which may get separately updated), the exact address of the components is unpredictable at build time. You can't predict in advance which locations will be available for loading a particular library.

The runtime loader can do no better than to load each library in the next space available, so even the starting address for a library is unpredictable. A library binary has to be position-independent code or PIC – it must run correctly wherever its code and data are positioned in virtual address space.

Layering the kernel
From one point of view, the kernel is a set of subroutines called from exception handlers. The raw post-exception “exception mode” environment on a MIPS CPU is all-powerful and very low-overhead but tricky to program.

So with each entry to the kernel you get something like a foreshortened bootstrap process, as each “layer” constructs the environment necessary for the next one. Moreover, as you exit from the kernel you pass through the same layers again, in reverse order, passing briefly through exception mode again before the final eret which returns you to userland.

Different environments in the kernel are built by more or less elaborate software which makes up for the limitations of the exception handler environment. Let's list a few starting at the bottom, as the kernel is entered:

MIPS CPU in Exception Mode
Immediately after taking an exception, the CPU has SR(EXL) set; it's in exception mode. Exception mode forces the CPU into kernel-privilege mode and disables interrupts, regardless of the setting of other SR bits. Moreover, the CPU cannot take a nested exception in exception mode except in a very peculiar way.

(There are some cunning tricks in MIPS history that exploit the peculiar behavior of an exception from exception mode – but Linux doesn't use any of them.)
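
The "regardless of other SR bits" rule is easy to express as bit tests. The sketch below assumes the usual MIPS32 layout of the low bits of SR (IE in bit 0, EXL in bit 1, ERL in bit 2, and the two-bit KSU mode field in bits 4:3); the macro and function names are made up for illustration, and the per-interrupt mask bits are ignored.

```c
#include <assert.h>
#include <stdint.h>

#define SR_IE        (1u << 0)   /* global interrupt enable */
#define SR_EXL       (1u << 1)   /* exception level */
#define SR_ERL       (1u << 2)   /* error level */
#define SR_KSU_MASK  (3u << 3)   /* 0 = kernel, 1 = supervisor, 2 = user */
#define SR_KSU_USER  (2u << 3)

/* Nonzero if the CPU is in exception mode. */
int sr_in_exception_mode(uint32_t sr)
{
    return (sr & SR_EXL) != 0;
}

/* EXL or ERL forces kernel privilege whatever the KSU field says. */
int sr_kernel_privilege(uint32_t sr)
{
    return (sr & (SR_EXL | SR_ERL)) != 0 || (sr & SR_KSU_MASK) == 0;
}

/* Interrupts are globally enabled only if IE is set and neither EXL
 * nor ERL is. */
int sr_interrupts_enabled(uint32_t sr)
{
    return (sr & SR_IE) != 0 && (sr & (SR_EXL | SR_ERL)) == 0;
}
```

Take an SR value describing user mode with interrupts on, set EXL as the hardware does on an exception, and both answers flip without any other bit changing. That is also why the saved SR value can be restored intact on the way out.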

The first few instructions of an exception handler usually save the values of the CPU's general-purpose registers, whose values are likely to be important to the software that was running before the exception. They're saved on the kernel stack of the process that was running when the interrupt hit.

It's in the nature of MIPS that the store operations that save the registers require you to use at least one general-purpose register first, which is why the registers called k0 and k1 are reserved for the use of exception handlers.

The handler also saves the values of some key CP0 registers: SR will be changed in the next section of the exception handler, but the whole at-exception value should be kept intact for when we return. Once that's done, we're ready to leave exception mode by changing SR, though we are going to leave interrupts disabled.

A CISC CPU like an x86 has no equivalent of exception mode; the work done in MIPS exception mode is done by hardware (really by invisible microcode). An x86 arrives at an interrupt or trap handler with registers already saved.

The software run in MIPS exception mode can be seen as producing a virtual machine that looks after saving the interrupted user program's state immediately after an exception and then restores it while preparing for the eret, which will take us back again.

Programmers need to be very careful what they do in exception mode. Exceptions are largely beyond the control of the software locks that make the kernel thread-safe, so exception code may only interact very carefully with the rest of the kernel.

In the particular case of the exception used to implement a system call, it's not really necessary to save GP registers at all (so long as the exception handler doesn't overwrite the s0-s8 “saved” registers, that is). In a system call or any noninterrupt exception, you can call straight out to code running in thread context.

Some particularly simple exception handlers never leave exception mode. Such code doesn't even have to save the registers (it just avoids using most of them). An example is the “TLB refill” exception handler described later in this series.

It's also possible – though currently unusual – to have an interrupt handler that runs briefly at exception level, does its minimal business, and returns. But such an interrupt handler has no real visibility at the OS level, and at some point will have to cause a Linux-recognized interrupt to get higher-level software working on its data.

MIPS CPU with Some or All Interrupts Off
As we'll see later in this series, an interrupt routine exits exception mode but continues to run with at least some interrupts disabled.

Running with all interrupts disabled is a costly but effective way of getting a single CPU to be nonpreemptive (the longest time software spends with interrupts disabled determines your worst-case interrupt latency, and every device driver with a real-time constraint must budget for it). And of course it doesn't prevent re-entrance where there's a second CPU at work.

The simplest, shortest kind of ISR may opt to run to completion without ever re-enabling interrupts – Linux can support this and calls it a fast interrupt handler. You get that behavior by setting the flag SA_INTERRUPT when registering the ISR. But most run for a while with higher-priority interrupts enabled.

Potentially, you can get a stack of interrupts interrupting interrupts. Infinite recursion (and stack overflow and an inevitable crash) can't happen because Linux makes sure you can stack up at most one entry at each distinct interrupt level. The amount of data saved at each level must be small enough that the maximum stack of interrupt save information will not overfill a thread's kernel stack.

Interrupt Context
After an interrupt, even after the interrupt handler has re-enabled most interrupts and built a full C environment, interrupt code is still limited because it's borrowing the state (and kernel stack) of whichever thread happened to be interrupted.

Servicing an interrupt is someone's business, certainly, but it has no systematic relationship with the thread that is executing when the interrupt happens. An interrupt borrows the kernel stack of its victim thread and runs parasitically on that thread's environment. The software is in interrupt context, and to prevent unreasonable disruption, interrupt-context code is restricted in what it can do.

One vital job done by the kernel is the scheduler, which determines which thread the OS should run next. The scheduler is a subroutine, called by a thread; in some cases it's called by a thread in interrupt context. Once the interrupt context part of an interrupt handler can get to the point where the hardware's immediate needs are met, it can (and often does) schedule a thread that will complete the interrupt-handling job, this time in thread context.

Executing the Kernel in Thread Context
You can arrive in the kernel in thread context either when an application has made a voluntary system call or a forced call for resources on a virtual memory exception (and the system call or VM exception has emerged from its lower layers), or as a result of a reschedule – which is, in turn, always either caused by an interrupt or by another thread voluntarily rescheduling itself because it's waiting for some event.

System calls are a sort of “subroutine call with security checks.” But a range of other exceptions – notably virtual memory maintenance exceptions – are very much the same, even though the application didn't know this particular system call was necessary until it got the exception.

Not every thread is an application thread. Special threads with no attached application can be used to schedule work in the kernel in process context for device management and other kernel functions.

Thread context is the “normal” state of the kernel, and much effort is spent making sure that most kernel execution time is spent in this mode. An interrupt handler's “bottom half” code, which is scheduled into a work queue (as noted earlier), is in thread context, for example.

To read Part 2, go to: How hardware and software work together
To read Part 3, go to: What happens on a system call
To read Part 4, go to: What we really want
To read Part 5, go to: MIPS-specific issues in the Linux kernel
To read Part 6, go to: CP0 pipeline hazards, multiprocessors & coherent caches

This series of six articles is based on material from “See MIPS Run Linux,” by Dominic Sweetman and is used with the permission of the publisher, Morgan Kaufmann/Elsevier, which retains full copyrights. It can be purchased on line.

Dominic Sweetman is a software/hardware boundary expert based in London, England, who previously served as managing director at Algorithmics Ltd.
