Microkernels rule! - Embedded.com

Microkernels tend to stir up emotions. Linus Torvalds likes to bag them, and a paper at last year's Ottawa Linux Symposium claimed they suck (based on arguments that I debunked at this year's LCA).

Clearly, there's a lot of skepticism.

Do microkernels suck? In a nutshell: no, they rock–if they're well designed and implemented. I will elaborate.

Microkernels, first invented around 1970, are based on the idea of moving the operating system's services out of the kernel (the code that runs in the processor's privileged mode) into user-mode servers. Microkernels were en vogue in the 1980s; it seemed as if everyone was building one. And, as with many fashionable technologies, their proponents promised much and delivered little.

Mach, a microkernel that was widely used as the basis of operating systems, ran into serious performance problems, and its contemporaries (like Chorus and QNX) weren't much better. There were spectacular failures, none more so than IBM's Workplace OS, which cost the company a cool two gigabucks. Others, like the Mach-based OSF/1 and NeXT operating systems, ended up moving most OS functionality back into the kernel. This is how Mac OS X functions today; it has given up any pretense of being a microkernel OS.

Needless to say, the experience with Mach and others created a bit of an image problem for microkernels (which didn't stop the GNU Hurd from repeating the mistakes of the past). However, back in 1993, Jochen Liedtke demonstrated that these performance problems weren't inherent in the microkernel concept. His L4 microkernel ran rings around the competition, outperforming Mach and QNX by factors of 5 to 20 in the cost of the critical IPC primitive. In an analysis of Mach, he showed that the poor performance was a result of excessive size (about 300 kernel APIs and hundreds of kLOC is certainly not “micro”).

Liedtke showed that minimality was critical and formulated what is now the accepted definition: a microkernel only contains code that must execute in kernel mode; everything else should run as user-mode programs. He also outlined design principles for flexible, high-performance microkernels.

This should have settled the issue. But it didn't, as the recent debates demonstrate. Much of this discussion is ill-informed, usually based on folklore from the Mach days. Typical arguments against microkernels are along these lines: because microkernels implement system services as user-level programs, components that are normally inside the kernel are invoked via IPC, which has extra overhead.

While this is essentially true, the implications are much less severe than many think. On modern processors running a high-performance kernel like L4, that overhead is less than a microsecond per service invocation. For most services, this is negligible.
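To put that figure in perspective, here is a back-of-the-envelope sketch in Python. The one-microsecond cost is the upper bound quoted above; the invocation rates and the function name are hypothetical, chosen purely for illustration, and are not measurements.

```python
IPC_COST_SECONDS = 1e-6  # upper bound per service invocation, as quoted above


def ipc_overhead_fraction(invocations_per_second, cost_seconds=IPC_COST_SECONDS):
    """Fraction of one CPU's time spent on IPC overhead at a given invocation rate."""
    return invocations_per_second * cost_seconds


# Hypothetical service-invocation rates (assumptions, not measured data).
for rate in (1_000, 10_000, 100_000):
    print(f"{rate:>7} invocations/s -> {ipc_overhead_fraction(rate):.1%} of CPU time")
```

Under these assumptions, even ten thousand invocations per second would cost about 1% of a CPU. The overhead only becomes noticeable for services that are invoked at very high rates.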

Another argument is that because a microkernel OS consists of many servers communicating by messages, this can lead to deadlocks. We know that ill-designed synchronization can lead to deadlocks. And it doesn't matter whether that synchronization is via IPC between servers or via locks inside a monolithic kernel. Concurrency control is hard to understand and easy to get wrong–in any system.
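One way to see that the two cases are the same problem is to model both as a wait-for graph, in which a deadlock is simply a cycle. The Python sketch below is illustrative only: the function and the server and thread names are hypothetical and are not taken from any real system.

```python
def has_wait_cycle(waits_for):
    """Return True if the wait-for graph contains a cycle, i.e. a deadlock.

    `waits_for` maps each party to the parties it is blocked on. A party can be
    a user-mode server blocked on an IPC reply or a kernel thread blocked on a
    lock; the check does not care which.
    """
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node already on the current path: cycle
        visiting.add(node)
        for blocker in waits_for.get(node, ()):
            if visit(blocker):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(node) for node in list(waits_for))


# Hypothetical microkernel case: two servers each block on a reply from the other.
ipc_waits = {"fs_server": ["disk_driver"], "disk_driver": ["fs_server"]}

# Hypothetical monolithic case: two threads each hold a lock the other needs.
lock_waits = {"thread_a": ["thread_b"], "thread_b": ["thread_a"]}

# Hypothetical well-ordered case: waits go in one direction only.
ordered_waits = {"thread_a": ["thread_b"], "thread_b": []}

print(has_wait_cycle(ipc_waits), has_wait_cycle(lock_waits), has_wait_cycle(ordered_waits))
```

The same cycle check flags both the IPC case and the lock case, and clears the ordered one. The mechanism differs; the design error is the same.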

Then there are claims that reliability is not increased by microkernels, as failure of one server will force failure of the whole system. This statement, made repeatedly by Linus and others, is a curious inversion of the facts. Of course, there are critical user-mode components in a microkernel system whose failures are fatal. For example, if the file system that contains the code for system services fails, it can't be restarted. However, this is a small portion of all services. Others can be restarted quite easily without any effect on the remainder of the system (other than maybe a short delay), as has been demonstrated for years by QNX, L4/Mungi, and more recently by Minix 3. Compare this to a Linux system, where even a bug in a USB or audio driver will crash the system, and where user-level root daemons can break everything.

Reliability is one of microkernels' greatest assets. The amount of code that must be fully trusted, the so-called trusted computing base (TCB), is dramatically reduced. In the case of Open Kernel Labs' OKL4 microkernel system, this TCB is less than 20 kLOC. This is at least an order of magnitude less than Linux, where the kernel alone has hundreds of kLOC, even in a minimal embedded configuration.

Today, the reduction in TCB size is the real killer advantage of microkernels: small size implies fewer bugs and hence increased security and safety.

Yet this small platform is generic and flexible enough to support the construction of complete OSes on top. It's also an excellent virtualization platform, as demonstrated by OK Linux, which on ARMv5 processors virtualizes Linux with an overhead of as little as 3%.

Microkernels are no longer an academic toy. They've proven that they are ready for prime time, that performance isn't an issue if the microkernel is well designed and implemented, and that they can improve system robustness. Formal verification gives them an advantage that can't be matched by other approaches. With the increasing deployment of sophisticated (and hence complex) embedded systems in mission- and life-critical scenarios, we need a rock-solid base on which to build systems that are truly safe and secure. There is simply no alternative to microkernels for such systems.

Gernot Heiser, co-founder of Open Kernel Labs, is the company's Chief Technology Officer. Prior to co-founding OK, Dr. Heiser created and led the Embedded, Real-Time and Operating Systems (ERTOS) research program at NICTA. He holds a PhD from ETH Zurich.
