I recently read a fascinating news article regarding recently published Yale bioinformatics research, comparing Linux to genomes. The study elucidates the stark difference between the E. coli cell regulatory network of genes and the Linux kernel regulatory network: its function call graph. The E. coli network is pyramidal, with a few key master genes at the top influencing a larger number of “workhorses” at the bottom. In contrast, the Linux call graph resembles a reverse pyramid, with a large number of top-level entry points calling down to a small number of bottom-level routines.
The reverse pyramid is failure-prone because changes to the workhorses – the ones most likely to require adaptation over time – force corresponding changes up through the caller hierarchy, which is comparatively large. The researchers noted the rapid churn of code changes originating in the low-level kernel routines. As one of the researchers, Mark Gerstein, commented: “You can easily see why software systems might be fragile, and biological systems robust. Biological networks are built to adapt to random changes. They’re lessons on how to construct something that can change and evolve.”
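To make the researchers' point concrete, here is a minimal Python sketch that counts how many functions sit above a given routine in a call graph, which is roughly the set of callers a change to that routine could ripple into. The two toy graphs and every name in them (`master`, `entry_1`, `low_level`, and so on) are invented for illustration; they are not taken from the study or from the Linux kernel.

```python
from collections import defaultdict


def transitive_callers(calls, target):
    """Return every function that can reach `target` through the call graph.

    `calls` maps a caller name to the list of functions it calls directly.
    """
    # Invert the graph so we can walk from a callee back up to its callers.
    callers = defaultdict(set)
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].add(caller)

    seen = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for parent in callers[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Toy pyramid (hypothetical): one master regulator drives three workhorses.
pyramid = {"master": ["work_a", "work_b", "work_c"]}

# Toy reverse pyramid (hypothetical): six entry points funnel through
# two helpers into a single low-level routine.
reverse_pyramid = {
    "entry_1": ["helper_x"],
    "entry_2": ["helper_x"],
    "entry_3": ["helper_x"],
    "entry_4": ["helper_y"],
    "entry_5": ["helper_y"],
    "entry_6": ["helper_y"],
    "helper_x": ["low_level"],
    "helper_y": ["low_level"],
}

# How many functions sit above one workhorse in each shape?
pyramid_impact = len(transitive_callers(pyramid, "work_a"))
reverse_impact = len(transitive_callers(reverse_pyramid, "low_level"))
print(pyramid_impact, reverse_impact)
```

In the pyramid, changing one workhorse touches a single caller; in the reverse pyramid, changing the one low-level routine touches every helper and entry point above it.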
I have often thought of systems software architecture as analogous to many things, living and not, that share the challenge of balancing two naturally opposed goals: providing an extremely sophisticated service while maintaining extremely high levels of robustness.
Linux’s lack of robustness is caused primarily by its structure (as discussed in the research) but exacerbated by its high rate of modification by a large and diverse group of contributors. Thus, despite following what are considered good commercial software development practices, these developers are statistically guaranteed to introduce critical flaws at a regular if not increasing rate. A great overview article that discusses this is here.
There are two major classes of operating systems: monolithic (like Linux, Windows, and Solaris) and microkernel (like INTEGRITY, L4, and Minix). The monolithic approach places a large number of services in a single memory space (the OS kernel), where there are many intricate direct and indirect call pathways between software modules, shared memory, and simply a large amount of code. As the researchers point out, a single flaw can take down the entire system, making it crash-prone.
Most leading OS experts agree that a microkernel approach is much better for robustness. With a microkernel, the supervisor-mode core provides only a very small set of critical services: memory protection for processes and itself, time scheduling for processes, and event handling (such as reacting to crashes in processes). Other services that are typically thought of as part of the operating system – such as networking stacks and file systems – are executed in processes instead of in the kernel. As systems grow in complexity – multimedia, new communications mechanisms, web browsers, and so on – these are all built into separate components which use a well-defined, auditable interface to other components and the kernel. Each component is provided a private memory space and a quota of execution resources (memory, CPU time) that cannot be stolen or corrupted by other applications. Systems are composed of only the minimal components required. This approach promotes a more maintainable, debuggable, testable, and robust system.
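The division of labor described above can be illustrated with a toy model in Python. This is only a sketch under loud assumptions: `ToyKernel`, `Service`, `file_system`, and `network_stack` are hypothetical names, a Python dictionary merely stands in for a private memory space, and nothing here reflects how INTEGRITY, L4, or Minix are actually implemented. The point is the shape: the kernel only routes messages and reacts to crashes, while the file system and network stack live outside it.

```python
class Service:
    """A component with private state, reachable only through messages."""

    def __init__(self, name, handler):
        self.name = name
        self.handler = handler
        self.state = {}  # stands in for a private address space

    def handle(self, message):
        return self.handler(self.state, message)


class ToyKernel:
    """Routes messages and reacts to crashes; it offers no other services."""

    def __init__(self):
        self._factories = {}
        self._services = {}
        self.crash_log = []

    def register(self, name, handler):
        # Keep a factory so a crashed service can be restarted from scratch.
        self._factories[name] = lambda: Service(name, handler)
        self._services[name] = self._factories[name]()

    def send(self, target, message):
        service = self._services[target]
        try:
            return service.handle(message)
        except Exception as exc:
            # Event handling: record the crash and restart only that service.
            self.crash_log.append((target, repr(exc)))
            self._services[target] = self._factories[target]()
            return None


def file_system(state, message):
    # Hypothetical file service: ("write", key, value) or ("read", key).
    op, key, *rest = message
    if op == "write":
        state[key] = rest[0]
        return "ok"
    return state.get(key)


def network_stack(state, message):
    # Hypothetical network service with a deliberate parser bug.
    if message == "malformed packet":
        raise ValueError("parser bug")
    state["packets"] = state.get("packets", 0) + 1
    return state["packets"]


kernel = ToyKernel()
kernel.register("fs", file_system)
kernel.register("net", network_stack)

kernel.send("fs", ("write", "boot.cfg", "quiet"))
kernel.send("net", "hello")
kernel.send("net", "malformed packet")  # crashes and restarts "net" only

fs_value = kernel.send("fs", ("read", "boot.cfg"))  # file system is untouched
net_count = kernel.send("net", "hello")  # network stack restarted with fresh state
```

The crash in the network service costs that service its own state and nothing else; in a monolithic design the same bug would be running in the same memory space as the file system.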
The monolithic approach was adopted in older operating systems for performance reasons. But Intel and other microprocessor designers have thankfully taken this objection off the table (OS designers have also gotten much better at making messaging and process switching very fast). The commercial success of microkernels over just the past decade is testament to this.
Of course, the microkernel remains a single point of failure, but this piece is small and simple enough that it requires little or no change over time, can be exhaustively tested, and is amenable to formal mathematical proof of its safety and security functions. One of the key features of microkernels in this domain is the ability to host a virtualized general-purpose OS, like Linux, without impacting the robustness of critical services running directly on the microkernel. Thus, computing can realize the strengths of both worlds – microkernel and monolithic. As an example, Dell sells a specialized desktop PC that uses a microkernel to host multiple virtual PCs which are able to securely and simultaneously connect to distinct classified and unclassified government networks.
Furthermore, we can improve the overall robustness of a Linux system by moving critical functions out of the bad cells and into the good cells, if you will. For example, if the network security and crypto components are moved from Linux to its supervisory microkernel, then malware which finds its way into Linux via the Internet cannot masquerade as trusted network connections, which can only be made with the isolated crypto software and keying material. Component isolation is the same characteristic that makes some viruses difficult to thwart: you can kill one virus cell, but the other cells continue to wreak havoc. Modern ships use multiple container holds to prevent sinking if one of the cells is pierced.
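As a rough illustration of why isolated keying material matters, the Python sketch below models a crypto component that signs traffic on behalf of a guest but never hands out its key. The class `IsolatedCryptoService` and its methods are hypothetical, and Python's private attributes are only a stand-in for the hardware-enforced memory protection a real microkernel would provide.

```python
import hashlib
import hmac
import os


class IsolatedCryptoService:
    """Holds the keying material; callers only ever see the resulting tags."""

    def __init__(self):
        # In a real system this key would live in a separate protected
        # address space; here a private attribute merely models that.
        self.__key = os.urandom(32)

    def sign(self, payload: bytes) -> bytes:
        return hmac.new(self.__key, payload, hashlib.sha256).digest()

    def verify(self, payload: bytes, tag: bytes) -> bool:
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(self.sign(payload), tag)


crypto = IsolatedCryptoService()
payload = b"connect to trusted network"

# A legitimate request goes through the isolated service.
good_tag = crypto.sign(payload)

# Malware inside the guest OS can only guess at the key.
forged_tag = hmac.new(b"guessed key", payload, hashlib.sha256).digest()

trusted_ok = crypto.verify(payload, good_tag)
forgery_ok = crypto.verify(payload, forged_tag)
```

Only the tag produced by the isolated service verifies; code that never had access to the key cannot produce a tag that passes for a trusted connection.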
Another topic which has drawn comparisons between the electronic and the organic is the emergence of cloud computing. The power of the cloud lies in the on-demand, remote access to services, the combinatorial potential of services across the cloud, and the ability to rapidly evolve those services, for example via social networking. In other words, the cloud is the antithesis of the age-old computing model in which a user’s digital universe consists of locally stored data and applications. However, if you host the cloud on a small number of all-powerful centralized data centers (e.g. Amazon, Google), then this goes against the natural robustness grain. The cloud shouldn’t be a great cumulonimbus; micro-cloudlets of cirrostratus perhaps?
Dave Kleidermacher has been developing systems software for high criticality embedded systems for more than 20 years and is one of the original developers of the INTEGRITY operating system, the first software technology certified to EAL 6+ High Robustness, the highest Common Criteria security level ever achieved for software. He managed INTEGRITY’s development for a decade and now serves as the chief technology officer at Green Hills Software. This is his personal blog; opinions expressed are not necessarily those of GHS.
Copyright (c) David Kleidermacher