No operating system is an island

At F5 Networks, our BIG-IP device for application delivery networking combines the functions of a dozen or more single-purpose network appliances. Web application acceleration, load balancing, low-level packet routing, content compression, client authentication, rate shaping, and IPv6 gateway services are just a few of its functions, and the list keeps growing as new capabilities are added.

The BIG-IP device by F5 Networks uses an operating system and embedded database management system to stay afloat in a sea of real-time data.

These applications' need for high performance, availability, and reliability–and F5 Networks' desire for scalability and extensibility, so the device's features can keep growing–led the company to design and build a specialized embedded operating system for the BIG-IP product family.

In this article, we describe why we enhanced BIG-IP with an embedded operating system and an embedded data management system, and the technical issues we were trying to avoid. This approach could be useful to embedded systems engineers developing networking, communications, or any other complex real-time application in which multiple processes read and update shared data to perform their work.

An operating system for BIG-IP
The Traffic Management Operating System (TMOS) is the operating system we built for BIG-IP. It's an embedded device platform that is highly optimized for delivering applications over the Internet and other IP networks, and gives the device a performance edge. BIG-IP's popularity has drawn attention to TMOS, and while nearly all the notice has been positive, our operating system has sometimes been mischaracterized as a Linux variant. In fact, BIG-IP does include a copy of Linux, which runs alongside TMOS and handles certain management tasks, such as the command line and Web graphical user interface.

However, the packets flowing through BIG-IP are not “touched” by Linux in any way. Every important system aspect is contained within TMOS and optimized for high-speed, high-volume traffic-management applications. TMOS has its own microkernel, the Traffic Management Microkernel (TMM).

BIG-IP's platform reimplements standard operating system features such as the scheduler and Secure Sockets Layer stack to achieve lower overhead and higher flexibility. It also adds new capabilities, such as iRules, a scripting language (based on Tcl, the Tool Command Language) for real-time manipulation and modification of traffic, to enable users to achieve complex networking goals.

Rethinking the wheel
In rethinking operating system features in terms of their potential for improving our networking device, we targeted configuration management for a complete reengineering. On Linux and UNIX platforms, provisioning applications and services with settings is handled via a configuration file, usually in ASCII format, that is accessed at startup and during operation.

The problems of this approach are well known. One is a lack of centralization: multiple programs running on a machine generate a profusion of config files that are stored in different places. In these scattered locations, the files are unprotected and sometimes difficult to find. In addition, configuration files' formatting is nonstandard, so that updating them requires knowledge of each file's layout.

In addition to these general problems, our engineers identified requirements for configuration management specific to a multifunction, real-time networking device. One is performance. Network traffic management inherently occurs in real time. With ASCII configuration files, data is saved to disk, making for slow retrieval; often a process must navigate a lengthy configuration file, with multiple disk I/Os, to retrieve or check a single configuration value. During BIG-IP's development, it became clear that only in-memory hosting of configuration data was likely to provide the real-time storage, sorting, and retrieval that the technology requires.
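The contrast between the two approaches can be illustrated with a minimal sketch: reading one value from an ASCII configuration file forces a scan of the file (with the attendant disk I/O), while an in-memory store answers in a single lookup. The file format, class, and key names below are illustrative, not BIG-IP's actual configuration schema.

```python
def read_setting_from_file(path, key):
    """Scan a key=value config file line by line (disk I/O on every read)."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                k, _, v = line.partition("=")
                if k.strip() == key:
                    return v.strip()
    return None

class InMemoryConfig:
    """Configuration hosted in memory: constant-time retrieval, no disk I/O."""
    def __init__(self):
        self._settings = {}

    def set(self, key, value):
        self._settings[key] = value

    def get(self, key):
        return self._settings.get(key)

cfg = InMemoryConfig()
cfg.set("virtual_server.vip", "203.0.113.10")
assert cfg.get("virtual_server.vip") == "203.0.113.10"
```

With hundreds of processes each checking configuration values during real-time traffic processing, the per-read cost difference between these two retrieval paths compounds quickly.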

A second demand that drove our software engineers to build a dramatically different configuration engine within TMOS was for large-scale configuration management. At any given time, hundreds of processes may be running inside the BIG-IP device, resulting in the need to manage an unusually high volume of configuration data. A third requirement was support for data sharing between processes.

To a much greater extent than in typical UNIX and Linux boxes, processes running in BIG-IP must “know” about the same data objects, which range from virtual servers, which provide outward-facing IP addresses that redirect traffic to servers and other resources; to trunks, or logical groupings of interfaces; to static routing information used in the device's layer 3 switching functions; to many other object types (we've counted approximately 500 different configuration object types).

A fourth requirement was complex interrelationships between objects. Multiple objects can reference the same real-world entity, such as an IP address; and less complex objects can be combined into more complex ones. Because these objects are shared and interrelated, in the TMOS design it was essential to centralize the objects' location and provide a mechanism for cross-object validation and synchronization, to protect data integrity.
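These reference chains can be sketched as follows. In this hypothetical model (the object and field names are illustrative, not BIG-IP's actual schema), a virtual server references a pool by name, the pool references its members, and a centralized store is the one place where every cross-object reference can be resolved.

```python
from dataclasses import dataclass, field

@dataclass
class PoolMember:
    name: str
    ip: str            # a real, user-configured IP address

@dataclass
class Pool:
    name: str
    member_names: list = field(default_factory=list)

@dataclass
class VirtualServer:
    name: str
    vip: str           # outward-facing virtual IP address
    pool_name: str     # reference to a Pool, by name

# Centralized store: one place to resolve every cross-object reference.
members = {"web-1": PoolMember("web-1", "10.0.0.1")}
pools = {"web-pool": Pool("web-pool", ["web-1"])}
vservers = {"vs-1": VirtualServer("vs-1", "203.0.113.10", "web-pool")}

def resolve_members(vs_name):
    """Follow the reference chain: virtual server -> pool -> members."""
    vs = vservers[vs_name]
    pool = pools[vs.pool_name]
    return [members[m] for m in pool.member_names]

assert resolve_members("vs-1")[0].ip == "10.0.0.1"
```

Because any of roughly 500 object types may hold such references, validating a change to one object frequently requires consulting several others, which is what motivated the centralized, validated store described next.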

A database system in TMOS
What we also needed was linear performance in BIG-IP's configuration management. If a customer using the device has 100 virtual IP addresses that have to be managed in a certain way, and the time for doing that is X, then the time for doing the same thing to 200 virtual IP addresses has to be two times X. What F5 Networks–and ultimately, its customers–can't tolerate is worse-than-linear degradation of performance in organizing, querying, and retrieving such data.
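The linearity requirement follows directly when each per-object operation has constant cost. The toy store below counts primitive operations to make that visible: doubling the number of virtual IPs exactly doubles the work, because each is handled with O(1) hash lookups. The class and key names are illustrative only.

```python
class CountingStore:
    """Dict-backed store that counts primitive operations, to illustrate
    linear scaling: handling N objects costs N constant-time operations."""
    def __init__(self):
        self._data = {}
        self.ops = 0

    def put(self, key, value):
        self.ops += 1
        self._data[key] = value

    def get(self, key):
        self.ops += 1
        return self._data[key]

def manage_vips(store, n):
    """'Manage' n virtual IPs: one write plus one read-back per VIP."""
    for i in range(n):
        store.put(f"vip-{i}", {"addr": f"10.0.0.{i}"})
        store.get(f"vip-{i}")

s100, s200 = CountingStore(), CountingStore()
manage_vips(s100, 100)
manage_vips(s200, 200)
assert s200.ops == 2 * s100.ops  # twice the objects, twice the work
```

A store that instead scanned a flat file per lookup would make the same workload quadratic, which is exactly the degradation the design rules out.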

In short, to tackle configuration management in BIG-IP, the platform needed to manage a large volume of potentially complex data, as well as data relationships–and so it needed a database management system (DBMS). If you're familiar only with ordinary relational database management systems (RDBMS) used in business or other “non-real-time” applications, a database might seem like the last thing we'd want.

RDBMS technology has a reputation for slowness–at least, compared with the performance demands in embedded, real-time systems like BIG-IP. Traditional database systems are typically hardwired to save records to disk, and this is particularly draining on performance, although such DBMSs do have complex caching logic that keeps some frequently requested data in memory, for faster access.

Our performance requirements, in contrast, demanded a database entirely in memory, without any disk I/O, and preferably without caching (why bother with it when all data is already held in memory?). We found the right fit in an embedded DBMS that was designed from scratch to reside in main memory, as shown in Figure 1.


Its design eliminates not only mechanical disk I/O, but also the file I/O, caching and related logic that would burden a traditional disk-based DBMS's performance even if it were deployed in memory (such as on a RAM disk). Because the embedded DBMS is streamlined, its code footprint and processing demands are quite low–and the extra RAM and CPU cycles this frees up can always be used to benefit the applications running on BIG-IP.

Looking at Figure 1, you can see that the key elements of this embedded database system include a development application programming interface (API) with a static function library for common tasks such as opening and closing the database. Most of the API is generated dynamically as a by-product of compiling the database definition language, and is highly optimized for that definition. Supported indexes include hash, B-tree, R-tree, Patricia trie, and custom indexes. Notably absent from the design are caching and file management layers, which are made redundant by this all-in-memory architecture.
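The value of supporting multiple index types is that each answers a different query shape over the same in-memory records. The sketch below (illustrative record and function names, not the actual eXtremeDB API) pairs a hash index for exact-match lookup with a sorted list standing in for a B-tree's ordered and range queries.

```python
import bisect

records = [
    {"name": "pool-a", "port": 8080},
    {"name": "pool-b", "port": 443},
    {"name": "pool-c", "port": 80},
]

# Hash index: O(1) exact-match lookup by name.
hash_index = {r["name"]: r for r in records}

# Sorted index (B-tree stand-in): ordered traversal and range queries.
sorted_ports = sorted((r["port"], r["name"]) for r in records)

def lookup_by_name(name):
    return hash_index.get(name)

def pools_with_port_at_least(p):
    i = bisect.bisect_left(sorted_ports, (p, ""))
    return [name for _, name in sorted_ports[i:]]

assert lookup_by_name("pool-b")["port"] == 443
assert pools_with_port_at_least(443) == ["pool-b", "pool-a"]
```

A hash index cannot answer the range query and the sorted index pays O(log n) for exact matches, which is why a configuration store serving many access patterns benefits from offering both, plus specialized structures such as R-trees and Patricia tries where they fit.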

The embedded DBMS was integrated in BIG-IP's control plane, as part of the Master Control Program (MCP). The MCP serves as a collection and distribution point for all configuration data, whether provided by the user or internal to the device itself (Figure 2). The MCP is responsible for validation, or enforcing rules when a user or process attempts to change a data object, and for using BIG-IP's internal messaging to push updated data to processes that subscribe to a subset of the configuration database in order to receive data needed for their operation.


As Figure 2 shows, when BIG-IP's MCP performs the cross-object validation needed to update the pool object so that it recognizes three new pool members, the following operations are initiated:

1. A transaction begins and an attempt is made to update the in-memory embedded database so that the pool object recognizes three new pool members.

2. The event notifications feature of the in-memory embedded database (eXtremeDB) notifies MCP's validation logic of the attempted change.

3. The MCP validation logic determines that to allow this change, it must verify the existence of the three new pool members. Three database read operations are performed to accomplish this.

4. If the change is valid, the event notification allows the in-memory embedded database to commit the transaction. If the request is invalid, the update request is rejected and the transaction is rolled back.
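The four steps above can be sketched as a transactional store whose update notification lets validation logic read the database and veto the change before commit. This is a conceptual model only; the class, hook, and object names are hypothetical, not MCP's or eXtremeDB's actual interfaces.

```python
class ValidationError(Exception):
    pass

class ConfigDB:
    def __init__(self):
        self.objects = {}      # committed state
        self.on_update = None  # synchronous event-notification hook

    def update(self, key, value):
        """Step 1: begin a transaction and attempt the update."""
        pending = dict(self.objects)          # working copy for the transaction
        pending[key] = value
        if self.on_update:                    # step 2: notify validation logic
            self.on_update(key, value, pending)  # may raise to veto the change
        self.objects = pending                # step 4: commit on success
        # if on_update raised, the pending copy is discarded (rollback)

db = ConfigDB()
db.objects["member-1"] = {"ip": "10.0.0.1"}
db.objects["member-2"] = {"ip": "10.0.0.2"}

def validate_pool(key, value, pending):
    """Step 3: verify every referenced pool member exists (database reads)."""
    if key.startswith("pool"):
        for m in value["members"]:
            if m not in pending:
                raise ValidationError(f"unknown pool member {m}")

db.on_update = validate_pool
db.update("pool-web", {"members": ["member-1", "member-2"]})
assert "pool-web" in db.objects

try:
    db.update("pool-bad", {"members": ["member-9"]})
except ValidationError:
    pass
assert "pool-bad" not in db.objects  # invalid update was rolled back
```

Keeping the notification synchronous is what makes the veto possible: the transaction cannot commit until the handler returns without raising.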

Processes running within BIG-IP rely on the Master Control Program within TMOS to provide up-to-date configuration settings. The MCP, in turn, uses the integrated in-memory embedded database to logically structure that data, and to provide efficient data access and storage methods through its database application programming interface and its implementation of multiple index types. Two features of the embedded database we used, transaction processing and event notifications, described next, have proven essential to the often challenging task of safeguarding the integrity of data that is shared between objects and processes.

Synchronizing and validating data
What's going on “inside the box” of BIG-IP that draws on the Master Control Program's resources and its embedded database? Clients of the MCP, and their demands, are remarkably diverse. A single BIG-IP service may consist of one process, or of many tasks that run concurrently. Just a few of the services and processes configured by the MCP include:

• BIG-IP's Command Line Interface (CLI) is an MCP client. When a user changes the configuration or requests to view configuration, the CLI connects with the MCP, which in turn interacts with the embedded DBMS to fulfill the request.

• BIG-IP uses an ASIC as a layer 4 accelerator on the device's data plane, and this requires information including IP addresses, ports, and instructions for low-level traffic switching.

• The traffic management microkernel itself is a process that accesses the database (via MCP). It must be configured, and it must be told about configuration changes.

• A Java daemon runs on BIG-IP to implement parts of the device's graphical user interface. To view, configure, add, and delete items, it interacts with the embedded DBMS as a client.

• SNMP is used extensively for device management. SNMP configuration data stored in the embedded DBMS includes management information base (MIB) files, trap settings, and permissions for read-only vs. read/write access.

The average number of configuration objects used in a deployed BIG-IP device varies widely, depending on an organization's networking goals and its use of the product. Inside BIG-IP, while some services and processes use different configuration objects than others, there is significant overlap, with many data objects used by many processes.

And since that “use” often entails the ability to change data objects, as well as read from them, the master control program must be able to serve as the traffic cop, blocking changes that are illogical or could harm system operation, and keeping related data objects synchronized.

MCP does this with the help of two features of the embedded database we used. Transactions enable multiple changes to be grouped together and completed as a single unit, or rolled back to the pretransaction state. The database system's event notifications are based on statements, written into the database design, that cause the embedded DBMS to notify the application when an event such as an object update, deletion, or insertion occurs.

Event notification handlers can be configured to receive events synchronously (shown in Figure 3) or asynchronously (shown in Figure 4). Synchronous notifications occur within the scope of a transaction and are implemented so that transactions don't finish (commit or roll back) until the notification is “handled.”


As Figure 3 shows, synchronous event handling in an in-memory database occurs within the context of a single thread and within the scope of a transaction. As each event (Add Record, Update Record Field, and Delete Record) occurs in the multistep transaction, the requirements of that event's handler logic must be satisfied before the thread proceeds and the transaction ultimately commits. Asynchronous event handling (shown in Figure 4) occurs after a database transaction commits. Separate threads run in parallel to address each event.
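The two delivery modes can be contrasted in a small sketch: a synchronous handler runs in the caller's thread before the commit (and could abort it by raising), while an asynchronous handler is queued at commit time and runs later in a worker thread. The class and handler names are hypothetical, not eXtremeDB's actual notification API.

```python
import queue
import threading

class EventedStore:
    def __init__(self):
        self.data = {}
        self.sync_handlers = []
        self.async_queue = queue.Queue()
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def _drain(self):
        """Worker thread: run async handlers after their events commit."""
        while True:
            handler, event = self.async_queue.get()
            handler(event)
            self.async_queue.task_done()

    def commit(self, key, value):
        event = (key, value)
        for h in self.sync_handlers:   # inside the transaction: a raise
            h(event)                   # here would abort before commit
        self.data[key] = value         # commit
        # async handlers observe the event only after the commit
        self.async_queue.put((log_handler, event))

seen_sync, seen_async = [], []

def audit_handler(event):
    seen_sync.append(event)

def log_handler(event):
    seen_async.append(event)

store = EventedStore()
store.sync_handlers.append(audit_handler)
store.commit("vip-1", "203.0.113.1")
store.async_queue.join()   # wait until async handling has finished
assert seen_sync == [("vip-1", "203.0.113.1")]
assert seen_async == [("vip-1", "203.0.113.1")]
```

Synchronous delivery suits validation, which must gate the commit; asynchronous delivery suits work like logging or pushing updates to subscribers, which should not hold the transaction open.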


Let's look at how these features work together to ensure data consistency in a simple BIG-IP configuration update operation. As mentioned above, virtual servers are among the fundamental BIG-IP data objects. A virtual server has a virtual IP address that faces outward, toward incoming network traffic.

When a company uses BIG-IP, a virtual IP address can handle all the traffic, such as Web server requests, from the outside world as users try to access the company on the Internet or on a network. The virtual server distributes this traffic to the appropriate resources within the company in keeping with system goals, such as minimal response time, prioritization of specified traffic, load distribution, and so forth.

A virtual server needs to have a pool associated with it. The pool is a data object that contains pool members. Each pool member also has an IP address–not a virtual one, but a real one, configured by the user.

These pool members can be back-end servers of some sort, usually within a local area network, or other devices that serve as destinations for network traffic. The pool “knows” about its pool members, and it has other attributes, including methods for load balancing, such as round robin algorithms, and information about quality of service, all of which are stored in the database.

In short, the virtual server knows about the pool, and the pool knows about its pool members. These three object types may be configured at the same time. MCP is not told about changes in their configuration in any particular order–the virtual server's configuration can be changed first, or the pool member's configuration can be changed first.

However, it's necessary to make sure any change to each one's configuration data meets rules particular to that data, as well as rules imposed by that object's relationship with the other two objects. All the rules need to be satisfied before the new configuration data is delivered to the data plane, where the BIG-IP device performs real-time switching, load-balancing, and other tasks.

When configuration of the virtual server, pool, or pool members is changed, MCP can perform two layers of validation, both within the scope of an embedded database transaction.

When a new value is added for an attribute of, say, a pool object, the database notifications feature triggers the first level of validation. The embedded DBMS's notifications automatically push the news of this change to MCP's validation logic, where algorithms ensure that the new data–an IP address, for example–is within the allowable range, the right data type, and so forth.

All of this takes place within a transaction, and the change to the data object is not committed yet. If the proposed change is found to be invalid, the process “rolls back” and the database is returned to its pretransaction state. In other words, the event notification handler can determine that the data is invalid and cause the transaction to fail (be aborted). If validation is successful, the transaction closes, the change is committed, and MCP communicates the change(s) to subscriber processes.
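A minimal sketch of this first validation layer, using Python's standard `ipaddress` module, checks that a proposed attribute value is a well-formed, usable address before the transaction may commit. The function name and the specific rules enforced here are illustrative assumptions, not MCP's actual validation logic.

```python
import ipaddress

def validate_pool_member_ip(value):
    """Attribute-level check: reject values that are not well-formed
    IPv4/IPv6 addresses, or that fall in unusable ranges."""
    try:
        addr = ipaddress.ip_address(value)
    except ValueError:
        return False                      # wrong type or malformed address
    # multicast and the unspecified address can't be real pool members
    return not (addr.is_multicast or addr.is_unspecified)

assert validate_pool_member_ip("10.1.2.3")
assert not validate_pool_member_ip("999.1.2.3")   # out of range
assert not validate_pool_member_ip("0.0.0.0")     # unspecified
```

If such a check fails, the handler signals the failure and the surrounding transaction aborts, leaving the committed configuration untouched.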

The higher level of validation is “cross-object validation” and occurs when the validity of a change to an object or objects may depend on the status of another object or objects. In this scenario, updates to multiple objects are often strung together and validated as a group. This still takes place within a transaction, but the validation logic can be considerably more complex and require cross-checking.

For example, the virtual server might be updated so that it “knows” about pool X. The cross-object validation has to determine that pool X exists. Or the pool might be newly configured to recognize three pool members. Do all three of these members exist? These validation inquiries themselves require interaction with the database and must complete before the transaction is allowed to finish.

Cross-object validation can grow quite complex and involve reading from many database objects, and the updates that occur at the end of transactions require database writes. With a database system that goes to disk, the performance cost of our configuration management strategy would be prohibitive. Yet the complexity of the configuration, and of the relationships among its elements, means we need a database with very fast lookup and traversal of objects.


Ryan Kearny is vice president of product development at F5 and leads the BIG-IP software development teams. He has an electrical engineering degree from the University of Washington.

Steve Graves is cofounder and CEO of McObject, which provides the eXtremeDB embedded database technology used in BIG-IP. Steve is a member of the advisory board for the University of Washington's certificate program in Embedded and Real Time Systems Programming.
