Low-level service software forms the foundation upon which reliable systems are built. Here's a look at some of the design issues and options.
Software developers who travel on business have more time than ever before to draw parallels between the organization of an airline and that of a complex software architecture. While waiting and waiting and waiting at the gate for the aircraft to arrive, developers should ponder the role of “service” in both its business and its software sense.
An airline's purpose is to transport people by airplane, quickly and profitably, from one location to another. Yet how many people on the airline's payroll actually pilot the aircraft? Much of the staff exists to provide “services” that are vital to accomplishing the airline's mission but do not involve piloting airplanes. Mechanics maintain and repair the aircraft. Ground crews refuel and provision each flight. Handlers load and unload cargo. Gate agents sell tickets and board passengers. None of these services directly involves piloting the aircraft, but each is vital to the airline's mission.
Low-level software services function in much the same way as the airline's non-flying personnel. They perform operations critical to the system's aims, but do not accomplish the application-specific work that the system was developed to fulfill. Using this definition, a system service is a software component that provides a low-level “service” to other parts of the software, most notably the “application”-level software, which performs the function that the system was designed to accomplish.
Examples of application-level software include communication managers, engine controllers, flight controllers, and telemetry processors. Examples of system services include timer services, inter-process communications, event logging, and storage management. In each case, the service is performing a critical function that is not the application's primary goal.
Role of services in system design
The distinction between system services and application software is important when formulating the system architecture. Low-level service software forms a foundation upon which reliable systems are built. Just as an airline cannot fly safely without reliable mechanics, a software system cannot run reliably without reliable services: if they fail, the entire system might crash at any time. Yet this crucial software is often the most overlooked component of the system.
No matter how well conceived and well executed the higher-level application software, the system will never be reliable without robust system services. Well-designed system services promote modularity, flexibility, reliability, and portability. Poorly implemented services lead to instability, unreliability, insidious bugs, and code bloat. As software systems become increasingly complex, well-designed system services are vital to rapidly develop reliable software systems.
An emerging trend is the growing number of software development efforts that must support multiple hardware platforms. Developers can employ low-level services to abstract the details of each hardware platform into a standard set of capabilities used by the application. This streamlines multi-platform development by reusing the existing, well-tested application software: to move the application to a different platform, developers port only the lower-level services on which it runs. The same layering also allows the application code base to be ported easily to more capable hardware when it becomes available.
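One common way to express this layering in C is a table of function pointers that each platform port fills in, so that the application reaches the hardware only through the table. The sketch below is illustrative, not taken from the article: `platform_ops`, `board_a_ops`, and `app_close_relay` are hypothetical names, and the "hardware" is simulated with variables so the code runs on a desktop host.

```c
#include <stdint.h>

/* Hypothetical platform-abstraction interface: the application sees only
 * these operations, never the registers behind them. */
struct platform_ops {
    const char *name;
    void      (*relay_set)(unsigned relay, int closed);
    int       (*relay_get)(unsigned relay);
    uint32_t  (*ticks_now)(void);
};

/* Port for a hypothetical "board A", simulated in memory. A real port would
 * touch GPIO and timer registers instead of these variables. */
static int      board_a_relays[8];
static uint32_t board_a_ticks;

static void board_a_relay_set(unsigned relay, int closed)
{
    if (relay < 8)
        board_a_relays[relay] = closed;
}

static int board_a_relay_get(unsigned relay)
{
    return relay < 8 ? board_a_relays[relay] : -1;   /* -1: no such relay */
}

static uint32_t board_a_ticks_now(void)
{
    return ++board_a_ticks;   /* simulated free-running counter */
}

const struct platform_ops board_a_ops = {
    "board A", board_a_relay_set, board_a_relay_get, board_a_ticks_now
};

/* Application code: portable because it goes through the ops table only.
 * Moving to another board means supplying a new ops table, not editing this. */
int app_close_relay(const struct platform_ops *hw, unsigned relay)
{
    hw->relay_set(relay, 1);
    return hw->relay_get(relay);
}
```

Porting then consists of writing a second table (say, a hypothetical `board_b_ops`) against the same interface; `app_close_relay` is compiled unchanged.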
The first step in designing a system service is to identify functions that are good candidates to be implemented as a service. Common functions performed by many software components are prime candidates. Access to shared resources that must be coordinated among several users should also be considered for implementation as a service. Finally, “low-level” capabilities routinely required by application-level software make excellent candidates for system services.
System services are ideal for providing application-level software access to hardware and system resources. Services offer a means to simplify the application level and abstract the details of the hardware. Such “low level” services can be used to manage non-volatile memory, provide highly accurate timing through hardware support, and manage processor resources. Similar services can also be used to manage relays, switches, or other peripheral hardware manipulated by the application level.
System services can also provide “software services” that do not depend on the existence of specific physical hardware. Services such as inter-process communications, software health verification, event logging, and time-stamping services can be rendered independently of specialized hardware.
Figure 1 describes the location of services within a software architecture. Following is a list of some candidates that can be made into system services:
- Timer service: provides “wake-up” to application software
- Network time service: synchronizes networked embedded systems
- Time-stamp service: accurate time-stamping for troubleshooting and logging
- Inter-process communications: asynchronous communications, queueing of information, control flow
- Software health verification: ensures proper operation of software system
- Non-volatile memory manager: provides non-volatile storage to system software
- Event logger: provides event logging and connection to fault-protection
- System state manager: provides system state and change notification to application software
- Relay thrower: provides access to hardware relays in system
- Boot/initialization service: provides synchronization at system start
When designing system services, developers face several critical design issues. Figure 2a shows a non-volatile memory manager. In this example, users invoke memory manager “wrapper” functions in the caller's task context. The wrapper functions accept a buffer from the caller in order to read and write the caller's data to EEPROM.
In Figure 2b, the write() function is implemented as a simple loop that writes each word in the buffer to the EEPROM device, waiting for a word-complete interrupt from the hardware before writing the next word. It is easy to visualize how this architecture would introduce mutual exclusion problems when multiple tasks attempt to read and write the hardware simultaneously. Two writers might interfere with each other if they are attempting to write the same addresses, or if they must first set chip-select and memory protection registers for the EEPROM that they are attempting to write. Similarly, if a reader and writer are attempting to access the same addresses simultaneously, the reader may receive a partially written result.
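A minimal sketch of the unprotected loop described for Figure 2b follows. The EEPROM is simulated as an array, and `wait_word_complete()` is a hypothetical placeholder for blocking on the word-complete interrupt; `eeprom_write` and `eeprom_read` are illustrative names, not the article's API. Nothing stops two tasks from executing the loop at the same time, which is exactly the mutual-exclusion hazard described above.

```c
#include <stddef.h>
#include <stdint.h>

#define EEPROM_WORDS 256u

/* Simulated EEPROM; real hardware would be a memory-mapped device with a
 * slow per-word write cycle. */
static uint16_t eeprom[EEPROM_WORDS];

/* Hypothetical stand-in for "wait for the word-complete interrupt".
 * In this simulation the write completes immediately. */
static void wait_word_complete(void) { }

/* Unprotected write in the caller's task context: no mutual exclusion, so
 * two callers can interleave words. Returns words written, or -1. */
int eeprom_write(size_t addr, const uint16_t *buf, size_t nwords)
{
    if (addr > EEPROM_WORDS || nwords > EEPROM_WORDS - addr)
        return -1;
    for (size_t i = 0; i < nwords; i++) {
        eeprom[addr + i] = buf[i];   /* program one word */
        wait_word_complete();        /* caller stalls for each write cycle */
    }
    return (int)nwords;
}

/* Unprotected read: can observe a block that another task is half-way
 * through writing. */
int eeprom_read(size_t addr, uint16_t *buf, size_t nwords)
{
    if (addr > EEPROM_WORDS || nwords > EEPROM_WORDS - addr)
        return -1;
    for (size_t i = 0; i < nwords; i++)
        buf[i] = eeprom[addr + i];
    return (int)nwords;
}
```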
In Figure 2c, a semaphore is added to provide atomic read and write access to the EEPROM. By ensuring that only one task accesses the resource at any time, we preserve the integrity of the EEPROM's contents. But the addition of a semaphore can cause the memory manager to block its callers. This blocking behavior may be inappropriate when callers have hard real-time deadlines. In addition, the service would not be safe to call from interrupt service routines (ISRs). Blocking an ISR usually has disastrous consequences for the system and many RTOSes prohibit ISRs from being blocked.
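The same functions with the lock of Figure 2c might look like the sketch below. A POSIX `pthread_mutex_t` stands in for the RTOS semaphore, which is an assumption made so the code runs on a host; `eeprom_write`, `eeprom_read`, and `wait_word_complete()` remain hypothetical names. The comments mark the two places a caller can now block, which is why this version is unsuitable for hard-deadline callers and ISRs.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define EEPROM_WORDS 256u

static uint16_t eeprom[EEPROM_WORDS];   /* simulated device */

/* POSIX mutex standing in for the RTOS semaphore of Figure 2c. */
static pthread_mutex_t eeprom_lock = PTHREAD_MUTEX_INITIALIZER;

static void wait_word_complete(void) { }   /* hypothetical hardware wait */

static int range_ok(size_t addr, size_t nwords)
{
    return addr <= EEPROM_WORDS && nwords <= EEPROM_WORDS - addr;
}

/* Atomic with respect to other callers, but the caller may block twice:
 * waiting for the lock, then during every word-write cycle. Never call
 * this from an ISR. */
int eeprom_write(size_t addr, const uint16_t *buf, size_t nwords)
{
    if (!range_ok(addr, nwords))
        return -1;
    pthread_mutex_lock(&eeprom_lock);
    for (size_t i = 0; i < nwords; i++) {
        eeprom[addr + i] = buf[i];
        wait_word_complete();
    }
    pthread_mutex_unlock(&eeprom_lock);
    return (int)nwords;
}

int eeprom_read(size_t addr, uint16_t *buf, size_t nwords)
{
    if (!range_ok(addr, nwords))
        return -1;
    pthread_mutex_lock(&eeprom_lock);   /* readers never see a half-written block */
    for (size_t i = 0; i < nwords; i++)
        buf[i] = eeprom[addr + i];
    pthread_mutex_unlock(&eeprom_lock);
    return (int)nwords;
}
```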
The solution is to give the memory manager a task-context in which to run, as shown in Figure 3. In this example, the memory manager task blocks on a message queue containing user requests for EEPROM access. These requests do not block the caller, although the semaphore is preserved if the caller wishes to utilize blocking calls. The results for asynchronous calls can be placed into callers' queues by the memory manager upon completion of the read or write operation. The result is a non-blocking, interrupt-safe service for reading and writing non-volatile memory.
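The asynchronous arrangement can be sketched without an RTOS by simulating both queues as ring buffers. In this sketch, `memmgr_write_async()` copies the caller's data into a request and returns at once, and `memmgr_task_step()` represents one pass of the memory manager task's loop, which in a real system would block on its message queue. All names are hypothetical, only writes are shown, and the optional blocking (semaphore) path is omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EEPROM_WORDS 256u
#define QDEPTH       8u
#define MAX_WORDS    16u

static uint16_t eeprom[EEPROM_WORDS];   /* simulated device */

struct reply_queue {                     /* caller-owned result queue */
    int      status[QDEPTH];
    unsigned head, tail;
};

struct request {
    size_t              addr, nwords;
    uint16_t            data[MAX_WORDS]; /* copied: caller may reuse its buffer */
    struct reply_queue *reply;           /* where to post the result */
};

/* Service request queue. On an RTOS this would be a message queue, sent to
 * with a no-wait option so that ISRs can submit requests. */
static struct request req_q[QDEPTH];
static unsigned req_head, req_tail;

/* Non-blocking submit. Returns 0 if queued, -1 if too large or queue full. */
int memmgr_write_async(size_t addr, const uint16_t *buf, size_t nwords,
                       struct reply_queue *reply)
{
    if (nwords > MAX_WORDS || req_tail - req_head == QDEPTH)
        return -1;
    struct request *r = &req_q[req_tail % QDEPTH];
    r->addr = addr;
    r->nwords = nwords;
    r->reply = reply;
    memcpy(r->data, buf, nwords * sizeof buf[0]);
    req_tail++;
    return 0;
}

/* One pass of the memory manager task: perform one queued request at the
 * task's own priority and post the result. Returns 0 if the queue was empty. */
int memmgr_task_step(void)
{
    if (req_head == req_tail)
        return 0;
    struct request *r = &req_q[req_head++ % QDEPTH];
    int status = -1;
    if (r->addr <= EEPROM_WORDS && r->nwords <= EEPROM_WORDS - r->addr) {
        for (size_t i = 0; i < r->nwords; i++)
            eeprom[r->addr + i] = r->data[i];   /* slow write cycles happen here */
        status = (int)r->nwords;
    }
    if (r->reply && r->reply->tail - r->reply->head < QDEPTH)
        r->reply->status[r->reply->tail++ % QDEPTH] = status;
    return 1;
}
```

The key design choice is that the slow EEPROM cycles are paid in the service task, not in the caller; a time-critical task only pays for a short copy into the queue.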
An interesting example of how this architecture is used can be found in the flight software of the Mars Pathfinder Spacecraft. The spacecraft entered the Martian atmosphere on a ballistic trajectory at approximately 17,100 mph. It initially employed an ablative heatshield to decelerate the vehicle. As the spacecraft decelerated, the Entry, Descent, and Landing (EDL) task, running under a commercial real-time operating system, had to determine the correct moment to fire a mortar from the rear of the vehicle to deploy a parachute. Using a set of accelerometers, the software was designed to recognize a “peak deceleration” and compute the time-to-fire.
The EDL task faced hard real-time deadlines in the most desperate sense; firing the mortar too early or late would result in a fatal impact with the Martian surface. Yet, it was also vital to store the raw accelerometer measurements, derived engineering values, and atmospheric data for scientific study and for designing future spacecraft. Since a successful landing included a 30G impact with the Martian surface, it was deemed prudent by the spacecraft designers to store the collected data in non-volatile EEPROM memory. An architecture similar to the one described in Figure 3 allowed the measurements to be stored in EEPROM, which had a “slow” write-cycle time, without blocking the critical EDL task.
Of course, implementing common functions as system services has more Earthly design advantages as well. Services promote both flexibility and portability. In the memory manager example, the service could be expanded to read and write other types of storage devices. It could also be re-used on other platforms, or expanded to manage redundant storage. All of which could be accomplished without modifying the application-level code.
Three common software architectures are often used effectively to implement system services. The system designer must judiciously select the architecture that best suits the overall design of the system.
A library implementation of a system service is made up of a collection of functions that run completely in the user's task context. The primary advantage of the library implementation is that it reduces the number of tasks in the software. Decreasing the number of tasks reduces complexity and simplifies the scheduling behavior of the system. Because the service's functions run in many task contexts, the library has to be reentrant and thread-safe. If common resources are accessed from the library, semaphores might have to be taken or interrupts locked for the short period during which the common resources are accessed. This can cause calling tasks to block in the library, or delay operating-system rescheduling while interrupts are locked. This is the primary disadvantage of the library implementation.
Figure 4 shows an example of a library implementation. A network time service is implemented as a library that encapsulates the local time maintained by a real-time operating system. The “master” obtains the local time from the operating system through the service. It then broadcasts it across the network. Clients on the network receive the broadcast and update their local times using the service. When a subsystem on a client machine requires the current network time, it obtains it locally through the service.
The service can operate in many task contexts. The network message processing software delivers the received network time by calling a function that the service provides to set the local time. The service also runs in the task contexts of “customers” who read the local copy of the current network time. These clients invoke a function provided by the service to obtain the current time.
The service locks interrupts before retrieving the time and the network information associated with it, such as whether the node is logged onto and synchronized with the network. Because it locks interrupts rather than taking a semaphore, the service never blocks and can therefore be called from interrupt level. To avoid introducing “jitter” into the local time, a broadcast received from the master updates the local time only if the two differ by more than a specified amount.
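A minimal sketch of such a library follows. `irq_lock()` and `irq_unlock()` are hypothetical placeholders for the RTOS's interrupt-lock calls and are no-ops here so the code runs on a host; `JITTER_THRESHOLD_US` is an assumed value. As a simplification, the library keeps its own copy of the time, advanced by `nettime_tick()`, rather than reading the operating system's clock as the article's service does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical interrupt-lock primitives; no-ops in this host sketch. */
static unsigned irq_lock(void)           { return 0; }
static void     irq_unlock(unsigned key) { (void)key; }

#define JITTER_THRESHOLD_US 500   /* assumed value; tune per system */

struct net_time {
    int64_t time_us;        /* local copy of network time */
    bool    logged_on;      /* node is logged onto the network */
    bool    synchronized;   /* at least one master broadcast applied */
};

static struct net_time local;   /* shared by every task that calls the library */

/* Called by the network software when the node logs on or off. */
void nettime_logon(bool on)
{
    unsigned key = irq_lock();
    local.logged_on = on;
    irq_unlock(key);
}

/* Called from the local clock tick to advance time between broadcasts. */
void nettime_tick(int64_t elapsed_us)
{
    unsigned key = irq_lock();
    local.time_us += elapsed_us;
    irq_unlock(key);
}

/* Called in the network message-processing task when a master broadcast
 * arrives. Steps the clock only if the error exceeds the threshold, so
 * small disagreements do not add jitter. Returns true if time changed. */
bool nettime_set(int64_t master_us)
{
    bool updated = false;
    unsigned key = irq_lock();
    int64_t diff = master_us - local.time_us;
    if (!local.synchronized || diff > JITTER_THRESHOLD_US || diff < -JITTER_THRESHOLD_US) {
        local.time_us = master_us;
        updated = true;
    }
    local.synchronized = true;
    irq_unlock(key);
    return updated;
}

/* Called by "customer" tasks or from interrupt level. Locking interrupts
 * makes the time and its status flags one consistent snapshot. */
struct net_time nettime_get(void)
{
    unsigned key = irq_lock();
    struct net_time snapshot = local;
    irq_unlock(key);
    return snapshot;
}
```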
The non-volatile memory manager discussed earlier was an example of the “asynchronous service” architecture (Figure 3). This architecture requires a unique task context for the service to execute. The service task runs only when it is processing user requests and allows the service to handle the requests in a non-blocking, asynchronous fashion. This is especially appealing when designers desire a data-driven architecture. It is also useful if the service must perform computation-intensive work. The part of the service that runs in the user's context can simply send a message to the service task containing the data to be operated upon. The service then performs the computation-intensive work in its own task context, at its assigned priority.
The callback architecture combines either of the two architectures discussed previously with a callback mechanism. Users submit requests to the service to be processed in an asynchronous manner. When the service completes its processing, it notifies the user through a callback function. Depending on the service, a user can register a callback function initially, or specify a callback when making a request. The callback architecture is especially suited for processing long-term asynchronous requests or handling unsolicited system events.
Note, however, that callbacks should be used sparingly. They add complexity to the system and can cause insidious bugs if used incorrectly. Callback systems are inherently susceptible to users passing bad callback pointers or writing ill-behaved callback functions.
Figure 5 shows a timer service implemented using the library architecture with callbacks. The user creates either a repeating or non-repeating timer that is maintained by the service. The service then manages hardware timers provided by the microprocessor. When the timer expires, the timer expiration ISR, which is part of the service, calls a function that the user registers when the timer is created. Most users will register a function that simply places a message into an appropriate message queue.
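The timer service can be sketched as a table of software timers scanned by the timer ISR. In this sketch, `svc_timer_create()` and `svc_timer_isr()` are hypothetical names (chosen to avoid clashing with POSIX `timer_create`), and the ISR body is simply called once per simulated tick. The example callback increments a counter where a real user would more likely post a message to a queue.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TIMERS 8

/* User callback; it runs inside the timer ISR, so it must be short and
 * must not block. */
typedef void (*timer_cb)(void *arg);

struct sw_timer {
    bool     in_use, repeating;
    unsigned period, remaining;   /* in hardware timer ticks */
    timer_cb cb;
    void    *arg;
};

static struct sw_timer timers[MAX_TIMERS];

/* Library call in the creating task's context. Returns a timer id or -1.
 * A real target would lock interrupts around the table update, because
 * the ISR reads the same table. */
int svc_timer_create(unsigned period, bool repeating, timer_cb cb, void *arg)
{
    if (period == 0 || cb == NULL)   /* reject an obviously bad callback */
        return -1;
    for (int id = 0; id < MAX_TIMERS; id++) {
        if (!timers[id].in_use) {
            timers[id] = (struct sw_timer){ true, repeating, period, period, cb, arg };
            return id;
        }
    }
    return -1;
}

/* Body of the hardware timer ISR, called once per tick. Each expired timer
 * invokes its user's callback; one-shot timers are then released. */
void svc_timer_isr(void)
{
    for (int id = 0; id < MAX_TIMERS; id++) {
        struct sw_timer *t = &timers[id];
        if (!t->in_use || --t->remaining > 0)
            continue;
        t->cb(t->arg);
        if (t->repeating)
            t->remaining = t->period;
        else
            t->in_use = false;
    }
}

/* Example callback: counts expirations instead of posting a message. */
void count_expiry(void *arg)
{
    (*(int *)arg)++;
}
```

The NULL check catches only the crudest bad pointer; a wild pointer or a slow, blocking callback would still run at interrupt level, which is the hazard the article warns about.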
The callback architecture is combined with the asynchronous service architecture in Figure 6. A service throws mechanical relays that require a long period of time to actuate. Users queue requests that are processed serially by the relay thrower task. The task programs the hardware to throw the relay and to generate an interrupt when the hardware has completed. An ISR handles the hardware interrupt and queues a completion message to the relay thrower task. The task processes the completion message by invoking the user's callback function to notify the user that the relay has been thrown. The relay thrower task then processes the next request from its queue and programs the hardware to throw another relay.
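The relay thrower's flow can be sketched as a single-threaded simulation. `relay_throw_async()` queues a request, the hypothetical `relay_complete_isr()` only posts a completion (here a flag standing in for a queued message), and `relay_task_step()` is one pass of the task's loop: it delivers the completion callback, then starts the next queued relay. All names are illustrative, and writing `relay_state[]` stands in for programming the real relay hardware.

```c
#include <stdbool.h>
#include <stddef.h>

#define QDEPTH     8u
#define NUM_RELAYS 16u

/* Called in the relay thrower task's context once a relay has moved. */
typedef void (*relay_cb)(unsigned relay, void *arg);

struct relay_req { unsigned relay; bool closed; relay_cb cb; void *arg; };

static struct relay_req req_q[QDEPTH];   /* requests, served in FIFO order */
static unsigned req_head, req_tail;

static bool             hw_busy;         /* a relay is in motion */
static struct relay_req current;         /* request being actuated */
static volatile bool    completion_msg;  /* stands in for the ISR's message */
bool relay_state[NUM_RELAYS];            /* simulated relay drive outputs */

/* Runs in the user's context: queue the request and return immediately. */
int relay_throw_async(unsigned relay, bool closed, relay_cb cb, void *arg)
{
    if (relay >= NUM_RELAYS || req_tail - req_head == QDEPTH)
        return -1;
    req_q[req_tail++ % QDEPTH] = (struct relay_req){ relay, closed, cb, arg };
    return 0;
}

/* Hypothetical "actuation complete" ISR: it does no work itself, it only
 * posts a completion message to the relay thrower task. */
void relay_complete_isr(void)
{
    completion_msg = true;
}

/* One pass of the relay thrower task's loop. */
void relay_task_step(void)
{
    if (hw_busy && completion_msg) {               /* handle completion */
        completion_msg = false;
        hw_busy = false;
        if (current.cb)
            current.cb(current.relay, current.arg); /* notify the user */
    }
    if (!hw_busy && req_head != req_tail) {        /* start the next request */
        current = req_q[req_head++ % QDEPTH];
        relay_state[current.relay] = current.closed; /* real code: program hardware */
        hw_busy = true;
    }
}

/* Example callback: counts notifications instead of posting a message. */
void count_done(unsigned relay, void *arg)
{
    (void)relay;
    (*(int *)arg)++;
}
```

Because only one relay is in motion at a time, the task never needs a lock around the hardware; serialization comes from the queue itself.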
Service with a smile
System services provide the critical foundation upon which complex systems are built. Well-conceived and executed system services promote modularity, flexibility, reliability, and portability through abstraction and encapsulation. As software systems become increasingly complex, ably designed system services provide a valuable tool to rapidly develop reliable software systems.
Steven Stolper is a software engineering manager at Broadcom's Carrier-Access Business Unit. Prior to Broadcom, he helped develop embedded IP-over-satellite networks. The victim of far too many “Star Trek” episodes, Stolper also designed flight software for NASA planetary spacecraft including the Mars Pathfinder Lander and Galileo Orbiter. His e-mail address is .
- Figure 1: Role in system architecture
- Figure 2a: Non-volatile memory manager
- Figure 2b: A simple implementation
- Figure 2c: Add a semaphore
- Figure 3: Interrupt safe and non-blocking
- Figure 4: Network time service
- Figure 5: Callback architecture (library) timer service
- Figure 6: Callback architecture (asynchronous) relay thrower
Return to April 2001 Index.