Using design patterns to identify and partition RTOS tasks: Part 2 - Embedded.com

The problem with desynchronizing your code by moving parts of it into different tasks is that many of the resources your software wants to use—certain hardware, data structures in memory, files—can’t be accessed asynchronously by multiple tasks in a safe way.

For example, if one of your tasks is in the middle of updating a linked list when another of your tasks gains control of the CPU and tries to read along the list, your system is likely to crash when the reading task follows a not-yet-updated pointer into garbage memory. This brings us to the second major task design pattern: Task Patterns for Synchronization, which provide ways to serialize access to shared resources.

Probably the most common—and perhaps also the most abused—way to deal with shared resources is to associate a mutex with each resource, declare everything global, and then take and release the mutex whenever and from wherever the resource is required, as shown here in Figure 2:

Figure 2. Using a Mutex to Protect a Shared Resource

There are several difficulties with this:

1) Mutexes work only if you use them correctly every time; that is, whenever any code anywhere in the system accesses the shared resource, the access must, without fail, be surrounded by code to take and release the proper—and not some other—mutex. Otherwise, mutexes lead to a host of subtle, infrequently appearing, hard-to-diagnose bugs in your system. When you use mutexes, every modification to your code presents another opportunity for another of these insidious problems to creep in.

2) Mutexes can affect response time in unpredictable ways that can cause your system to miss deadlines sporadically. Your high priority task always meets its deadline…unless it just happens to try to take a mutex right after some low priority task has taken it and started to use the shared resource. These bugs can also be extraordinarily tough to find, since they seldom cooperate by showing themselves when you have your test equipment set up to find them. Though your RTOS may take care of this priority inversion problem2 for you—if your RTOS supports this feature—you must be willing to take the performance hit that this feature incurs.
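The mutex approach described above looks roughly like the following. This is only a minimal sketch: it assumes POSIX threads as a stand-in for whatever mutex calls your RTOS provides, and the names sharedCounter, sharedCounterMutex, ProducerTaskStep, and ConsumerTaskStep are hypothetical.

```c
#include <pthread.h>

/* Global resource and its mutex, visible to every task in the system.
   Both names are hypothetical; pthreads stands in for the RTOS mutex API. */
int sharedCounter = 0;
pthread_mutex_t sharedCounterMutex = PTHREAD_MUTEX_INITIALIZER;

/* One pass of a hypothetical producer task: add to the shared counter. */
void ProducerTaskStep(int amount)
{
    pthread_mutex_lock(&sharedCounterMutex);
    sharedCounter += amount;   /* no other task holding the mutex can interleave here */
    pthread_mutex_unlock(&sharedCounterMutex);
}

/* One pass of a hypothetical consumer task: take the total and clear it. */
int ConsumerTaskStep(void)
{
    int snapshot;

    pthread_mutex_lock(&sharedCounterMutex);
    snapshot = sharedCounter;
    sharedCounter = 0;
    pthread_mutex_unlock(&sharedCounterMutex);
    return snapshot;
}
```

Nothing in this code stops a third routine from touching sharedCounter without taking the mutex, which is exactly the first difficulty listed above.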

A task whose job it is to handle a shared resource is a task pattern for solving this problem. The basic synchronization pattern looks like the one in Figure 3:

Figure 3. Using a Task to Protect a Shared Resource

In the pattern above, mySharedResource is a data structure that many other tasks need to access in some way or other. As you can see in the sample, the MySharedResourceTask module has declared this data structure static, thereby encapsulating it and preventing undisciplined access by other code in the system. The mySharedResourceTask is declared to be a task within the RTOS; it calls RTOSQueueRead to wait for requests from other tasks (through an RTOS queue in this example, although many other RTOS mechanisms can do this job) about what needs to be done to mySharedResource. It then performs the requested operation and goes back to waiting on its request queue.

Note that all of the operations on mySharedResource, being done now within the context of mySharedResourceTask, are done sequentially; they are serialized in the task’s message queue. The mutex and all of the attendant problems are gone. Although the task’s message queue is now a shared resource, the burden is on the RTOS, not the programmer, to synchronize access to it.
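The shape of such a server task can be sketched as follows. The sketch makes several assumptions: a small ring buffer stands in for the RTOS queue (it does not block and is not itself thread-safe, whereas a real RTOS queue is), the request types REQ_ADD and REQ_RESET are hypothetical, and the task body is written as a single step so it can be exercised without an RTOS.

```c
#include <stdbool.h>
#include <stddef.h>

/* --- Stand-in for an RTOS queue: a plain ring buffer (an assumption) --- */
#define QUEUE_DEPTH 8
typedef enum { REQ_ADD, REQ_RESET } RequestType;   /* hypothetical operations */
typedef struct { RequestType type; int value; } Request;
typedef struct { Request items[QUEUE_DEPTH]; size_t head, count; } Queue;

static bool QueueWrite(Queue *q, Request r)
{
    if (q->count == QUEUE_DEPTH) return false;          /* queue full */
    q->items[(q->head + q->count) % QUEUE_DEPTH] = r;
    q->count++;
    return true;
}

static bool QueueRead(Queue *q, Request *out)
{
    if (q->count == 0) return false;                    /* nothing pending */
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return true;
}

/* --- The server module --- */
static int mySharedResource = 0;   /* static: no other module can reach it */
static Queue requestQueue;

/* Client-side API: callers post requests and never see the resource. */
bool RequestAdd(int amount) { return QueueWrite(&requestQueue, (Request){ REQ_ADD, amount }); }
bool RequestReset(void)     { return QueueWrite(&requestQueue, (Request){ REQ_RESET, 0 }); }

/* One iteration of mySharedResourceTask. Under a real RTOS this would sit in
   a forever loop, and the queue read would block until a request arrives. */
bool mySharedResourceTaskStep(void)
{
    Request r;

    if (!QueueRead(&requestQueue, &r)) return false;
    switch (r.type) {
    case REQ_ADD:   mySharedResource += r.value; break;
    case REQ_RESET: mySharedResource = 0;        break;
    }
    return true;
}

/* Accessor that exists only so the sketch can be checked. */
int DebugPeekResource(void) { return mySharedResource; }
```

Because only mySharedResourceTaskStep ever touches mySharedResource, requests are applied strictly in the order in which they were queued.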

Figure 4 has a more concrete example, a common variation of this pattern we call the Hardware I/O Pattern in which the resource to protect is a piece of hardware, in this case a display:

Figure 4. A Task to Protect an I/O Device, such as a Display

In a system to control a pay phone, for example, various parts of the code may simultaneously think it would be a good idea to show the user what number he’s called, how long he’s been on the phone, an advertisement for the phone company’s new rates, and an indication that his phone card is about to run out.

If the display can’t display all these things simultaneously, it makes no sense to allow all those parts of the code to fight over the display. Instead, the logic to decide which of these suggestions should actually get displayed and to handle the display hardware goes into the task in Figure 4. Other parts of the code send their suggestions to this task.
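One way such a display task might arbitrate is sketched below. The fixed priority ordering is purely an assumption made for illustration (the text does not say how the choice is made), as are the names ShowKind, DisplaySuggest, DisplayTaskStep, and DisplayContents. A ring buffer stands in for the RTOS queue, and a character array stands in for the display hardware.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_DEPTH 8
#define TEXT_MAX    24

/* Hypothetical suggestion kinds, lowest priority first. */
typedef enum { SHOW_AD, SHOW_NUMBER, SHOW_DURATION, SHOW_CARD_LOW, SHOW_KIND_COUNT } ShowKind;
typedef struct { ShowKind kind; bool active; char text[TEXT_MAX]; } DisplayMsg;

static DisplayMsg queue[QUEUE_DEPTH];        /* stand-in for the RTOS queue */
static size_t qHead, qCount;
static DisplayMsg latest[SHOW_KIND_COUNT];   /* most recent suggestion of each kind */
static char displayHardware[TEXT_MAX];       /* stand-in for the real display */

/* Called by any client: offer a suggestion, or withdraw it with active == false. */
bool DisplaySuggest(ShowKind kind, bool active, const char *text)
{
    DisplayMsg *m;

    if (qCount == QUEUE_DEPTH) return false;
    m = &queue[(qHead + qCount) % QUEUE_DEPTH];
    qCount++;
    m->kind = kind;
    m->active = active;
    strncpy(m->text, text ? text : "", TEXT_MAX - 1);
    m->text[TEXT_MAX - 1] = '\0';
    return true;
}

/* One iteration of the display task: take one suggestion, then redraw. */
bool DisplayTaskStep(void)
{
    DisplayMsg m;
    int k;

    if (qCount == 0) return false;
    m = queue[qHead];
    qHead = (qHead + 1) % QUEUE_DEPTH;
    qCount--;
    latest[m.kind] = m;

    /* All arbitration lives here: the highest-priority active kind wins. */
    displayHardware[0] = '\0';
    for (k = SHOW_KIND_COUNT - 1; k >= 0; k--) {
        if (latest[k].active) {
            strcpy(displayHardware, latest[k].text);
            break;
        }
    }
    return true;
}

/* Accessor that exists only so the sketch can be checked. */
const char *DisplayContents(void) { return displayHardware; }
```

Changing the arbitration policy later means editing DisplayTaskStep and nothing else.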

This eliminates the need for a mutex, resolves concerns about sharing the display hardware, and confines to this one module all of the logic for determining what is most important to display rather than scattering it throughout your system. Figure 5 is a similar example, in which the shared resource is a flash memory:

Figure 5. A Task Writes to a Log in Flash Memory

In Figure 5, WriteLogToFlashTask writes to the flash and then blocks itself while the flash recovers. If other code in the system finds other events to log during the recovery period, they write their logging requests onto the RTOS queue, and the requests wait on the queue until WriteLogToFlashTask has finished with previous requests, and the flash has had a chance to recover. Information about when it is OK to write the next log entry into the flash doesn’t get spread around the system, and you will not have to code logic in the other tasks to deal with the recovery time.
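The behavior just described can be sketched as a tick-driven simulation. Several things here are assumptions: the recovery time of three ticks, the names LogEvent, WriteLogToFlashTaskTick, and FlashEntriesWritten, and the use of a tick function in place of a task that blocks on its queue and then sleeps. Arrays stand in for both the RTOS queue and the flash.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_DEPTH          8
#define FLASH_LOG_SIZE       16
#define FLASH_RECOVERY_TICKS 3    /* hypothetical recovery time, in ticks */

static int logQueue[QUEUE_DEPTH];     /* pending entries (stand-in for an RTOS queue) */
static size_t qHead, qCount;
static int flashLog[FLASH_LOG_SIZE];  /* stand-in for the flash log area */
static size_t flashUsed;
static int recoveryTicksLeft;         /* known only to this module */

/* Called by any client; returns at once whatever state the flash is in. */
bool LogEvent(int eventCode)
{
    if (qCount == QUEUE_DEPTH) return false;
    logQueue[(qHead + qCount) % QUEUE_DEPTH] = eventCode;
    qCount++;
    return true;
}

/* One timer tick of WriteLogToFlashTask. Under an RTOS the task would block
   on the queue, write one entry, then sleep for the recovery time. */
void WriteLogToFlashTaskTick(void)
{
    if (recoveryTicksLeft > 0) {       /* flash still recovering: do nothing */
        recoveryTicksLeft--;
        return;
    }
    if (qCount == 0 || flashUsed == FLASH_LOG_SIZE) return;

    flashLog[flashUsed++] = logQueue[qHead];   /* the "flash write" */
    qHead = (qHead + 1) % QUEUE_DEPTH;
    qCount--;
    recoveryTicksLeft = FLASH_RECOVERY_TICKS;
}

/* Accessor that exists only so the sketch can be checked. */
size_t FlashEntriesWritten(void) { return flashUsed; }
```

Clients call LogEvent without knowing that FLASH_RECOVERY_TICKS exists; a second entry queued during recovery simply waits its turn.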

Responses from Synchronization Tasks
Synchronizing tasks fulfill a role much like a server, offering the services of the shared resource to the other tasks, the “client” tasks, within the system. One problem that must be resolved is that client tasks very often want a response from the server.

In the examples in Figure 4 and Figure 5, clients needed no response. If a client task sends an item for the log to the logging server task in Figure 5, for example, the behavior of the client task most likely will not depend upon when (or even if) the entry gets written into the log. In the pay phone, most of the software probably doesn’t care what eventually gets displayed: the payment software will likely disconnect the call when the user runs out of money, whatever the display task has actually shown the user.

Often, however, the client task needs a response to its request. Sometimes a task needs this response before it can continue its own processing, a synchronous response. In other situations, the server task may not be able to provide a response immediately and the client task doesn’t want to block waiting for the response. So the client task must somehow get its response later, an asynchronous response.

Synchronous Response Pattern. A task asking a calibration subsystem to get a value may need that value before it can continue its calculations: it needs a synchronous response. Almost invariably, using a mutex is the easiest way to deal with this requirement, but that does not mean that you should simply make the calibration data global and let any task that wants a calibration value read it. Instead, encapsulate access to the data as shown in Figure 6:

Figure 6. A Calibration Task

This code keeps a shadow of the calibration values in the sCalibration structure. When some part of your code wishes to write a new calibration entry, vCalibrationWrite puts the new value into sCalibration (using the mutex that protects that structure) and then sends a message to vWriteCalibrationToFlashTask to write the new calibration value into the flash.

When some part of your code wishes to read a calibration entry, iCalibrationRead fetches the current value from sCalibration (again, using the mutex). This code thus provides an immediate response for tasks that need calibration entries, although any task that calls either vCalibrationWrite or iCalibrationRead must be able to wait for the mutex without missing deadlines.
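A minimal sketch of this arrangement follows. The function and structure names come from the description above, but their signatures, the aiValues field, the four-entry table, and the iFlashPeek accessor are assumptions. POSIX threads stand in for the RTOS mutex, and a ring buffer (not itself thread-safe, unlike a real RTOS queue) stands in for the queue to the flash task.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define CAL_ENTRIES 4
#define QUEUE_DEPTH 8

typedef struct { int index; int value; } CalMsg;

static struct { int aiValues[CAL_ENTRIES]; } sCalibration;   /* RAM shadow */
static pthread_mutex_t calMutex = PTHREAD_MUTEX_INITIALIZER; /* guards the shadow */
static CalMsg flashQueue[QUEUE_DEPTH];   /* stand-in for the RTOS queue */
static size_t qHead, qCount;
static int flashCopy[CAL_ENTRIES];       /* stand-in for the flash itself */

/* Update the shadow under the mutex, then ask the flash task to persist it. */
void vCalibrationWrite(int index, int value)
{
    if (index < 0 || index >= CAL_ENTRIES) return;

    pthread_mutex_lock(&calMutex);
    sCalibration.aiValues[index] = value;
    pthread_mutex_unlock(&calMutex);

    /* A real system needs a policy for a full queue; this sketch drops the request. */
    if (qCount < QUEUE_DEPTH) {
        flashQueue[(qHead + qCount) % QUEUE_DEPTH] = (CalMsg){ index, value };
        qCount++;
    }
}

/* Synchronous read: served from the shadow, never from the slow flash. */
int iCalibrationRead(int index)
{
    int value = 0;

    if (index < 0 || index >= CAL_ENTRIES) return value;
    pthread_mutex_lock(&calMutex);
    value = sCalibration.aiValues[index];
    pthread_mutex_unlock(&calMutex);
    return value;
}

/* One iteration of vWriteCalibrationToFlashTask; a real task would block on the queue. */
bool vWriteCalibrationToFlashTaskStep(void)
{
    CalMsg m;

    if (qCount == 0) return false;
    m = flashQueue[qHead];
    qHead = (qHead + 1) % QUEUE_DEPTH;
    qCount--;
    flashCopy[m.index] = m.value;   /* the slow write happens outside the mutex */
    return true;
}

/* Accessor that exists only so the sketch can be checked. */
int iFlashPeek(int index) { return flashCopy[index]; }
```

The mutex is held only for a single array access, so the time any caller can be kept waiting is short and easy to bound.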

Asynchronous Response Pattern. As an example of a task that requires an asynchronous response, consider a task that needs to move a motorized mechanical assembly some distance in a certain direction, but that needs to continue running the rest of the factory while the mechanical assembly is busy moving. If the software in this system contains a separate server task to handle the operation of the mechanical assembly, then this server task needs to provide a service that allows clients to request a move but not block waiting for the move to complete. When the move completes, then the server task notifies the client task.

Since it is common for tasks to read from an RTOS queue to get information about what is going on in the outside world, one convenient way to provide an asynchronous response is to have the server task send a message back to the client task’s message queue when the response is available. Some code to do that is shown here in Figure 7:

Figure 7. A Server Task Provides Asynchronous Response

When AsynchServerInit is called to initialize the server task, the client task’s q and msg are remembered for later use. When the client task calls vStartProcessNotifyWhenDone, the routine writes a message to the server task’s queue asking it to start the “process”.

When the server task reads the message, it starts the “process”. Later, an ISR or another task sends a MSG_PROCESS_DONE message to the server’s task, signaling that the “process” is finished. At this point, the server task sends an asynchronous response to the client task, using q and msg, to notify it that the “process” is finished.
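The sequence above can be sketched as follows. AsynchServerInit, vStartProcessNotifyWhenDone, and MSG_PROCESS_DONE are named in the text, but their signatures and values are assumptions here; MSG_START_PROCESS, SignalProcessDone, AsynchServerTaskStep, and ProcessIsRunning are hypothetical. A ring buffer of int messages stands in for the RTOS queue.

```c
#include <stdbool.h>
#include <stddef.h>

/* --- Stand-in for an RTOS queue of int messages (an assumption) --- */
#define QUEUE_DEPTH 8
typedef struct { int items[QUEUE_DEPTH]; size_t head, count; } Queue;

bool QueueWrite(Queue *q, int msg)
{
    if (q->count == QUEUE_DEPTH) return false;
    q->items[(q->head + q->count) % QUEUE_DEPTH] = msg;
    q->count++;
    return true;
}

bool QueueRead(Queue *q, int *msg)
{
    if (q->count == 0) return false;
    *msg = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return true;
}

/* --- The server module --- */
enum { MSG_START_PROCESS = 1, MSG_PROCESS_DONE = 2 };  /* values are arbitrary */

static Queue serverQueue;
static Queue *clientQueue;     /* the remembered q */
static int clientMsg;          /* the remembered msg */
static bool processRunning;

/* Remember where, and with what message, to notify the client later. */
void AsynchServerInit(Queue *q, int msg) { clientQueue = q; clientMsg = msg; }

/* Called by the client: returns at once, without waiting for the process. */
void vStartProcessNotifyWhenDone(void) { (void)QueueWrite(&serverQueue, MSG_START_PROCESS); }

/* Called by an ISR or another task when the process has finished. */
void SignalProcessDone(void) { (void)QueueWrite(&serverQueue, MSG_PROCESS_DONE); }

/* One iteration of the server task; a real task would block on its queue. */
bool AsynchServerTaskStep(void)
{
    int msg;

    if (!QueueRead(&serverQueue, &msg)) return false;
    switch (msg) {
    case MSG_START_PROCESS:
        processRunning = true;               /* start the hardware here */
        break;
    case MSG_PROCESS_DONE:
        processRunning = false;
        if (clientQueue != NULL)
            (void)QueueWrite(clientQueue, clientMsg);   /* the asynchronous response */
        break;
    }
    return true;
}

bool ProcessIsRunning(void) { return processRunning; }
```

The client sees the completion as just one more message on the queue it already reads, so it needs no extra waiting logic.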

A more general way to accomplish the same result is to have the server task call a callback function in the client task instead of passing a message to a queue. Instead of a queue and a message in this case, AsynchServerInit takes a callback function pointer and perhaps a context value to pass to the callback function. The disadvantage of this method is that you often end up writing a whole raft of functions in your client tasks something like the one shown in Figure 8.

The advantages are that the server task has no access to the client’s queue, thereby encapsulating the queue and reducing the chance of bugs related to the queue; and that the client task may be able to handle the event in the callback function without having to incur the overhead imposed by using a queue.
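The callback variation might look something like this. It is a sketch under stated assumptions: the ProcessDoneCallback signature, the ServerHandleProcessDone handler, and all of the Client names are hypothetical, and the server's message loop is omitted.

```c
#include <stddef.h>

/* Hypothetical callback type: the server passes back the client's context value. */
typedef void (*ProcessDoneCallback)(void *context);

/* --- Server module --- */
static ProcessDoneCallback doneCallback;
static void *doneContext;

/* Remember the callback and context instead of a queue and a message. */
void AsynchServerInit(ProcessDoneCallback callback, void *context)
{
    doneCallback = callback;
    doneContext = context;
}

/* Called from the server task's message loop (not shown) when it
   receives the "process done" message. */
void ServerHandleProcessDone(void)
{
    if (doneCallback != NULL)
        doneCallback(doneContext);   /* runs in the server task's context */
}

/* --- Client module --- */
static int clientDoneEvents;   /* private to the client; the server never sees it */

/* One of the small functions the client has to supply. A client that
   prefers messages could post to its own private queue here instead. */
static void ClientProcessDoneCallback(void *context)
{
    int *counter = context;

    (*counter)++;
}

void ClientInit(void) { AsynchServerInit(ClientProcessDoneCallback, &clientDoneEvents); }

/* Accessor that exists only so the sketch can be checked. */
int ClientDoneEvents(void) { return clientDoneEvents; }
```

Keep in mind that the callback executes in the server task's context, not the client's, so it should be short, and any client data it touches is shared between the two tasks.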

See also Part One: Task patterns for desynchronization.

Michael Grischy is one of the founders of Octave Software Group, a software development consulting firm. David Simon, also a founder, has recently retired from Octave Software.

References:
1) “Design Patterns for Tasks in Real-Time Systems,” Class ETP-241, Spring 2005
2) “Patterns and Software: Essential Concepts and Terminology,” by Brad Appleton, http://www.cmcrossroads.com/bradapp/docs/patterns-intro.html
3) “Design Patterns: Elements of Reusable Object-Oriented Software,” by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides
4) “Pattern-Oriented Software Architecture: A System of Patterns,” by Frank Buschmann, Regine Meunier, Hans Rohnert, Peter Sommerlad, and Michael Stal
