Model complex behavior the domino way - Embedded.com

Model complex behavior the domino way

Change happens; detect and use it to cause more change. Here's a method that harnesses the power of change to model complex systems behavior.

Software quality is an ever-present concern in the world of engineering. Tools and methods are available to expedite the modeling, design, and implementation processes. Yet, despite the assumed complexity, software is not wizardry. It's simply a collection of variables that change over time. Although tools and methods are basically designed to manage these changes, this commonality is concealed by high-level concepts.

One such method is based on the concept that “change happens.” Just like a chain of falling domino blocks affect one another, state changes in variables can be used to cause changes in other variables. The event that causes a state change is the equivalent of the force that causes the first domino block to fall. I'll refer to this as the Domino method, and I'll illustrate it by modeling an asynchronous, event-driven controller.

With the increasing complexity of software, modeling tools are gaining in popularity. I've used UML-based tools quite a bit, especially statecharts, which model the behavior in objects. In statecharts, events trigger a transition from state A to state B. This transition is executed as run-to-completion, which means that it can't be interrupted by another event. However, in complex systems, such as asynchronous and real-time systems, the state machine may transition to a multitude of states. Choice points for example can be used to find the next state. A choice point represents a decision to be made and is triggered by the transition leading toward it. Which choice point is chosen depends on the Boolean result value of a logical expression that is associated with that choice point.

Based on my experience, complex systems must handle every event in every state and can typically transition from any state to any other state. This results in many choice points, spread out all over the statechart, each containing a little piece of the logic puzzle. As an initial observation, it would be advantageous to have all the logic in one place.

Another observation is that statecharts seem to have assumptions that aren't clearly visible in the model. When changes are made to the model, it's easy to break the assumptions, thereby introducing bugs.

And finally, to ensure the design's completeness, I must mentally go through the model, visualize every possible event and how it affects the state machine and local variables to get an idea of whether it will work and correctly handle every possible scenario. Or, I have to test every possible use case to see how the system reacts. There must be a better way to systematically handle every scenario.

Let's start with defining the model problem. Suppose we have to implement a controller process that initializes a second process by sending it configuration data. Events inform the controller about the status of this process (up or down). When the controller receives a setup request with valid configuration data, it will forward this data to the process that needs to be initialized, provided that it's up and running. New setup requests may come in at any time, with different data, and to prevent queuing up the requests at the other end, the controller holds the data and doesn't send another init (short for initialization) request until an init response is received. When the process has been initialized, a (successful) setup response will be sent back. If anything goes wrong, a failed setup response is returned.

A statechart is shown in Figure 1. The transition code isn't shown but we'll discuss it. The up event is handled on the boundary and sets a local variable processState to true, while the transitions handling the down event set this variable to false. When a setup request is received in the idle state, choice point C1 will check this variable to determine whether the process is up. It will also check whether the data that came with the setup request is valid. If both conditions are met, the transition continues to the waitForInitRsp state. Otherwise, the state machine transitions back to idle and returns a setup response with a failure indication. That same response is sent when a down event is received in the waitForInitRsp state.

View the full-size image

Why is there a variable for the process state but not for the setup request? It's because the up event represents information that should be remembered for later use, while the setup request can be handled immediately. When a down event is received in the waitForInitRsp state, the state machine transitions to the idle state and sends a failed setup response. This can only be justified by the fact that the transition from waitForInitRsp to idle sends back a setup response that can only be justified by the fact that the waitForInitRsp implies that a setup request was received earlier because it's the only way to reach that state. The state chart contains and uses this information.

It may seem odd, but I could add a variable setupReceived to remember the setup request. It's set to true when the setup request is received and checked in C1 as follows: if (processState == true && setupReceived == true) then continue to waitForInitRsp. Then, add the current state: if (state == idle && processState == true && setupReceived == true). After all, C1 can only be reached from the idle state. When this check equals true, the state machine can transition from idle to waitForInitRsp.

This choice-point code is triggered by an event, but why couldn't the transition be initiated by the rule itself when this rule becomes true? The rule becomes true when the variables in the expression change to a state required by the expression. Events already modify the variables and cause a state change, so use this “state changed” as a trigger to check the rule.

By converting all choice points like this, you'll end up with a set of rules, each describing a specific combination of variables/states. To make it complete, the rules also describe combinations of states that don't cause a transition. A table is used to systematically describe every combination. Because the rules no longer have anything to do with events, they're checked whenever a state change occurs.

To summarize, this mechanism still begins with an event being handled by a state machine. But unlike regular state machines, where reaching the destination state means the event has been handled completely, in this method a state change is detected and used to generate the “momentum” to check a set of rules. These rules can cause new transitions, causing the rules to be checked again, until a combination of states is found that doesn't cause a state change (stable state). All this is still subject to a run-to-completion concept, though it's more of a run-to-stable-state principle. Event handling is thus broken up into multiple small steps.

In the new model shown in Figure 2, there are now six state machines, all representing information that determines the behavior. Note that this model uses a generic, non-UML state-machine notation. The three state machines at the top all handle events: requestState handles the setup request, processState handles the up and down events, and initState handles the init response event. The event-based transitions are shown as solid lines. The transitions are shown as dotted lines, for example from notInitialized to waitForInitRsp in the initState are initiated by the rules I'll discuss later.

View the full-size image

These three state machines report state changes to the table; the reporting is depicted in Figure 2 as a line with one or two arrows connected to the table. A tag > next to this line shows the reporting feature. An arrow pointing from the table to a state machine indicates that the table influences this state machine by causing state transitions.

A table, such as depicted in Table 1, helps describe every possible combination of the three state machines. The table shows 12 unique rules with optional “action” code and next state(s). Remember that checking this table is triggered by any state change. The table doesn't know which state actually changed; it simply tries to match the current overall state situation with a rule.

View the full-size image

The initial state is given by {idle, notInitialized, down}. For example, when an up event is received, the processState transitions to “up.” This state change is reported to the table and as a result, the table tries to match the new overall state {idle, notInitialized, up} with a rule. It matches Rule #2 in Table 1, but this rule doesn't cause a new state change.

While in this state, a setup request will move the requestState to setup. The new combination {setup, notInitialized, up} matches Rule #8 and specifies that it's time to start initializing. The next state will be waitForInitRsp, and this rule will execute the sendInitRequest function that sends the configuration data. The state change to waitForInitRsp is a trigger to check the table again. However, this time, it matches Rule #10, which does nothing.

Rule #12 specifies a scenario where initialization has finished: the setup request is still pending, the process is still up, and because the initState is in the initialized state, initialization is complete. Rule #12 transitions the requestState back to idle and calls the function sendSetupResponse with a parameter indicating success.

The domino effect is clearly visible in Rule #9. This rule initiates a state change that causes Rule #7 to be triggered, which triggers Rule #1. The sequential execution of these rules determines the overall response to a particular event.

The > label represents a combination of states that the system is never supposed to be in; the label can later be helpful for debugging purposes. Finally, the table can be made smaller by introducing the “*” notation in a state column indicating that it matches any state. For example, Rules #1 and #2 can be combined into one rule by adding an * in the processState column.

Adding more info
Table 1 isn't complete yet because it doesn't incorporate all available information. For example, it doesn't check whether the data is valid. The three state machines named dataState, differenceState, and usedDataState in Figure 2 map the configuration data to states that are important for the behavior.

The first state machine, dataState, represents whether the configuration data is valid or not. That state machine has a local variable configData associated with it (shown in the box underneath the state machine), as well as the states valid and invalid. These states are updated every time the configData changes value. This value is the setup request's event data, and is therefore set from within the transition idle to setup in the initState. Figure 2 shows this as a line and arrow and > next to it.

The dataState doesn't report a new state change to the table, but its state is used by the table (Table 2 shows the updates). The dotted line labeled > symbolizes this. However, the dataState reports a new value to the differenceState. This state machine compares the configData with another local variable (usedData) and represents whether these two variables have the same value or not. The usedData variable is set from the transition notInitialized to waitForInitRsp in the initState as this is where the configuration data is sent to and consequently, used. The state associated with the usedData variable isn't important and thus, isn't connected to the table. Finally, the table must be extended with two state machines and their four new states: valid, invalid, diff, and noDiff.

Rather than quadrupling the size of Table 1 by combining four new states with every rule, we can save time by considering when these new states would be needed. Table 2 shows how states can be consolidated.

The validity of the data is only important when an init request needs to be sent, which is covered by Rules #8 and #12. As Table 2 shows, Rule #8 will be split up into 8a and 8b: when the data is invalid, Rule #8a specifies that the initialization can't continue. Rule #8b describes what to do when the data is valid and this is basically the original Rule #8 from Table 1. Rule #12 specifies that the process has been initialized, but while waiting for the init response, the configData has changed. Rule #12a specifies that the configData is now invalid and sends back a failed setup response while moving the requestState back to idle. Rule #12b specifies that the configData is valid, but because the differenceState is noDiff, it's not different from the previously used value. This rule sends a successful setup response and moves the requestState back to idle. And finally, Rule #12c specifies that the configData is valid and different from the previously used value. As a result, it sends another init request and moves the initState back to waitForInitRsp.

Note how Figure 2 shows that everything is truly based on change. Whenever the value changes for the configData variable, the associated state is re-evaluated. Instead of modeling this as two different entities where the variable reports a change to the state machine, it's modeled as one entity. This is done for convenience, because that's what concepts are all about.

Changing the rules
Even though the events are basically hard-coded to the state machines (it makes no sense to put them anywhere else), it's the rules that determine the behavior and it's easy to modify them without touching the state machines. For example, Rule #7 specifies that the initialization can't start because the process is down. Currently, this rule specifies that the requestState transitions back to idle, and a setup response is returned. If the controller wants to do something else, say wait until the process is up, Rule #7 can be changed to do nothing.

Starting from this state, initialization starts when the up event is received. If the controller wants to wait for the up event, but not too long, a timer can be started. When this timer expires, it will cause the requestState to return to idle as shown in Figure 3. A separate state machine handles the timeout event and reports a state change to a new table. This new table would only have one rule that causes a state change in the requestState. This rule is triggered when the timerState is in the timeout state and the requestState is in the setup state and moves the requestState to idle. The timer is started by Rule #7 and should be cancelled when the processState equals up (Rules #8a and #8b). When the requestState transitions back to idle state as an indirect result of a timeout, it calls the sendSetupRsp function and reports this state change to Table 2.

View the full-size image

If you think this is a lot of work for such a small problem, consider that Table 2 doesn't contain any superfluous rules. As I mentioned earlier, you can use a statechart to implement this controller but you're more likely to forget a specific use case. I'm sure that after many test-fix cycles, you'll eventually end up with the same rules and logic, distributed over the many choice points.

Something must change
Reliable systems need timers, especially when communicating with other processes. The controller should never be perpetually stuck when a process doesn't reply with an init response. The timer state machine introduced earlier can be used as a remedy. From a modeling perspective, the same timer object shown in Figure 3 can be used. Of course, the implementation will need a different instantiation of this timer, which reports any state changes to the initState.

A third table is used to connect the timerState with the initState. Once again, the only combinations of states that triggers a new state change is when the timerState is in the timeout state and the initState is in the waitForInitRsp state. This rule will move the initState to the notInitialized state. This state change triggers the rules in Table 2 and sends another initialization request. This may be correct behavior, but it also shows a side effect of the static state checking that occurs in the table.

The table doesn't know which state actually changed, what event was received, and in particular, the table (or the model) doesn't know about failure. There's no information that could influence the behavior to do something other than sending a new init request.

Adding failure information is done by adding an error code representing the kind of error that last occurred. In this model, of course, it's associated with a state machine with error (error codes > 0) and noError (error code = 0) states, and some rules will need to check it, namely Rules 8a, 8b, and 8c.

A new version of Rule #8 is shown in Table 3 (Rule #8c), which detects that an error has been set, and decides that the initialization can't continue. The error code is reset and a setup response is sent.

View the full-size image

Things to be aware of
As always, the law of conservation of trouble ensures that there's never a perfect way to do things. Using this mentality, the following dangers should be monitored:

  • because rules may trigger other rules in the table due to state changes, care should be taken to prevent infinite loops;
  • every rule in the table must be unique, where no specific order is assumed between the rules;
  • the > and > attributes should be used correctly to ensure deterministic behavior: state machines that handle events typically report changes, while data-related states never directly report to a table (but they could report to other state machines).

Note that a computer could easily check all of these constraints. In addition, the computer can also check whether the > states are really unreachable, how many other states are unreachable, whether there are dead states, etc. It could even generate the code for you.

Implementation
Here's how the Domino method can be implemented. The following code segments show how functions handle events, but how these functions are called won't be covered. A setup request is handled by the following code:

void handleSetupRequest(DataType&   newData){   setData(newData);   if (requestState == idle)   {      requestState == setup;      checkTable();   }}

The handleSetupRequest function first calls setData (shown in Listing 1) that assigns the event data to the configData variable, and calls the function checkDifferenceState. When that function returns, the dataState is determined: either valid or invalid, as Listing 1 shows.The checkDifferenceState function simply compares the variable configData with the variable usedData and sets its state accordingly. The differenceState doesn't report a state change to the table, so these functions will all return and we're back at the handleSetupRequest function. If a state change is detected (from idle to setup), it's reported to the table. The function checkTable in its most inefficient, but most straightforward form, is:

void checkTable(){   bool continue = true;   while (continue)   {      continue = false;      // add all the rules          here   }   return;}

The while loop is needed as one state change is a trigger to check the table again. Rather than calling checkTable (recursively) after a state change that occurs within this function, it simply continues to loop; rules that don't cause a state change should bail out by returning from this function. If no rule can be matched, the function will return as well, but it's wise to report an error when this occurs. The rules themselves can be implemented as if-statements; the code in Listing 2 shows examples of Rule #1, #2, and #8b from Table 2.Because the dataState doesn't report a state change to the table, its value is only important when the table actually needs it. It's not really necessary to implement it as a state machine. Rule #8b could just as easily be implemented as shown in Listing 3.Again, states like dataState are nice to have in a model to show that they're part of the behavior, but they can be left out from the actual implementation. A code generator could be much smarter here and substitute state machines with regular checks where needed. It could also determine that Rule #8b for example transitions to a state combination that doesn't have a state change associated with it. Rule #8b could then simply return from the checkTable functions instead of looping one more time and consuming valuable CPU cycles.To finish up the implementation section, a few final remarks:

  • The rules with > tags in Tables 1 and 2 could report this as an error for debugging purposes;
  • Because the table is stateless (it only uses the current state information and has no table specific local information), different classes/objects/modules can implement different sets of rules, which can be switched out even at run-time, enabling for very dynamic behavior.

I wrote a test program that injected every possible event into the controller while registering all the state combinations that were transitioned to, along with the event that caused a change. On a 1-GHz machine, it only took 0.1 seconds to evaluate the complete state space. This state space was used to find state combinations like {error, idle}, which means that the error code wasn't reset properly. An event trace leading to this state can help the designer find and fix the problem. I've posted more information about this at www.dominosoftware.net.
Link to regular code
Many development tools, concepts, and methods are available nowadays, and I believe that they all deal with managing change because that's what software is all about. The method described in this article is driven by change, so why couldn't it be used to build any software product? I talk a lot about state machines, but like everything else, a state machine is just a concept, a variable with a specific set of discrete values. A computer program is nothing but a collection of variables whose values evolve in time. The behavior is determined by the if-statements effectively dividing the value ranges of parameters into discrete sections (or should we call them states?).

What about events? Looking at the implementation examples given, nothing indicates an event-driven asynchronous system. It's just a function call that's modifying data. Code like that can be found in any software module. The point is, “Why can't variables respond to change, caused initially by a function call?” The tables completely describe every state combination and represent the dependencies between all variables in a system, while the rules represent the if-statements. The following is a small, typical function:

foo(){   x = A();   if (x > 0) B()   else C();}

This looks similar to a one-step state transition with a choice point. Everything must be done within the context of the “active” function call. Another way to implement it is:

foo(){   A();}

For the linearly thinking mind, it seems easier to implement a function like an algorithm (step by step), but the change-driven version of foo() is simpler. Depending on how A changes the local variables, x, B, or C will be executed, triggered by an assignment to x. They'll respond to the changes caused by A. This new version shows programming the domino way: one change causes another change. That would be a totally different paradigm, perhaps similar to the one Mark Bereit described (“Escape the software development paradigm trap,” Embedded Systems Design , May 2006, p. 11).The following example shows how easy it is to ignore the fact that everything is basically a collection of state machines that must be controlled. I once had to implement a fault manager that maintained a set of faults. If no faults were active, the set was empty and the system received a start request. If one or more faults were active, the system was stopped using a stop request. This is how it started:

addFault(int id){   myFaultSet.add(id);   if (myFaultSet.size() > 0) sendStopRequest();}removeFault(int id){   myFaultSet.remove(id);   if (myFaultSet.size() == 0) sendStartRequest();}

Adding fault after fault, the system will receive multiple, unnecessary stop requests. A similar problem occurs when removing a non-existing fault from an empty set. This causes a start request to be sent (again). This might not seem like a big problem, but things should be perfect, right? I could add logic to prevent sending multiple requests, but if a third function modifies the set, the logic must be duplicated. Until you realize that the stop and start requests are associated with the fault set going from the states being empty to not-empty and back. The behavior doesn't care about which elements are in the set, but it's interested in whether there are no elements, or at least one. The state machine associated with the fault set should control when the start and stop requests are sent. That reduces the addFault function to simply adding a fault to the set.

Embracing change
The Domino method described in this article can be summarized as: change happens; detect it and use it to cause more change. It enables the model to be broken up into smaller, simpler state machines, each handling a set of related events. Rules not only describe every possible combination of states, but they also initiate state changes. And because the rules are triggered by state changes, multiple iterations can occur before a stable state combination has been reached. Thus, event handling is broken up into multiple, smaller steps. By considering regular code as a collection of state machines, that too could benefit from this method.

It would be interesting to research the possibility of reverse engineering legacy code or statecharts into tables to show the absence of rules and therefore, the presence of bugs. Then, the bugs are fixed and converted back to code or statecharts.

How about models that have many tables interacting with each other, and how do we ensure deterministic behavior in that case? It would be beneficial to create a graphical interface to make the modeling even more accessible. Combined with the model-checking features described here, it would make for a pretty powerful tool.

Jos Dijkstra has a masters in mechanical engineering from the University of Groningen, Holland. He has nine years of software engineering experience and currently works for Aztek Networks in Boulder, Colorado. You can reach him at .

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.