How to ensure the best Qt state machine performance - Embedded.com

How to ensure the best Qt state machine performance

If you use Qt for application development and if you use state machines, then it is likely that you are making use of the Qt state machine framework. So you will define the state machine using plain C++ or SCXML. An alternative approach is generating C++ code from state machine diagrams. This article compares these approaches, taking functionality, applicability, and performance into account.

I bet that as a software developer, you have already implemented tons of more or less complicated switch-case statements. This is at least true for me, and much of this switch-case coding was essentially nothing but implementing diverse state machines. This is the easiest way to start programming state machines if you have nothing else at hand than the programming language of your choice. While the start is easy, such code becomes less and less maintainable as the complexity of the state machine increases. At the end, you will be convinced that you don’t want to continue implementing state machines manually this way. (By the way – I am assuming that you know what state machines are.)

Implementing state machines

There are different alternatives to implement state machines. One of the better ways – especially when you are using an object-oriented programming language like C++ – is applying the state pattern. This approach uses state classes and often also transition classes. A state machine is then defined by creating instances of state classes and wiring them using instances of transition classes. In this case, a framework helps a lot to reduce code size and implementation effort.

The Qt state machine framework is a good example. This API allows you to “configure” a state machine using compact code. You don’t have to care about details of state machine execution semantics, as these are already implemented by the framework. You still have to write code, and as your state machine becomes more complex and contains some dozens or even hundreds of states, it will become really hard to get an overview. A picture is worth a thousand words, and the well-known concept of state diagrams helps to overcome this constraint. Qt itself provides support for State Chart XML (SCXML), which is a W3C standard. As writing XML by hand is no fun,Qt Creator also includes a simple graphical statechart editor.

Irrespective of the concrete implementation approach, using a graphical syntax is the best choice for editing and understanding state machines. Such graphical models can not only be textually represented by languages like SCXML but can also be used to generate any kind of programming language source code, like a switch-case-based state machine in plain C++ – or C++ code that sets up instances of QStateMachine. Using a tool that does such a transformation for you, you can avoid the pain of hand-writing state machine code. It lifts all three implementation approaches onto the same usability level. Nevertheless the implementations are still fundamentally different. This article is about comparing their runtime behavior and especially their performance.


UnsplashPhoto by Austris Augusts on Unsplash

The competitors

So what about performance? How do the available approaches differ regarding required CPU cycles? To get some concrete numbers, I set up a performance test suite. The first part compares the different implementation strategies. These are the competitors:

  1. SCXML interpreter – the test state machine is defined using SCXML and executed by Qt’s QSCXMLStateMachine class.
  2. State pattern – the test state machine is implemented using QStateMachine classes.
  3. Plain C++ code – the test state machine is implemented by a C++ class which applies a basic switch-case-based approach.

Note: Code for these examples can be found here.

The first two variants imply the use of Qt concepts like signals and slots as well as the use of a Qt event queue, while a plain C++ implementation does not require this infrastructure. In order to make the approaches more comparable, the test suite includes two more test scenarios:

  • Plain C++ code with signals and slots – the test state machine has the identical implementation as described above but uses signals and slots to integrate it into the application.
  • Plain C++ code with QEvent – uses the plain C++ code approach but makes use of the Qt event queue for processing in and out events.

This makes it possible to compare the impact of the use of signals and slots on the one hand and the use of QEvents on the other hand compared to the plain C++ implementation, since the state machine execution code is identical in all cases, but just wrapped differently.

The test state machine

For testing all five competitors, I defined the state machine shown in fig. 1 for the basic test scenario.


Figure 1: The test state machine, as created with YAKINDU Statechart Tools. (Source: Author)

The test state machine is a simple flat state machine. It defines six states A to F and cycles through the states. Two input events e1 and e2 are defined, which alternately trigger state transitions. When a state transition takes place, also a simple action is executed. Each transition action simply adds 10 to a statechart variable named x. The transition from state F to A additionally raises (or emits) the out event o.


Figure 2: The test state machine as an SCXML model in Qt Creator. (Source: Author)

This state machine was defined using YAKINDU Statechart Tools, which supports the generation of SCXML. This SCXML can be added to the Qt project and can be edited in Qt Creator. As you can see in fig. 2, the state machine has the identical structure as the one in fig. 1, but some details, like the transition actions, are not visible in Qt Creator. YAKINDU Statechart Tools provide some more advantages, but I won’t discuss them here.

More important here is the fact that YAKINDU Statechart Tools can also generate plain switch-case-based C++ state machine classes. It also provides an option to generate Qt-enabled classes with signals and slots, so this comes in handy. Using this tool, I only had to implement the state pattern-based state machine using QStateMachine by hand. There was no code generator available for that variant. Nevertheless, I was able to save lots of implementation effort, while getting semantically equivalent state machines for the performance tests just by using a single statechart definition.

All test cases follow the same scheme. As I wanted to measure the average time taken to process individual events, each test captured a million iterations of a single state loop. Each state loop performs all events necessary to visit all states and processing all transitions and transition actions. So, a state loop starts and ends with state A being active. This means that for each test case 6 million in events and transition actions and 1 million out events with their associated transition actions are processed. The tests were executed as a command-line application and recorded the time for the iterations as a single batch. The time consumption per event can then simply be determined by dividing the measured time by the sum of the number of in events and out events. The tests were performed several times and the measurement results with the lowest values were chosen.

The tests were performed using optimized code without debug information on my old (mid 2014) MacBook Pro, with Core i7 Quad Core CPU 2.4GHz. Of course, the concrete numbers will differ on different machines and operating systems. However, this is not relevant, as I wanted to compare the different implementation approaches relative to each other. These relative differences will be comparable on different hardware and OS platforms.

Let’s take a look at the performance figures

Yes – I think almost everyone would have expected a plain C++ implementation to be faster than the other alternatives, but the magnitude of the differences is really astounding.


Figure 3: Single event processing time compared. (Source: Author)

Processing single events using plain C++ took an average of 7 nanoseconds. Using SCXML required 33,850 nanoseconds – that is a factor of about 4800 and a huge difference! For comparison, light travels more that 10 kilometers while the SCXML state machine processes only a single transition, while the same transition in the plain C++ state machine only leaves so much time for light to travel just over 2 meters. This implies very different orders of magnitudes for CPU cycles and energy consumption.

Of course the concrete numbers depend on the machine and on the test procedure that was used. I will discuss this topic later. But let’s discuss the other numbers first. The first three test scenarios all include an identical state transition logic, which was generated by YAKINDU Statechart Tools, but each wrapped up in different ways.

Using signals and slots for handling events took 72ns on average when using direct connections. So this mechanism imposes a minimum overhead of ~90%, compared to the actual state machine logic. At this point, I don’t want to argue that using signals and slots makes applications slow. Instead I would rather claim that plain code implementations of state machines are extremely fast.

Comparing this with the third scenario gives a good impression of the performance overhead caused by using the event queue. In this scenario, all statechart events are routed through the event queue. With 731ns per event, it takes a factor of ~10 compared to signals and slots and ~100 compared to plain C++.

We can assume that a comparable overhead also applies to the other two scenarios “plain QStateMachine” and “SCXML state machine” – they both require an active event queue. So, when the assumed event queue overhead is subtracted from the 5200ns per event, then we get a rough time consumption of the QStateMachine framework of 4500ns per event. Compared to the plain code approach, QStateMachine-based state machine implementations are slow.This is a factor of about 635, compared to the plain C++ code implementation.

Finally, let’s have a look at the SCXML interpreter. It involves interpreting JavaScript code and adds another factor of ~7. Compared to the plain code approach, SCXML-based state machine implementations are very slow.

What about hierarchical and orthogonal state machines?

So far, I just profiled a simple flat state machine. But statecharts provide much more features, and the two most important structural features are hierarchy and orthogonality. So, what impact does the use of these features have in respect to state machine runtime?

First, to measure the impact of hierarchies, I defined a hierarchical variant of the state machine to be profiled, shown in fig. 4.


Figure 4: Hierarchical test statechart. (Source: Author)

It provides exactly the same behavior as the flat state machine, but adds some composite states. Keeping the functionality identical, but just changing the structure, makes it possible to find out how much overhead the structural variant implies, if any.

Second, in order to measure the impact of orthogonality, I replicated the flat state machine in the form of four orthogonal regions. They all have exactly the same functionality. So, the resulting state machine (see fig. 5) will do four times the work the simple state machine does.


Figure 5: Orthogonal test statechart. (Source: Author)

For profiling, I chose the plain C++ and the SCXML implementations, because these were the fastest and slowest variants. The diagram in fig. 6 shows the results. It is very encouraging that using hierarchies in statecharts do not have any measurable performance impact in both implementation variants.


Figure 6: Performance impact of hierarchies and orthogonality. (Source: Author)

Another positive result is that using orthogonality also does not have any negative impact. On the contrary, while one might have expected at least four times the processing time to accomplish four times the work, the effective increases in runtime with factors ~2.4 and ~3.1 are significantly smaller than 4.

Why is this the case? The reason for this is that there is a general part of state machine processing which is independent of the processing of individual states and events. This part takes 52% (or 3.5ns per event) for the plain C++ state machine, compared to 28% (or 9300ns per event) for SCXML. Finally, orthogonal states have less impact when using generated C++ code compared to SCXML.

Conclusion

Plain C++ is by far more efficient than all the alternatives. Using signals and slots or the Qt event queue are framework mechanisms easing implementing and maintaining complex state machine applications. The Qt state machine framework requires both these mechanisms. Using generated C++ code, you have the choice.

In many scenarios, especially interactive ones, even SCXML state machines are fast enough, and they may provide more flexibility by making the behavior configurable by switching statechart definitions at runtime.


Axel Terfloth has been working as a software engineer at itemis in Lünen/Germany since 2006. His special interest is on model based software engineering and the using state machines for developing applications for different embedded hardware and software architectures. He is leads R&D for embedded systems and is responsible for the YAKINDU Statechart Tools.

 

1 thought on “How to ensure the best Qt state machine performance

  1. You left off the QP Framework from Miro Samek. Also, depending on the embedded system Boost::StateChart or Boost::MSM or even the new Boost::SML (experimental, but suitable for even small micro-controllers) would be options.

    Log in to Reply

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.