Why and how to measure your RTOS performance
Desktop or laptop computers are extremely powerful and amazingly low cost. This means that developers of software for desktop systems assume that there is infinite CPU power, so they worry very little about the speed of their code. They also assume that indefinite amounts of memory are available, so they do not worry about code size either.
Embedded systems are different. Typically, there is enough CPU power to do the job, but only just enough – there is no excess. Memory size is limited. It is not normally unreasonably small, but there is unlikely to be any possibility of adding more. Power consumption is usually an issue and the software – its size and efficiency – can have a significant bearing on the number of watts burned by the embedded device. In an embedded system, therefore, it is vital that the RTOS has the smallest possible impact on memory footprint and makes very efficient use of the CPU.
Selecting an RTOS for an embedded system is quite a complex process and making the right decision is critical. There are broadly three approaches:
- You can develop your own kernel, though this is rarely a financially sensible choice.
- The world of open source software provides some options, although, if you need real time performance, you are somewhat restricted.
- You can choose a commercial product.
There are of the order of 200 RTOS products on the market and choosing the right one for a given application is tough. The parameters that drive the choice are varied and to guide you down that path would take a complete paper in itself.
A key problem is that there is no real standardization. One possibility would be the Embedded Microprocessor Benchmark Consortium, but that is not widely adopted and, anyway, it is more oriented towards CPU benchmarking.
However, one aspect to focus on is the performance of the RTOS. All vendors publish figures and will answer questions. The skill that the potential RTOS user needs to develop is the ability to ask the right questions and to understand the answers. Comparing like with like is a challenge.
RTOS Metrics. There are three areas of interest if you are looking at the performance and usage characteristics of an RTOS:
- Memory – how much ROM and RAM does the kernel need and how is this affected by options and configuration?
- Latency, which is broadly the delay between something happening and the response to that occurrence. This is a particular minefield of terminology and misinformation, but there are two essential latencies to consider: interrupt response and task scheduling.
- Performance of kernel services. How long does it take to perform specific actions?
Each of these measurements will be addressed in turn in terms of 1) the metrics to be used, 2) dependencies to watch out for, 3) importance in your design and 4) pitfalls to avoid.
As all embedded systems have some limitations on available memory, the requirements of an RTOS, on a given CPU, need to be understood. An OS will use both ROM and RAM.
ROM, which is normally flash memory, is used to store the kernel code, along with the code for the runtime library and any middleware components. This code – or parts of it – may be copied to RAM on boot up, as this can offer improved performance. There is also likely to be some read only data. If the kernel is statically configured, this data will include extensive information about kernel objects. However, nowadays, most kernels are dynamically configured.
RAM space will be used for kernel data structures, including some or all of the kernel object information, again depending upon whether the kernel is statically or dynamically configured. There will also be some global variables.
If code is copied from flash to RAM, that space must also be accounted for.
Dependencies. There are a number of factors that affect the memory footprint of an RTOS. The CPU architecture is key. The number of instructions can vary drastically from one processor to another, so looking at size figures for, say, PowerPC gives no indication of what the ARM version might be like.
Embedded compilers generally have a large number of optimization settings. These can be used to reduce code size, but that will most likely affect performance. Optimizations affect ROM footprint, and also RAM if the code is copied. Data size can also be affected by optimization, as data structures can be packed or unpacked. Again, both ROM and RAM can be affected. Packing data reduces its size but can have an adverse effect on performance, because fields may no longer be naturally aligned.
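The effect of packing on data size can be illustrated on a host machine. The following is a minimal sketch using Python's `struct` module. The three-field record (an 8-bit flag, a 32-bit counter and a 16-bit ID) is a hypothetical layout, not one taken from any particular kernel, and the aligned size depends on the host ABI, so only the packed size is fixed.

```python
import struct

# Hypothetical record: 8-bit flag, 32-bit counter, 16-bit id.
FIELDS = "BIH"

# "@" uses the host's native alignment, which typically inserts padding
# before the 32-bit field; "=" uses standard sizes with no padding,
# which corresponds to a packed layout.
aligned_size = struct.calcsize("@" + FIELDS)
packed_size = struct.calcsize("=" + FIELDS)

print(f"aligned: {aligned_size} bytes, packed: {packed_size} bytes")
```

The bytes saved by packing come at the price of a misaligned 32-bit field, which on many CPUs takes longer to access.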
Most RTOS products have a number of optional components. Obviously, the choice of those components will have a very significant effect upon memory footprint.
Most RTOS kernels are scalable, which means that, all being well, only the code to support required functionality is included in the memory image. For some RTOSes, scalability only applies to the kernel. For others, scalability is extended to the rest of the middleware.
Different people have different ideas about what scalability means. Fine grain scalability means that only the core of the RTOS [the scheduler etc.] and the code for the service calls that are actually used are included in the final memory image. There should be no redundant code.
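Fine grain scalability can be pictured as a simple sum: the core is always present and each service call adds code only if the application uses it. The sketch below assumes purely hypothetical service names and sizes; real figures would come from the linker map file for a specific kernel and CPU.

```python
# Hypothetical code sizes in bytes; real figures come from the linker map.
CORE_SIZE = 4096  # scheduler and other always-present code
SERVICE_SIZES = {
    "semaphore": 900,
    "mailbox": 1200,
    "queue": 1500,
    "event_flags": 700,
    "timer": 1100,
}


def image_size(used_services):
    """Return kernel code size when only the listed services are linked in."""
    return CORE_SIZE + sum(SERVICE_SIZES[name] for name in used_services)


minimal = image_size(["semaphore"])  # application uses one service
full = image_size(SERVICE_SIZES)     # every service included
print(minimal, full)
```

With these made-up numbers the full image is nearly twice the size of the minimal one, which is why a quoted footprint means little unless the configuration is stated.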
Measuring memory footprint. Although an RTOS vendor may provide or publish memory usage information, you may wish to make measurements yourself in order to ensure that they are representative of the type of application that you are designing.
These measurements are not difficult. Normally the map file, generated by the linker, gives the necessary memory utilization data. Different linkers produce different kinds of map files with varying amounts of information included in a variety of formats. Possibilities extend from a mass of hex numbers through to an interactive HTML document and everything in between.
There are also specialized tools that extract memory usage information from executable files. Examples are the GNU size and objdump utilities.
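As an illustration of turning tool output into footprint figures, the sketch below parses the default (Berkeley format) output of the GNU `size` utility. The sample text, the file name `kernel.elf` and the helper names are hypothetical. The ROM and RAM formulas are the usual rule of thumb and ignore stack and heap; in this format read-only data is counted as part of `text`, so the copied-to-RAM estimate may be on the high side.

```python
def parse_size_output(text):
    """Parse Berkeley-format output of GNU `size` into a dict of ints."""
    header, values = text.strip().splitlines()[:2]
    names = header.split()
    fields = values.split()
    # Only the first three columns (text, data, bss) are needed here.
    return {name: int(value) for name, value in zip(names[:3], fields[:3])}


def footprint(sections, code_copied_to_ram=False):
    """Estimate ROM and RAM use in bytes from text/data/bss sizes."""
    rom = sections["text"] + sections["data"]  # code plus initial values
    ram = sections["data"] + sections["bss"]   # initialized plus zeroed data
    if code_copied_to_ram:
        ram += sections["text"]  # a RAM copy of the code needs space too
    return rom, ram


# Hypothetical output for an image called kernel.elf.
SAMPLE = """
   text    data     bss     dec     hex filename
  20480     256    1024   21760    5500 kernel.elf
"""

rom, ram = footprint(parse_size_output(SAMPLE))
print(f"ROM {rom} bytes, RAM {ram} bytes")
```

Passing `code_copied_to_ram=True` shows how sharply the RAM requirement rises when the image is run from RAM rather than from flash.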
Importance of memory size. The importance of RTOS memory footprint must be understood, as its implications may be non-obvious. As mentioned earlier, memory is always an issue with embedded systems, but the detailed priorities vary from one system to another.
A small system may only have limited on-chip memory and, of course, the application code must be accommodated. Hence, the RTOS must be as small as possible. A bigger system may not have such a pressure on total memory space. System performance is more likely to be the priority. This means that the peak performance is required from the RTOS, so placing it into on-chip memory or locking it into cache may be attractive. Both of these options are most feasible if the kernel size is minimized.
If the system copies code from flash to RAM, it is particularly important to understand the memory space requirements.
Pitfalls. It was mentioned that you might want to make memory usage measurements yourself. Most vendors do publish the information, but this might be particularly hard to interpret and, hence, misleading. It is not suggested that vendors are intentionally misleading their customers; it is simply that there are a lot of variables and assumptions may be made.
For example, the memory size might relate to a minimalist configuration, which is valid, but unrealistic. It may also have been compiled for size, at the expense of much-needed performance. Runtime libraries are often not included in quoted figures.
Ideally vendors would quote a range of values. RAM is likely to be very sensitive to the application – how many kernel objects there are, for example. The ROM size is mainly affected by code, so the kernel configuration is key. As mentioned before, the effect of scalability can be very significant.
Example. To give a feel for the numbers, the Mentor Embedded Nucleus RTOS running on an ARM Cortex A8 in ARM mode, built with the Mentor Sourcery CodeBench toolchain and optimized for size, yields:
- ROM size = 12 to 30 KB
- RAM size = 500 bytes
The low end ROM size includes the essential services; the high value includes all services. The runtime library is excluded.
Building the RTOS for Thumb-2 mode reduces the ROM size by more than a third.
Measuring interrupt latency
The time related performance measurements are probably of most concern to developers using an RTOS.
A key characteristic of a real time system is its timely response to external events. An embedded system is typically notified of an event by means of an interrupt, so the delay between the interrupt occurring and the response to that interrupt – the interrupt latency – is critical.
Unfortunately, there are two definitions, at least, of the term “interrupt latency”:
- System: the total delay between the interrupt signal being asserted and the start of the interrupt service routine execution.
- OS: the time between the CPU interrupt sequence starting and the initiation of the ISR. This is really the operating system overhead, but many people refer to it as the latency. This means that some vendors claim zero interrupt latency.
The two definitions are illustrated in this diagram:
Measurement. Interrupt response is the sum of two distinct times:
- The hardware dependent time, which depends on the interrupt controller on the board as well as the type of interrupt.
- The OS induced overhead.
Ideally, quoted figures should include the best and worst case scenarios. The worst case arises when an interrupt occurs while the kernel has interrupts disabled.
Measuring a time interval like interrupt latency with any accuracy requires a suitable instrument. The best tool to use is an oscilloscope. One approach is to use one pin on a GPIO interface to generate the interrupt. This pin can be monitored on the ‘scope. At the start of the interrupt service routine, another pin, which is also being monitored, is toggled. The interval between the two signals may be easily read from the instrument.
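If the instrument can export edge timestamps for a series of captures, the best, average and worst case figures can be computed in a few lines. The sketch below assumes a hypothetical list of (interrupt pin edge, ISR pin edge) times in nanoseconds; the function name and the data are illustrative only. Note that each interval also includes the short time the ISR takes to toggle its pin.

```python
from statistics import mean


def latency_stats(edge_pairs_ns):
    """Return (best, average, worst) latency in ns from (trigger, response) edge times."""
    latencies = [response - trigger for trigger, response in edge_pairs_ns]
    if any(value < 0 for value in latencies):
        raise ValueError("response edge recorded before trigger edge")
    return min(latencies), mean(latencies), max(latencies)


# Hypothetical captures: (interrupt pin edge, ISR pin edge), in nanoseconds.
CAPTURES = [(0, 420), (1_000_000, 1_000_380), (2_000_000, 2_000_950)]

best, average, worst = latency_stats(CAPTURES)
print(f"best {best} ns, average {average:.0f} ns, worst {worst} ns")
```

Reporting all three values, rather than the average alone, makes it clear how far the worst case lies from typical behavior.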
Importance. Many embedded systems are real time and it is those applications, along with fault tolerant systems, where knowledge of interrupt latency is important.
If the requirement is to maximize bandwidth on a particular interface, the latency on that specific interrupt needs to be measured.
To give an idea of the numbers, the majority of systems exhibit no problems, even if they are subjected to interrupt latencies of tens of microseconds.
Pitfalls. The main problem with interrupt latency is the interpretation of published figures. To understand vendor supplied information you need to know quite a lot about both the hardware and software set-ups.
For hardware, you need to know precisely which platform and interrupt controller are being used for measurement, along with factors like clock speed and cache configuration. The frequency of the timer is also relevant, as its interrupt tick is competing with other interrupts for attention.
On the software side, it is important to know what kind of memory the code is running out of and how the kernel was built. Was it optimized for speed? Knowing which interrupt is in use is important, as, on some devices, different interrupts may be handled in different ways.
Lastly, you need to know whether the supplied figure is the best case or the average.
Example. To give a feel for the numbers, the Mentor Embedded Nucleus RTOS running on an ARM Cortex A8 at 600MHz yields an average interrupt latency of less than 0.5 microseconds.
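A latency figure is easier to compare across clock speeds when it is expressed in CPU cycles. The conversion below is a small worked example; the function name is an assumption made for illustration, and because the quoted figure is "less than 0.5 microseconds" the result is an upper bound.

```python
def latency_in_cycles(latency_ns, clock_mhz):
    """Convert a latency in nanoseconds to CPU clock cycles."""
    # ns * MHz = cycles * 1000, so divide by 1000 to get cycles.
    return latency_ns * clock_mhz / 1000


cycles = latency_in_cycles(500, 600)  # 0.5 microseconds at 600 MHz
print(cycles)  # 300.0
```

In other words, the quoted average corresponds to fewer than 300 clock cycles between the interrupt and the start of the ISR.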