A data-centric OS for MCUs using a real-time publisher-subscriber-mechanism: Part 1 - Embedded.com

This article outlines a publisher-subscriber framework for microcontrollers. It supports the software designer in keeping code modules more independent of each other (thereby increasing code reusability) and in part replaces code logic by data configuration at run-time. It simplifies the synchronization issues of multi-threaded systems and, I believe, is superior to many available real-time operating systems.

Throughout the article I am going to discuss the proposed system and its features theoretically, provide example code and link it to the effect it has on overall software design, the software development process, reusability and code maintenance.

Using some simple examples I'll introduce you to the basic principles of the data-notification system. I will also discuss its real-time capabilities, emphasizing pre-emption and synchronization topics. Some performance measurements will also be presented. The final part presents an example closer to real-world projects. Among other features it shows the effect of the data-centric operating system (DCOS) on system design and module reusability and illustrates transparent data-flow reconfiguration.

Example code
All example code has been developed and tested on a MCB2130 evaluation board (ARM7 LPC2138) kindly provided by KEIL. All projects also run in KEIL's simulator (with the exception of sound simulation, of course).

You can download KEIL's Evaluation Software for ARM7 MCUs at the Keil Web Site and play with the sample projects without requiring any real hardware.

The basic ideas of the system described below arose in 2002 when I was doing some multi-threaded programming under Windows and carrying out some microprocessor programming at the same time.

For performance reasons I had used multi-threading in a PC program. Asynchronously running threads communicated with each other, they had to synchronize access to their shared resources (global vars, etc.) and threads waited for others to have completed subtasks. Under Windows I used critical sections and mutexes for synchronization and waitable events to trigger one thread from the other. Threads were grouped into three basic priorities.

Data acquisition was running at high priority. When data was arriving from external hardware I had to pick it up before it got lost or overwritten. Response and control output to hardware was running at a medium priority level while keeping the UI up to date was a low priority. Under Windows I had to meet the additional requirement that the UI could only be accessed safely from the task's main thread, i.e. UI access as a whole was not multi-thread safe.

I found that I often had a similar type of problem with the microprocessor projects I worked on. The difference was that performance was much more limited. Whereas on the PC there was no way around an OS, doing without one was an option on the micro. I thought about doing without an RTOS and stayed with it.

After starting to work for BST International in 2003 I got a great opportunity to apply this system to a new product of BST, the CCD Pro, a line-sensor camera. Again the job involved the three-priority-level model: high for data acquisition, medium for work and communication, low for the UI.

Also, the display library was not multi-thread safe. Some high priority tasks produced the data that was to be consumed by the communication and user-interface tasks. Tasks with a higher priority should always pre-empt tasks with lower priorities. I am still thankful to my boss for having had the courage to let me get on with it despite the fact that the fairly common requirements usually would have called for an off-the-shelf RTOS.

Soon it became apparent that this approach increased the overall performance by a factor between 3 and 10 and at the same time made the application more flexible, too. As the basic system grew it was reused for several other products using an identical code-base for the central modules. These modules are very central: they have a clear and separate interface, they manage "tasks" and support inter-task communication, so they do a job similar to that of a standard RTOS.

Events, synchronization, real-time and software design
When coding for microprocessors I used software interrupts to replace the waitable events I had used under Windows. This way the task-scheduler could be avoided altogether. A task scheduler has its own time-base which partly determines the event-response time of a real-time system. Add the time required for the task switch, too.

Compared to that, a software interrupt is as much real-time as you can possibly achieve on a microprocessor system. You set an SW-interrupt and the event-response time is now the same as the interrupt latency of the SW-interrupt, provided the interrupt is configured to a higher priority than the current task.


To see how the SW-interrupts work look at the example project "swISR.uv2" in Listing 1 to the left, which contains the relevant code snippets. On the KEIL IDE just press Ctrl-F5, set breakpoints to lines 57, 74 and 88 in swISR.c and press F5 to let the system run.

The endless 'Main' loop triggers the medium priority SW-ISR. MedIsr starts executing and triggers a high priority SW-interrupt. At this point MedIsr itself is preempted by the processor's interrupt controller. The high priority ISR is executed. This triggers the low priority interrupt but completes without interruption. Then execution returns to MedIsr. When this completes, LowIsr executes and finally control returns to the endless main loop.
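The nesting behaviour just described can be modelled on a host PC. The sketch below is not the swISR.uv2 code; it is a plain-C stand-in for the interrupt controller (all function and variable names are my own), but it reproduces the same execution order: MedIsr starts, is preempted by HighIsr, resumes, and only then does LowIsr run.

```c
#include <string.h>

/* Host-side model of a prioritized software-interrupt controller. */

#define N_LEVELS 3              /* 0 = low, 1 = medium, 2 = high */

static void (*isr[N_LEVELS])(void);
static int  pending[N_LEVELS];
static int  current = -1;       /* priority currently executing */

char log_buf[64];               /* records ISR entry order */

static void dispatch(void)
{
    /* run the highest pending level above the current one */
    for (int lvl = N_LEVELS - 1; lvl > current; --lvl) {
        if (pending[lvl]) {
            pending[lvl] = 0;
            int saved = current;    /* "interrupted context" */
            current = lvl;
            isr[lvl]();             /* may itself trigger interrupts */
            current = saved;
            lvl = N_LEVELS;         /* rescan: lower pendings may run now */
        }
    }
}

static void trigger_sw_irq(int lvl)
{
    pending[lvl] = 1;
    dispatch();   /* preempts immediately if lvl > current level */
}

static void LowIsr(void)  { strcat(log_buf, "L"); }
static void HighIsr(void) { strcat(log_buf, "H");  trigger_sw_irq(0); }
static void MedIsr(void)  { strcat(log_buf, "M1"); trigger_sw_irq(2);
                            strcat(log_buf, "M2"); }

const char *run_demo(void)
{
    log_buf[0] = 0;
    isr[0] = LowIsr; isr[1] = MedIsr; isr[2] = HighIsr;
    trigger_sw_irq(1);          /* what the endless main loop does */
    return log_buf;
}
```

Running run_demo() yields the entry order "M1", "H", "M2", "L": the high level preempts the medium one instantly, while the low level stays pending until everything above it has completed.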

To see how the system works you can single-step through the code either in the simulator or on the real processor. Alternatively you just let it run and inspect the array log and the interrupt count variables at the breakpoints. Inspecting the log and count variables in the watch window you'll see the order in which the ISRs do their work.

Note that there is a re-entrance counter built into log (which never increments). The fact that one interrupt can never be interrupted by the same interrupt level can be used to avoid those types of synchronization issues that are meant to deal with re-entrance.

The event-response time is the number of clocks from e.g. triggering the high priority interrupt in line 74 to its entry in line 57. The simulator ignores pipelining and measures 0.67µs. With an oscilloscope I measured 1.45µs (including diagnostic output to LEDs).

(Note: If you single-step through the code on the real hardware the debugger interferes with the interrupt controller and the execution order differs from letting it run from a breakpoint in line 74 to the one in line 57.)

Now look at SW-interrupts from a slightly different perspective and forget that you may once have learned that "an ISR has to react quickly and finish quickly". Instead look at an interrupt level as a task priority, and regard a SW-ISR as a sort of task. Next, let the interrupt controller do the (usually time-consuming) context switches for you. You can do this on any processor that has a few interrupt levels, SW-interrupts and supports nesting of interrupts.

The data-notification framework is based on these software interrupts and, of course, the framework adds some overhead. The point I am trying to make by this excursion into software interrupts is that the data-notification framework (and the callbacks you register with it) truly reacts to changes of data, whereas messaging and events provided by many RTOSes often depend on the time-slices of the task-scheduler.

Very often I have seen messages or global variables being used to establish a flow of data from one task to another one. In those implementations the response times depend on how well the two threads are synchronized. Any minor code change also changes the execution times of the individual threads and jeopardizes that carefully adjusted synchronization of tasks.

The DCOS concept does not offer anything to exclude sections of code from interrupting each other so far. And I know something really needs to be done about it. For the time being the only way to deal with it is a global interrupt disable, which is not really acceptable. But the problem is not as serious as it seems at first glance.

On some processors you can manipulate the priority level at which the current code is executing. On these you can raise the priority level to something higher than all interrupt levels requiring synchronization. You can let a few extremely important and accurate tasks run above that raised level and yet synchronize the others. On an ARM you can't do this and you'd have to selectively disable interrupts instead. In the absence of mutexes & co. any priority task may have to use LOCKs and may thereby starve any other task.

There are two other points to note about synchronization with the DCOS:

1. The data buffer passed into a data-change callback will always be consistent. By carefully grouping consistent data into structures and creating data objects of these structures you can ensure that consistent data will remain so without needing mutexes or critical sections.

2. This point applies to the use of non-re-entrant libraries. Display and other output libraries very often are not re-entrant. Imagine your program is sending a string over the serial port. After having written half of the string some higher priority thread comes along and writes something in between. The real output would be messed up.

The very easy solution to this problem is to let all output functions run on a single interrupt priority (which is by nature not re-entrant). We can easily realize this by placing all output or display functionality in data-change callbacks with a low priority.

Changing data is an event
Now I am beginning to get into the central idea of the DCOS. Some modules provide data, others consume it. The provider creates a data object in the DCOS. The consumers tell the framework to notify them when the data gets changed. This mechanism is implemented in the framework. The notifications are done using callbacks.

These are called from any one of three priority levels, which are in fact interrupt priorities of software interrupts. As this system relies on callback functions I'll give you a quick introduction, just in case you are not familiar with callbacks. If you are, you may want to skip this paragraph.

What is a callback?
The most common callback function I know is the timer callback. The sample code for this project is to be found in the directory "timerCallback". Its complete source code is shown in Listing 2, to the right below. The timer module itself is provided in a library.


The callback function in this example is the function OnTimer(). It is called a callback as it's not called directly by your own code. Instead it's called back by a framework when the framework thinks it's time to do so.

To tell the framework about the existence of the callback function you have to register the callback function, which is done by passing the function's address in a call to the framework's registration function (CreateTimer in this example). You might like to think of your callback as a task or a thread. (Conceptually the callback is your code's response to an event.)
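As a sketch, the registration pattern looks like this in plain C. CreateTimer and OnTimer follow the naming of the example project, but the framework internals shown here (timer_tick and its bookkeeping) are a host-side stand-in, not KEIL's timer library:

```c
/* Host-side stand-in for the framework's timer module. */

typedef void (*timer_cb_t)(void);

static timer_cb_t timer_cb;         /* registered callback address  */
static unsigned   timer_period_ms;  /* requested callback period    */

void CreateTimer(unsigned period_ms, timer_cb_t cb)
{
    timer_period_ms = period_ms;
    timer_cb = cb;                  /* framework remembers the address */
}

/* Normally driven by the framework's tick ISR; here called by hand. */
void timer_tick(unsigned elapsed_ms)
{
    static unsigned acc;
    acc += elapsed_ms;
    while (acc >= timer_period_ms) {
        acc -= timer_period_ms;
        timer_cb();                 /* the framework "calls back" */
    }
}

/* --- application code --- */
int timer_fires;                    /* counts callback invocations */
void OnTimer(void) { ++timer_fires; }
```

With CreateTimer(35, OnTimer) registered, feeding 100 elapsed milliseconds into timer_tick() fires OnTimer twice, just as a 35ms periodic timer would.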

Data-changed events
A hardware event is handled by an ISR and the interrupt vector table really is a list of pointers to callback functions (OK, special ones with extra HW support). Analogously, changing data is a software event and appropriately handled by a data-changed callback function.

The following simple code fragments in Listing 3, below left, show the creation of a notifiable data object and the registration of a callback for a data-has-been-changed event. It also shows how to modify or set the data object's value and how to get at the data from within the notification callback.

In the example code fragments in Listing 3 a data object is created that is to be referenced by obIdTestVar. A notification callback for it is registered and the data object is being manipulated in response to a 35ms timer tick. OnTimer() itself gets interrupted by the interrupt controller which enters the SW-ISR in the DCOS framework and calls back into OnTestVarChanged().
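Since Listing 3 is not reproduced here, the fragment below sketches what such an API can look like. All names (CreateDataObject, RegisterNotification, SetData, GetData) are illustrative and may differ from the real DCOS; in particular, notifications run synchronously here instead of via prioritized software interrupts:

```c
/* Illustrative stand-in for the DCOS data-object API. */

#define MAX_OBJECTS 8
#define MAX_SUBS    4

typedef void (*notify_cb_t)(int obId);

typedef struct {
    int         value;              /* the published data itself     */
    notify_cb_t subs[MAX_SUBS];     /* registered consumer callbacks */
    int         n_subs;
} data_object_t;

static data_object_t objects[MAX_OBJECTS];
static int n_objects;

int CreateDataObject(void)          /* provider: returns an object id */
{
    return n_objects++;
}

void RegisterNotification(int obId, notify_cb_t cb)
{
    data_object_t *ob = &objects[obId];
    ob->subs[ob->n_subs++] = cb;    /* consumer subscribes */
}

void SetData(int obId, int value)   /* publish: write, then notify */
{
    objects[obId].value = value;
    for (int i = 0; i < objects[obId].n_subs; ++i)
        objects[obId].subs[i](obId);    /* SW-ISR stand-in */
}

int GetData(int obId) { return objects[obId].value; }

/* --- consumer side --- */
int seen;                           /* last value the consumer observed */
void OnTestVarChanged(int obId) { seen = GetData(obId); }
```

Note that the producer never calls the consumer directly: it only writes to its data object, and the framework fans the change out to whoever subscribed.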

The DemoDataNotification sample project does the same with a few more data objects at different priorities. It also introduces the 1-to-many relation. Two or more data consumers subscribe to one published data entity.

The project was created to test the performance and the pre-emptive characteristics. It uses a timer callback and adds several data-notification callbacks of various notification priorities. The project's main function performs a little initialization, registers a timer callback, creates some data objects, registers notification callbacks with the DCOS and returns. (It could just as well run an endless idle loop that keeps putting the processor into power-save mode.)

From here on all the real work is done in the timer callback and the various data-changed notification callbacks. These are called by SW-interrupt service routines from within the framework which pre-empt each other according to their predefined interrupt priorities. If you are wondering about 'there is no main loop', don't worry. If I hadn't seen it work very well for the last two and a half years I wouldn't believe it myself.

In order to be able to take measurements each of the callbacks turns on a LED at entry and turns it off on exit. I have connected a digital analyzer to the LEDs to track the execution times of each function and to control the correctness of the pre-emption priorities. The relevant interrupt priorities are (from low to high):

1. Data low priority notification
2. Timer callbacks
3. Data medium priority notification
4. Data high priority notification.


This results in the slightly complicated chain of events as follows (illustrated in Listing 4 to the left, and the digital analyzer recording in Figure 1, below):

1) The function OnTimer (LED 3) is called every 35ms. It increments a count variable and modifies a data object referenced by obIdTest1.

2) OnTest1HighChanged (LED 4) is a high priority callback registered with obIdTest1. This function pre-empts the timer callback immediately. It obtains the value for obIdTest1 and writes it to obIdTest4 and obIdTest2.

3) OnTest2MedChanged (LED 6) runs next and retrieves the data value.

4) OnTimer carries on, modifies obIdTest3 and completes.

5) OnTest4LowChanged (LED 8) runs and retrieves the data value.

6) OnTest1LowChanged (LED 5) runs and retrieves the data value.

7) OnTest3LowChanged (LED 7) runs, modifies obIdTest2 again and gets pre-empted immediately.

8) OnTest2MedChanged (LED 6) runs again and retrieves the data value.

9) OnTest3LowChanged completes.

The measurements show about 12µs from setting LED 3 in the timer callback to LED 4 in the high priority data-notification callback. This is a measure of the framework's reaction latency for high priority data notification.

Apart from the pure latency the measured time includes setting the LEDs and incrementing a count variable. This value compares well with the context-switch times of fast RTOSes even though the DCOS does a bit more than just switching context. It outperforms the mailbox send/receive implementations of most commercially available RTOSes.

Because only the data consumer has to know the data producer (they have to know each other with messaging) the DCOS also creates fewer dependencies between modules. This will be discussed in more detail in the next section.

Software design, module independence and reusability
In my experience there are several drawbacks of standard microcontroller operating systems that use tasks and messages. By offering tasks they make programmers think in a certain way that does not yield the best possible reaction times. This means the software design is subconsciously made to fit into a concept of tasks and messages and is thereby limited. This is thinking in terms of what can be done. As a result the system performs less well than it could.

Let's think about an example application and the various software design approaches that could be applied: (1) A/D conversion, (2) trigger-and-react-in-ISR, (3) round-robin tasking, (4) mailbox or event driven, and, finally, (5) the data-centric OS (DCOS) publisher-subscriber approach which overcomes the various limitations of the others.

An AD conversion is to be done cyclically, some calculations are to be carried out on the result which is then to be sent to output hardware. Also the input and output are to be sent to a display using slow display functions.

Depending on the position of a simple switch the calculations are to be done using formula A (fast & simple) or B (slow & complex). The conversion and output shall run at a high frequency. The display need be updated only once per second. The round-robin tasking approach goes as follows:

1. A cyclic task is executed every ms and increments a millisecond-counter variable.
2. When counter%2 == 0 it triggers an AD conversion.
3. When counter%2 == 1 it picks up the result from the AD-converter, reads the switch position and performs calculation A or B. The result is sent to the output hardware.
4. When counter == 1000 the display function is called. The counter is reset to zero.
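The four steps above can be sketched as a single cyclic function. The ADC, calculation and display work is stubbed out with counters, so this illustrates the scheduling pattern only, not real driver code:

```c
/* Round-robin sketch: one cyclic function, driven by a 1 ms timer. */

unsigned ms_counter;
int conversions_started, conversions_handled, display_updates;

static void start_adc(void)         { ++conversions_started; }
static void handle_adc_result(void) { ++conversions_handled; /* A or B */ }
static void update_display(void)    { ++display_updates;     /* slow! */ }

void cyclic_task(void)              /* step 1: called every millisecond */
{
    ++ms_counter;
    if (ms_counter % 2 == 0)
        start_adc();                /* step 2 */
    else
        handle_adc_result();        /* step 3: calc A/B, write output */
    if (ms_counter == 1000) {
        update_display();           /* step 4: takes > 1 ms, so the
                                       next tick(s) arrive late */
        ms_counter = 0;
    }
}
```

Driving cyclic_task() 1000 times yields 500 started conversions, 500 handled results and exactly one display update, which makes the 1:1000 ratio of fast work to slow work explicit.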

Because there are no priorities in a round-robin scheduler the slow display function (which requires more than a millisecond to run) causes a bad reaction time once a second. Also millisecond-counter increments get lost as every thousandth cycle takes longer than one millisecond.

For this approach to work accurately the Task-Cycle-Time must be:

1) longer than the longest execution time of the cyclic function (in this case the display function). Otherwise there is a risk of losing count-ticks and thereby accuracy.

2) short enough to meet the real-time objective. The task runs several times before a reaction to an input is complete. In the above example we need 2 runs from starting a measurement to completing the reaction to it. The more sub-states we have, the finer-grained the timing we need to meet the overall reaction targets. At some point the RTOS overhead of calling the task function will begin to exceed the real work carried out inside the function.

The fact that these conditions are contradictory is the lesser problem. The impact on project maintenance is worse: think about the possible side effects of changing the code in one of the steps. The execution time might increase a bit and you might lose another tick. With every change of code you would again have to measure execution timings.

The module interdependence of this system is low. A central module, the round-robin scheduler, may know all the other modules (ADC, calculations, output). They can remain independent of each other, may be reused in other projects and can be tested separately in projects designed to stress-test the individual code modules.

Trigger-and-react-in-ISR approach
The next approach to look at is the trigger-and-react-in-ISR, in which:

1. An initial AD conversion is triggered at system start.

2. The ADC interrupt service routine picks up the value from the AD-converter when the conversion is complete. It then performs calculation A or B according to the position of the switch (which is read once per interrupt). The result is sent to the output hardware straight away. To transport information to the lower priority output task the AD-value and the output value are copied to global variables. For synchronization this copying is guarded by temporarily disabling interrupts. A software interrupt is set to signal that data is available for the display. The next AD conversion is triggered.

3. The SW-ISR again uses an interrupt disable to guard access to the global AD-value and the output value while it copies them into local variables. It then calls the slow display output function. It gets interrupted many times by the ADC-ISR and gets execution time only when the AD conversion is executing, i.e. from Start-Of-Conversion to the End-Of-Conversion ADC-interrupt.
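Steps 2 and 3 can be sketched as follows. The guard functions disable_irq()/enable_irq() are stand-ins for the real CPU intrinsics, and formulas A and B are made up; the point is the copy-under-interrupt-lock hand-off between the two priority levels:

```c
/* Host-side sketch of the ISR hand-off via guarded globals. */

static volatile int g_ad_value, g_out_value;   /* shared between levels */
static int irq_disabled;                       /* models the IRQ flag   */

static void disable_irq(void) { irq_disabled = 1; }
static void enable_irq(void)  { irq_disabled = 0; }

static int calculate(int ad, int use_formula_b)
{
    return use_formula_b ? ad * ad / 16   /* "formula B", made up */
                         : ad * 2;        /* "formula A", made up */
}

void adc_isr(int ad_value, int switch_pos)     /* step 2, high priority */
{
    int out = calculate(ad_value, switch_pos);
    /* result would go to the output hardware straight away, then: */
    disable_irq();                 /* guard the shared globals */
    g_ad_value  = ad_value;
    g_out_value = out;
    enable_irq();
    /* here: set the display SW-interrupt, retrigger the ADC */
}

void display_sw_isr(int *ad, int *out)         /* step 3, low priority */
{
    disable_irq();                 /* take a consistent snapshot */
    *ad  = g_ad_value;
    *out = g_out_value;
    enable_irq();
    /* the slow display output function would run here */
}
```

The interrupt disable keeps the two globals consistent with each other; without it the display could show an AD value paired with the output of a different conversion.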

The timing of this approach is extremely good. As many conversions as possible are carried out. Predictable reaction times can be stated for calculation A and calculation B. But the interdependency is high. The SW basically consists of one monolithic module: ADC-calculator-HW-outputter all in one. None of the code can be reused in other projects.

The interdependency could be lowered using callbacks. The ADC's ISR would call a callback with a new conversion result. This would be a function of the calculation module registered with the ADC module during system initialization. This function could then do the calculation and finally call a function of the output module.

In this way all the code would still run within the ADC-ISR but the modules would be more independent and reusable. The dependencies (calculation module uses AD-module and output module) would mirror the real usage relations. The callback functions themselves, however, never know which interrupt level they will be running at.

Mailbox or event based approach
Now, let's look at the mailbox or event based approach, in which:

1. An initial AD conversion is triggered.

2. The ADC interrupt service routine picks up the value from the AD-converter and sends it into a mailbox when conversion is complete.

3. The calculation module runs in its own task and waits for messages in the ADC mailbox. When a new message (i.e. a new AD-value) has arrived this task does its calculations depending on the position of the switch and calls a function in the output module. It also posts the AD-value and output value to another mailbox. Then the next AD conversion is triggered.

4. A display task runs at a lower priority, waits for messages in the second mailbox (containing AD and output values) and calls the slow display function.
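The mailbox flow can be sketched with a toy single-slot mailbox. A real RTOS mailbox would block the pending task instead of returning a failure code, so treat this purely as an illustration of the data path (and of the second mailbox the text complains about):

```c
/* Toy single-slot mailbox; nothing like a real RTOS implementation. */

typedef struct { int full; int ad; int out; } mailbox_t;

static mailbox_t adc_mbox, display_mbox;

static int mbox_post(mailbox_t *m, int ad, int out)
{
    if (m->full) return 0;          /* a real RTOS would queue or block */
    m->ad = ad; m->out = out; m->full = 1;
    return 1;
}

static int mbox_pend(mailbox_t *m, int *ad, int *out)
{
    if (!m->full) return 0;         /* a real task would sleep here */
    *ad = m->ad; *out = m->out; m->full = 0;
    return 1;
}

void adc_isr_step(int ad_value)                 /* step 2 */
{
    mbox_post(&adc_mbox, ad_value, 0);
}

void calculation_task_step(void)                /* step 3 */
{
    int ad, unused;
    if (mbox_pend(&adc_mbox, &ad, &unused)) {
        int out = ad * 2;                       /* "formula A", made up */
        /* call the output module here, then forward to the display: */
        mbox_post(&display_mbox, ad, out);      /* the second mailbox */
    }
}

int display_task_step(int *ad, int *out)        /* step 4 */
{
    return mbox_pend(&display_mbox, ad, out);
}
```

Note how every value is actively copied twice just to travel from the ISR to the display task; that is exactly the overhead the DCOS notification mechanism avoids.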

This approach performs similarly, too (as many conversions as possible, predictable reaction). The only difference is the much greater overhead. For such high-speed conversions and a relatively small amount of real work, the RTOS overhead may well exceed the processor time spent with your own code. Another disadvantage is the second mailbox which is required only to actively pass the data on to the display task.

If this approach is extended to more complex systems with asynchronous tasks, a lot of RTOS tasks waiting for messages and as many dedicated mailboxes will be required. Tasks, mailboxes and events are a limited resource in almost any RTOS.

Alternatively the data could be passed on to the lower priority display task by means of global variables. Then you would have to synchronize access to these global variables which would introduce a jitter into the execution of the calculation task. Another inter-module dependency between display task and calculation would be created, too. So both mailboxes and global-variable synchronization create a dependence between task modules.

Next in Part 2: Using a data-centric OS publish-subscribe framework

Dirk Braun (dirk.braun@bitel.net) studied mechanical engineering at King's College London. He has been working as a self-employed programmer for industrial applications, databases, PCs and microcontrollers and has held various courses on programming in C, C++ and Java. Since 2003 he has been a senior software developer at BST International which produces web guiding and web control systems for the paper, labelling, packaging, printing and tire industries.
