Low-cost cooperative multitasking, Part 3 – Scheduling and throughput balancing - Embedded.com

Low-cost cooperative multitasking, Part 3 – Scheduling and throughput balancing

In the previous Part 1 and Part 2 in this series, we saw how to break down our microcontroller-based embedded system into smaller components and then make these components work with each other to achieve system functionality.  Task designing and intertask communication with respect to state machine-based multitasking systems were discussed.  

In this part of the article series, we will put all of the subsystems together and discuss a scheduling method to ensure robust task switching and CPU throughput balancing.

A Simple Scheme

The simplest form of scheduling tasks is the infinite loop.  As discussed in the first part of this article series, all the tasks are called in the correct order of execution. Once we are at the end of the list, we move back to the top.

Of course, this system has no rigid periodicity for task execution and may not be suited for complex systems like MP3 players.

Even in this system, a sense of timing can be introduced by carefully measuring the tasks’ execution time and adding an idle task to round off to the next millisecond.  For example, suppose Task1 to Task 4 takes a combined 4.1 ms.  In this case, you may write a delay routine in the IdleTask() to run for 0.9 ms, to round off the loop to 5 ms. This scheme works for simple systems in which each task takes a fixed amount of time, or if the timing constraints are not very rigid.  There are numerous systems out there that do not need more complex scheduling than this.

Tasks that need to run leisurely can have an internal counter in them and skip some calls, thus managing throughput.

Here, task X is set up to execute only on even-numbered calls.  Similarly, another task may be set up to run on odd-numbered calls. We can get a bit fancier by staggering more than two tasks, to achieve a desired balance of CPU throughput.

A Deterministic Scheduler

More complex systems like our MP3 player require executing tasks at exact timer ticks and with exact periodicity. This is easily accomplished by adding a timer interrupt to our scheduler.  A timer is configured to provide an interrupt at a known interval.  

A global variable “TimerTick” holds the current tick.  The rule of thumb for the timer tick interval is 5 ms or 7.8125 ms (7.8125 X 27 = 1000ms = 1 second, useful in some cases).  This tick variable is incremented asynchronously by the interrupt.

The scheduler’s infinite loop calls all the relevant tasks for the current timer tick and then waits until the next tick arrives.  The most important criteria for Task design, here, is to “finish within the tick.” 

All tasks should be designed and scheduled in such a way that, on each tick, all the scheduled tasks execute and return control back to the scheduler before the next tick arrives.  Tasks are scheduled under various multiples of this “tick interval” and are always staggered, so that the CPU is evenly loaded.  Larger tasks that take a long time to execute should be split and executed in a staggered fashion.

To implement this scheme, a data structure is created with all the tasks, with their task pointer (a function pointer in C) associated with periodicity and offset numbers. The periodicity number defines at what multiple of the timer tick the task should be executed, and the offset number defines how many ticks to stagger the execution. 

On a 5 ms tick system, an entry of {USBTask, 2, 0} would mean USBTask should run every 10 ms, without staggering─that is, at 0 ms, 10 ms, 20 ms, etc.  An entry of {TouchSenseTask, 2, 1} means a periodicity of 10 ms but staggered by 1 tick, thus TouchSenseTask will execute at 5 ms, 15 ms, 25 ms and so on. The entry { SystemMonitor, 4, 3} would schedule SystemMonitor at 15 ms, 35 ms (20+15), 55 ms, 75 ms etc. 

The task ServiceUserInput can be scheduled very leisurely as human interaction is fairly slow with respect to the speed at which the microcontroller works. ServiceUserInput shall be scheduled every 50 ms and its multiples say {ServiceUserInput, 10, 1}, e.g. 5 ms, 55 ms, 105 ms.  We see that it is quite easy to even out the load on the CPU once the tasks are properly defined.  The task table is placed in Flash memory, which saves precious RAM.

Here is the pseudo code for the scheduler that we just designed.

Figure 1 below shows the execution timeline of the above task table on a 5 ms timer tick.  Let us assume, for simplicity and visualization purposes, that the DisplayDriver takes 0.5 ms per invocation, the USBTask takes 1.5 ms, and the TouchSenseTask and ServiceUserInput tasks each take 1 ms.  

These numbers are, of course, not typical design numbers. The blank space on the time line represents the time for which the CPU is Idle in each timer tick. Figure 1 below shows that we still have a sizeable amount of throughput left for the integration of the remaining subsystems of our USB MP3 player.  The filesystem and displaymanager tasks should now be scheduled into the idle time we see on this timing graph.


Figure 1:  CPU Timeline of a USB MP3 Player After Partial Integration of Tasks

Let us review assigning priorities to tasks.  Suppose two tasks need to run one after the other.  We can place them one below the other in the Task-Description table.  If this violates the “finish within the tick” criteria, we should place them with the same period, but stagger the lower-priority task by one tick.

Throughput Management

Throughput in the context of a microcontroller can be defined as the amount of instructions the CPU can execute in a second.  The throughput of a microcontroller is a limited resource and often linked to its cost.  So far in our multitasking system, we have managed to build the software with very little CPU overhead. This was done by intelligent task design and smart scheduling.  

In our system, it is easy to plug in new subsystems or software stacks, as well as replace existing software with upgrades.  However, these changes also bring challenges that are common to even RTOS systems.  Changes in software change the execution time and we may end up violating the “finish within the tick” criteria.  It is therefore beneficial to add some diagnostic methods to help you debug your system.

A handy tool is to log the timer before and after each scheduler loop so that the exact tick on which we overstepped the tick interval can be captured.  This time difference is stored in an array of size equal to the largest periodicity in our task list.  If the largest periodicity is 30 ticks, then you would need 30 elements in the logging array.  When the TimerTick reaches this count, the Log array is populated from the beginning.  Here is the pseudo code for the altered scheduler.

The #ifdef DEBUG is a C-language directive to include code within it only when the symbol DEBUG is used during debugging.  Once you catch and fix the task that overshoots, you can disable the logging.  In this way, RAM and CPU usage are not affected.  Even when your system is within limits, this logging method can be employed to figure out the remaining CPU throughput and plan your upgrades.

Note that it is sometimes fine to let the timer tick arrive before all the tasks are completed.  That is, it is okay to over shoot the interval between ticks if the next tick is known to be quite load free and no time-critical task is run at that tick. The scheduler will simply catch up with the ticks, since it does not wait for a tick.

Some Tips and Tricks

The robustness, deterministic nature and low-cost features of our system are achieved by careful task design and smart scheduling.  However, it is easy to stumble during the first implementation of this multitasking system, as a lot depends upon careful planning before coding your software.  Here are some tips to keep you on track:

1. Document everything.   It is easy to get lost as the number of state machines increases, especially with the increase in their interactions.  Spend some time to create a design document and list out all tasks and their dependencies upon each other.  Document the code in detail.  All shared variables should be documented with their purpose and scope. Following a good coding standard is always beneficial for increasing readability and it helps when you revisit your code for an upgrade or bug fix.

2. Implement some form of diagnostics. This could be built into the product or a debug time-only feature. When you change a subsystem or add new features, the timing of the system inevitably changes. You may need to rebalance the tasks.  It is handy to have some timing diagnostics implemented.

3. When you feel lost, start with two tasks and build back your system . It is quite possible that all hell breaks loose when you first integrate all of the tasks into your system.  Be patient and start integrating, one task at a time. In this way, you can find the culprit task that runs away with the CPU and fix integration issues.

4. See if you can put the microcontroller to a low-power state like Sleep, Doze or Idle mode when you enter the WaitForTimerTick() routine and let the timer interrupt wake you up.  If the microcontroller is idle for a sizeable amount of time, this will reduce the power consumption of the system.  The sleep currents are quite low in some of the low power microcontrollers, which can enable the system to run longer on batteries.

5. Use natural numbers in the state machine switch cases .  Compiler optimize the switch cases to “decrement and check for zero and jump” CPU instructions if the case numbers are in natural order. This requires significantly fewer instructions compared to jumping otherwise, thus saving some throughput.


The low-cost cooperative multitasking method using state machines presented in this 3-part article series is effective for systems that are high in features and frugal in resources. 

While meticulous planning and design are required for success, once the art is mastered, it is easily extended to new projects and the expansion of existing projects.

This method is suitable where the appearance of multitasking is required on low-cost 8- and 16-bit systems with a constrained memory footprint.  

While this method can be made to work for more complex systems as well, the engineering effort of implementing the state machines might prove significantly larger than using an off-the-shelf RTOS.  A decision should be taken early in the project’s life cycle.  Using state machines is the way to go if you are trying to reduce costs.

To read Part 1 , go to “Building a simple FM player.”

To read Part 2 , go to “Building an MP3 player.”

As Senior Applications Engineer for Microchip Technology’ s AdvancedMicrocontroller Architecture Division, Ganesh Krishna is groupleader for Microchip’s graphics product portfolio, including display drivers,graphics and display libraries, manages peripheral libraries for PIC18 andPIC24F microcontrollers, and develops reference designs and performsbenchmarking.Ganesh earned his Bachelor of Engineering Degree inElectronics and communication from Sri Jayachamarajendra College of Engineering,belonging to Vishveshwaraiah Technological University (VTU).

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.