Using requirements planning in embedded systems design: Part 6 – Determining your system’s task timing parameters - Embedded.com


One of the key points of the design methodology described in this series of articles is that it must produce real-time software. It follows that an analysis of the system’s timing requirements should be part of the system’s design.

In this last part of a six-part series, we will examine the timing requirements of each of the software functions in the various tasks and, from this information, determine a timing system that will meet the system’s needs.

The first step is to list all the timing specifications from the requirements document. Note that if the functions grouped into a task have different requirements, then the specifications for each function should be listed separately.

Now is also a good time to review the reasons for combining the functions, to verify that they really should be in a common task. The timing requirements for our alarm clock design example are listed in Figure 3.14.

Entries for both the event-to-event timing and response timing are included in the time domain. If the timing requirement is listed in the form of a frequency, it should be converted to the time domain at this time for easier comparison with the other timing requirements.

Figure 3.14. Timing requirements listed in the form of frequency
Task1 Display
360Hz +20/-0 2.635 - 2.777mS
Alarm flash 0-50mS following time update (1Hz)
50% duty cycle +/-10%
Blank 909.9mS to 1111.1mS +/-0 overall
Sync to Time update
Response to Blank 8mS maximum

Task2 TimeBase
1sec +/-0 overall relative to internal or 60Hz
timebase switchover must occur within 8mS of
presence or absence of 5th pulse

Task3 Buttons
Button bounce is 100mS
Debounce is 50mS
Response to decoded command 34mS maximum
Auto Repeat 909.9mS to 1111.1mS +/-0 overall
Sync to time update 0-50mS following time update

Task4 AlarmControl
Alarm response to time increment, 100mS
maximum including tone startup
Snooze response time 50mS including tone shutoff

Task5 AlarmTone
Alarm Tone .454mS min, .5mS typ, .555mS max
Modulation 454mS min, 500mS typ, 555mS max event to event
492mS min, 500mS typ, 510mS max overall

Task6 Error Task
no timing specified.

From this information an overall timing chart for the individual tasks of the system can be compiled. This should list all optimum, minimum, and maximum timing values for both event-to-event and response timing requirements. Any notes concerning exceptions to the timing requirement should also be included.

Table 3.4

All the information needed to determine the system tick is now present. The system tick is the largest common time increment that fits the timing requirements of all the tasks in the system. In other words, the tick chosen must be the largest increment of time that divides into every one of the timing requirements an integer number of times.

While this sounds simple, it seldom is in practice. Timing requirements are seldom integer multiples of each other, so the only solution is to choose a tick that fits most of the requirements, and fits within the tolerance of all the rest.
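The search described above can be sketched in C. This is only an illustration, not the author's procedure: the `timing_window` type, the function names, and the idea of stepping through candidate ticks at a fixed resolution are all assumptions. Each requirement is expressed as a minimum and maximum period in microseconds, and a candidate tick "fits" if some integer multiple of it lands inside that window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One timing requirement, as an allowed period window in microseconds.
 * (Hypothetical type, introduced for this sketch only.) */
typedef struct {
    uint32_t min_us;
    uint32_t max_us;
} timing_window;

/* Nonzero if some integer multiple of tick_us lies inside the window. */
static int tick_fits(uint32_t tick_us, const timing_window *w)
{
    uint32_t n = w->max_us / tick_us;   /* largest multiple not above max */
    return n >= 1u && n * tick_us >= w->min_us;
}

/* Returns the largest tick (a multiple of step_us) that fits every window,
 * or 0 if no candidate fits. */
uint32_t largest_tick_us(const timing_window *w, size_t count, uint32_t step_us)
{
    assert(w != NULL && count > 0 && step_us > 0);

    /* The tick can never be longer than the tightest maximum period. */
    uint32_t smallest_max = w[0].max_us;
    for (size_t i = 1; i < count; i++) {
        if (w[i].max_us < smallest_max) {
            smallest_max = w[i].max_us;
        }
    }

    /* Try candidates from the largest downward; the first hit wins. */
    for (uint32_t tick = smallest_max - (smallest_max % step_us);
         tick >= step_us; tick -= step_us) {
        size_t i = 0;
        while (i < count && tick_fits(tick, &w[i])) {
            i++;
        }
        if (i == count) {
            return tick;
        }
    }
    return 0;
}
```

Feeding it just three of the alarm clock requirements from Figure 3.14 (display scan 2635 to 2777 µS, alarm tone 454 to 555 µS, debounce exactly 50000 µS) with a 50 µS step yields 250 µS. The full requirement list would of course have to be entered before trusting the result for a real design.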

When a suitable tick is found, it should be noted in large letters at the bottom of the chart. This number is the heartbeat of the system and will be at the very center of all timing decisions from this point on.

The best tick for our alarm clock is 250 microseconds. Sometimes even the tolerances on the timing specifications will not allow a reasonable size tick that will fit every requirement. When this happens, the designer is left with a limited number of options:

Option #1: The designer can review the timing requirements for the system, looking for values that can be changed without changing the operation of the system. Timing requirements for display scanning, keyboard scanning, tone generation, and others may be a matter of aesthetics rather than an externally imposed requirement. The only real requirement may be that they have consistent timing.

If the timing for one of these functions is the hard-to-fit value, experiment with its timing requirements. Often this will suggest other tick increments that fit within the requirements of all the functions.

For example, the timing for the scanning routine in our example is 2.635 ms to 2.777 ms. However, if the minimum were reduced to 2.5 ms, then the system tick could be increased from 250 µS to 500 µS. This still scans the displays at a rate greater than 60 Hz, so no flicker would be introduced.

Option #2: The second option is to consider moving some of the more difficult-to-accommodate tasks to a timer-based interrupt. The interrupt can be configured to operate at a faster rate that accommodates the difficult tasks, which frees up the balance of the tasks to operate at a different rate.

(Note: If a task is moved to an interrupt, communications to and from the task will require either a semaphore or buffer protocol. This is because the task will be completely asynchronous to the other tasks, much like the tasks in a preemptive operating system. Additional handshaking is therefore required to prevent the transmission of partially updated communications variables in the event that the timer interrupt falls in the middle of a task’s update.)

Option #3: Consider using a tick that is smaller than the smallest task timing increment. Sometimes, using a tick that is 1/2 or 1/3 of the smallest task timing increment will create integer multiples for hard to accommodate tasks.

(Note: This option will decrease the time available in each pass of the system and increase the scheduling job for the priority handler, so it is not generally recommended. In fact, the original tick value of 250 µS was obtained using this option. However, shifting the display timing would eliminate the need for a smaller tick, so it was chosen instead.)

At this point in this article and in this series, there should also be a quick mention of the system clock. Once the system tick has been determined, a hardware mechanism within the microcontroller will be needed to measure it accurately. Typically, this job falls to one of the system’s hardware timers.

The timers in small microcontrollers usually have the option to run either from a dedicated crystal oscillator or from the main microcontroller oscillator. If a dedicated oscillator is available, then the oscillator frequency must be set to 256 times the desired system tick frequency, so that an 8-bit timer rolls over once per tick.

In our example, that would be 512 kHz, or 256 times 1/.5 ms. If the system clock is employed, a pre- or postscaler will be needed to allow the system clock to operate in the megahertz range.

Assuming a prescaler based on powers of two, that means a 1.024 MHz, 2.048 MHz, 4.096 MHz, 8.192 MHz, or 16.384MHz oscillator. If none of these options are available, then an interrupt routine can be built around the timer, for the purposes of preloading the timer with a countdown value.

This value is chosen so that the timer will overflow at the same rate as the desired tick. Note that an interrupt routine is needed for this job because there will very probably be task combinations that will periodically overrun the system tick. An interrupt routine is the only way to guarantee a consistent time delay between the roll-over and the preload of the timer.

For our example, we will use a 4.096-MHz main system clock and a divide-by-8 prescaler to generate the appropriate timer roll-over rate for our system tick, and avoid the interrupt option.
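The arithmetic behind these choices can be checked with a short C sketch. It assumes an 8-bit up-counting timer that rolls over after 256 counts, and it treats `timer_clock_hz` as whatever frequency actually feeds the prescaler; on some parts that is the oscillator divided by a fixed factor, which this sketch ignores. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Period between timer roll-overs, in microseconds, for a timer that
 * counts 'counts' prescaled clocks per roll-over. */
uint32_t tick_period_us(uint32_t timer_clock_hz, uint32_t prescale, uint32_t counts)
{
    assert(timer_clock_hz > 0 && prescale > 0);
    uint64_t numerator = (uint64_t)prescale * counts * 1000000u;
    return (uint32_t)(numerator / timer_clock_hz);
}

/* Preload value for an 8-bit up-counting timer so that it overflows
 * tick_us after the reload. Returns -1 if the tick cannot be reached
 * with this clock and prescaler. Interrupt latency is not modeled. */
int timer_preload_8bit(uint32_t timer_clock_hz, uint32_t prescale, uint32_t tick_us)
{
    assert(prescale > 0);
    uint64_t counts = (uint64_t)timer_clock_hz * tick_us
                      / ((uint64_t)prescale * 1000000u);
    if (counts == 0 || counts > 256u) {
        return -1;
    }
    return (int)(256u - counts);
}
```

With a 4.096 MHz clock and a divide-by-8 prescaler, `tick_period_us` gives 500 µS over a full 256-count roll-over, so no preload is needed. With a 4.000 MHz clock instead, the timer would need 250 counts per tick, so the interrupt routine would reload it with 6 after each overflow.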

Once a suitable timing tick is chosen, the skip rates for all of the system tasks can be calculated. This value will be used by software timers which will hold off execution of the state machine associated with the task, for X number of cycles.

This slows the execution of the state machine so that its operation stays within its desired timing. Using the timing information from our alarm clock design, and assuming the modified Task1 scan timing, Table 3.5 below is constructed.

Note the values in parentheses following the skip rates. These are the skip rates for the maximum times. Assuming that the optimum time is not the maximum, then these values constitute the amount of leeway that is still available in the task’s timing. We noted this information for its potential use later in the design, when we define the priority handlers.

Table 3.5

Up to this point in the design, we have assumed that the system would use a rigid timing system that regulates the timing of the software loop holding the task state machines. However, there is another option for systems that are not required to comply with specific timing requirements. The option is to run the system without a timing control.

By far, the first option using a rigid timing control is the most common. However, in rare instances, when the timing tolerances are very broad or nonexistent, the second option can be implemented.

Now as a designer, you may ask, “What is the advantage of a completely unregulated system, and what design could possibly operate without some regulation?”

The truth is, no system can operate completely without timing regulation, but some systems can operate by only regulating the functions that actually require specific timing. The other tasks in the system are run at the maximum speed of the main system loop.

For example, consider a simple user interface terminal with a display and keyboard. Button presses on the keyboard result in ASCII data being sent to the host system, and data received from the host is scanned onto the display. The only functions in the system that require specific timing are the serial transmit and receive functions interfacing with the host system.

The display and keyboard scanning rates only have to comply with a reasonable minimum scanning rate. In this example, the serial input and output tasks are typically regulated by the baud rate of the serial interface.

The control, display scanning, and keyboard scanning tasks could then be run at the fastest rate possible given the microcontroller clock frequency. The rate at which these three tasks operate would be variable, based on the execution time of each task on each pass through the system loop.

However, as long as the minimum scanning rates are achieved, the system should operate properly. The advantage to this type of system is that it operates more efficiently and more quickly than regulated systems.

There is no dead time at the end of each cycle as the system waits for the next tick; the system just jumps back to the top of the loop and starts into the next task. This saves program memory and reduces complexity, and it means that every available system instruction cycle is used to perform a system function.

As a result, the system is very efficient, and will outperform a more rigidly regulated system. The downside is that the tasks within the loop cannot use the loop timing to regulate their operation. Instead, they must rely on hardware-based timer systems for accurate timing.

Consequently, this approach requires a hardware timer for every software-timed function, and only works well for systems with few, if any, routines with strict timing requirements.

So far in this series, we have gathered together the various priority requirements and used them to define the system’s modes. This covers the majority of the work at this level of the design.

The only additional work is to update the table with the information from the task definition performed earlier in the series. Basically, we need to rewrite the priority list and the criteria for mode change list using the task names. We also need to note any functions that should be disabled by a specific system mode.

So, to review the information from the requirements document dissection, we have defined the following list of system modes:

• Timekeeping mode: Current time display, alarm is disabled, no commands are in progress, and normal power.

• Time set mode: Current time display, alarm is disabled, normal power, and time set commands are in operation.

• Alarm pending mode: Current time display, alarm is enabled, normal power, no commands in progress, and alarm is not active.

• Alarm set mode: Alarm time display, normal power, alarm set commands are in operation, and alarm is not active.

• Alarm active mode: Flashing display of current time, alarm is enabled, alarm is active, no commands in progress, and normal power.

• Snooze mode: Current time display, alarm is enabled, snooze is in progress, and normal power.

• Power fail mode: Display is blank, internal time base in operation, alarm is inhibited, and battery supplied power.

Replacing the individual functions with the tasks that now incorporate the functions, we have the following priority list shown in Table 3.6 below:


Table 3.6

There are several interesting things to note about the new priority list. Many of the newly defined tasks include both low- and high-priority functions. This means that some tasks can be classified as either low, mid, or high priority.

When compiling the table, always list the task only once, and at its highest priority. When we get to the implementation of the priority handler, we can adjust the task priority based on the value in the state variable, if needed.

Also, note that some of the functions do not change in priority. For example, the display task is always a medium priority. Other tasks do shift in priority based on the system mode; they may appear and disappear, like the alarm tone and alarm control tasks, or they may just move up or down as the button and time base tasks do.

Once the priority list has been updated to reflect the task definition information, we also have to perform a quick sanity check on the criteria for changing the system modes. For a mode change to be possible, it makes sense that the task charged with providing the information that triggers the change must be active before the change can occur.

What we want to do at this point is review each criterion, checking that the task providing the trigger for the change is in fact active in the original mode. If not, then the priority list needs to be updated to include the task, typically at a mid or low level of priority. For example, using our alarm clock design example as shown in Table 3.7 below:

Table 3.7

In each of the original modes, the task responsible for providing the trigger, whether it is a button press or missing time base pulses, must be active at some priority level to provide the necessary triggering event.

If the task is not active, then the system will hang in the mode with no means to exit. Note that there may be instances in which the response time requirement for a system mode change requires a higher priority for the task providing the mode change trigger.

If so, then both system priority and timing requirements may have to shift in order to accommodate a faster response. Make sure to note the reason for the change in priority and timing in the design notes and adjust the priority list accordingly.
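This sanity check lends itself to a small table-driven sketch in C. The mode and task names below follow the lists given in this article, but the per-mode active-task masks and the list of mode changes are placeholders that would have to be filled in from the priority list and the mode-change criteria; nothing here reproduces Table 3.6 or Table 3.7. The function name and types are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    TASK_DISPLAY, TASK_TIMEBASE, TASK_BUTTONS,
    TASK_ALARM_CONTROL, TASK_ALARM_TONE, TASK_ERROR
} task_id;

typedef enum {
    MODE_TIMEKEEPING, MODE_TIME_SET, MODE_ALARM_PENDING, MODE_ALARM_SET,
    MODE_ALARM_ACTIVE, MODE_SNOOZE, MODE_POWER_FAIL, MODE_COUNT
} system_mode;

/* One mode-change criterion: which task provides the trigger. */
typedef struct {
    system_mode from;
    system_mode to;
    task_id trigger;
} mode_change;

/* active[m] holds one bit per task that runs in mode m (bit = 1 << task).
 * Returns the index of the first criterion whose trigger task is not
 * active in the original mode, or -1 if every criterion is reachable. */
int first_stuck_change(const uint8_t active[MODE_COUNT],
                       const mode_change *changes, size_t count)
{
    assert(active != NULL && changes != NULL);
    for (size_t i = 0; i < count; i++) {
        uint8_t bit = (uint8_t)(1u << changes[i].trigger);
        if ((active[changes[i].from] & bit) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

Run against the design tables, a result other than -1 points at a mode the system could enter but never leave, which is the cue to add the trigger task to that mode's priority list.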

Once all the priority information has been cataloged and the necessary task trigger event information verified, copy both the priority list and the list of criteria for making a system mode change into the design notes for the system.

Include any information relating the changes made to the design and list any options that were discarded and why they were discarded. Be clear and be verbose; any question you can answer in the text will save you time explaining the choices later when the support group takes over the design.

So far in our design of the system, we have touched on a few error detection and recovery systems. These include error and default states for the task state machines, a system error task to handle errors that affect more than one task, and a definition of the severity of several system-level failures.

In fact, one of the primary software functions in the design of the alarm clock is the automatic switch over to an internal time base if the 60-Hz time base stops; this is also an example of an error detection and recovery system.

What we have to do now is define how these faults will be handled and what tasks will be affected by the recovery systems. In our dissection of the requirements document, we defined soft, recoverable, and hard errors for the system:

Soft Fault
Fault: Button pressed is not valid for current mode or command.
Press of SLOWSET without FASTSET, ALARMSET, or
TIMESET held.
Press of SNOOZE when not in alarm active mode.
Press of any key in power fail mode.
Test: Comparison of decoded button command with legal commands, by mode.
Response: Ignore button press.
Fault: Button combination is invalid.
Press of SNOOZE with FASTSET, SLOWSET, ALARMSET, TIMESET.
Test: Checked against acceptable combinations in command function.
Response: Ignore button press.

Recoverable Fault
Fault: Alarm time is out of range (Alarm time >23:59).
Test: Alarm control function test of value before current time comparison.
Response: If alarm is enabled, sound alarm until ALARMSET button press. If
in any other mode, ignore (fault will be identified when alarm is enabled).

Recoverable Fault
Fault: Power failure.
Test: 5th missing 60-Hz time base pulse.
Response: Go to power fail mode until 5th detected 60-Hz pulse.

Hard Fault
Fault: Watchdog timer timeout, brownout reset.
Test: Hardware supervisor circuits.
Response: System is reset. If BOR, then system held in reset until power is restored.
System will power up in error mode.

We now need to add any new faults that have come to light during the course of the design. These include error conditions within the state machines, or any communications errors between the tasks. We also need to decide on recovery mechanisms, the scope of their control, and whether the recovery system resides in the state machine, or the error task state machine.

Let’s start with a few examples. Consider a state variable range fault in the display task state machine. The detection mechanism is a simple range check on the state variables, and the recovery mechanism is to reset the state variable.

Because the display task is a control end point, meaning it only accepts control and does not direct action in another task, the scope of control for the recovery mechanism is limited to the task state machine.

As a result, it makes sense that the recovery mechanism can be included within the state machine and will not require coordination with recovery mechanisms in other tasks.

A fault in the time base task, however, could have ramifications that extend beyond the task state machine. For example, if the state machine performs a routine check on the current time and determines that the value is out of range, then the recovery mechanism will have to coordinate with other tasks to recover from the fault.

If the alarm control task is active, it may need to suspend any currently active alarm condition until after the current time value is reset by the user. The display task will have to display the fact that the current time value is invalid and the user needs to reset the current time.

The time base task will have to reset the current time to a default value. And, the system mode will have to change to Error until the user sets a new current time value.

All of this activity will require coordination by a central entity in the system, typically a separate error task acting as a watchdog. In fact, the specific value present in the error task state variable can be used as an indicator as to the presence and type of error currently being handled by the system.

To document all this information, we will use the same format as before, classifying each fault by severity: soft, recoverable, or hard. Name the fault with a label descriptive of the problem and the task generating the fault condition. List the method or methods for detecting the fault, and detail the recovery mechanism used by the system.

Remember that each task will have a state machine, and each state machine will have at least one potential error condition, specifically the corruption of its state variable. In addition, there will likely be other potential error conditions, both in the operation of the task and its communications with external and internal data pathways.

Another potential source of errors is the communications system. Semaphore protocol pathways can create state lock conditions. If the problem cannot be averted by changing one or more of the pathway protocols, then the state lock condition will be an error condition that must be detected and recovered from by the system.

Buffers also have the potential to create error conditions, should they fill their buffer space. While these errors are typically considered soft errors because they don’t require user intervention, the error-handling system may need to be aware of the problem.

Once all the potential system errors have been identified, the severity of the error condition must be determined, a test developed to detect the condition, and a recovery mechanism devised to handle the problem.

This can be particularly problematic for communications errors, specifically potential state lock conditions. This is because both communications in a state lock condition are legitimate data transfers.

However, due to the nature of the lock, one of the two pathways will likely have to drop its data to allow the other communication to continue. So, basically, the error recovery system will have to decide which data pathway to flush and which to allow to continue.

Using our clock design as an example, the following additional errors should be added to the system-level design:

Soft Error
Fault: Display task state variable corruption.
Test: Range check on the state variable.
Response: Reset the state variable.

Recoverable Error
Fault: Button task state variable corruption.
Test: Range check on the state variable.
Response: Reset the state variable.
Cancel any current command semaphores.
Reset all debounce and autorepeat counter variables.

Recoverable Error
Fault: Time base task state variable corruption.
Test: Range check on the state variable.
Response: Reset the state variable.
Range check time base timer variables.
If out of range, then reset and notify error task to clear potential alarm fault.

Recoverable Error
Fault: Alarm control task state variable corruption.
Test: Range check on the state variable.
Response: Reset the state variable.
If alarm is active, disable then retest for alarm time.
If alarm enabled or active, range check alarm time.
If alarm time out of range, then notify error task of fault condition.

Soft Error
Fault: Alarm tone task state variable corruption.
Test: Range check on the state variable.
Response: Reset the state variable.

Recoverable Error
Fault: Error task state variable corruption.
Test: Range check on the state variable.
Response: Reset the state variable.
Check status on other system state machines.
If error condition, then set error system mode, set current time to default.
Wait for user control input.

Recoverable Error
Fault: Alarm disabled but also active.
Test: Routine check by error task.
Response: Reset alarm control task state variable.

Recoverable Error
Fault: Snooze active when alarm is disabled.
Test: Routine check by error task.
Response: Reset alarm control task state variable.

Hard Error
Fault: Program memory fails a CRC test.
Test: CRC check on power-up.
Response: System locks, with a blank display.

These additional fault conditions and recovery mechanisms are then added to the design notes. The description of the fault condition should include an appropriate, verbose description of the type of error condition, the error condition itself, the method for detection of the error, and the recovery systems.

Include notes on the placement of the new software functions to detect and correct the error condition, plus any options in the design that were discarded and the reasons why.

Notes concerning any additional software functions required to handle the error detection and recovery should also be added to the appropriate task descriptions so they can be included in the state machine design.

This includes both errors from the corruption of data variables and the corruption of the state variable for the task state machine. All notes concerning an Error task or tasks should also be added to the design notes.

This includes updates to the task list, the system data flow diagram and variable dictionary, timing calculations, and priority handling information. Remember to review any additions to the communications plan, for potential state lock conditions.

At this point, the design notes should include all of the system-level design information. It may not be final, but it should be as complete as possible, because the next level of the design will use this information as its basis.

To recap, the information generated so far includes the following:

• The requirements document: Should be updated with all the current system information, including functions required for operation, communications and storage requirements, timing information, and priority information. It should also include detailed information concerning the user interface and finally, all information available on potential system errors, methods used to identify the error conditions, and methods for recovering from the errors.

• Information retrieved from the requirements document: Should include information concerning the following:

Task Information: This includes a list of all the functions the design will be required to perform, any information concerning algorithms used by the functions, and a descriptive write-up detailing the general flow of the functions.

Communication Information: This includes all information about the size and type of data, for internal communications between functions, external communications with off-system resources, and any significant temporary storage.

Also include any information about event timing that is tied to the variables used, as well as the classification of the data storage as either static or dynamic, plus all rate information for dynamic variables. Both peak and average rates should be included.

Timing Information: This includes not only the timing requirements for the individual tasks, but also the overall system timing, including both event-to-event and response-time timing. Should also include all timing tolerance information, as well as any exceptions to the timing requirements based on specific system modes.

Priority Information: This includes a detailed description of all system modes and the trigger events that change the system mode. Should also include the overall priorities for the system, changes in function priorities due to changes in the system mode, and the priorities within each task based on current activities.

• Documentation on the task definition phase of the system-level design: This should include descriptive names for the various new tasks in the system, what software functions have been grouped into the tasks, and the reasons for combining or excluding the various software functions.

In the event that conflicting criteria recommend both combining and excluding a function, the reasoning behind the designer’s decision should also be included. The final documentation should also include the preliminary task list, plus any updates due to changes in subsequent areas of the system-level design.

• Documentation on the communications plan for the design: This should include all revisions of the system data-flow diagram, the preliminary variable list and all related documentation concerning protocol assignments, memory requirements, and timing information.

Special note should be made of any combination of pathways that can result in a state lock condition, and the reasons for not alleviating the problem through the assignment of a different protocol for one of the problem pathways.

• Documentation on the timing analysis for the system: This should include all calculations generated to determine the system tick, including both optimum and worst-case timing requirements. Reasons for the choice of system tick should be included, along with any functions that are to be handled through an interrupt-based timing system.

For systems with unregulated timing, the reasons for the decision to use an unregulated system should be included, along with the plan for any timing critical functions. Finally, the tick itself should be documented along with the skip timer values for all tasks in the system.

• Documentation on the system’s priorities: Include the updated priority list, using the task names generated in the task definition phase of the design. Note any tasks that combine lower priority and higher priority functions, and the new priority assigned to the task. Note all events that trigger a change in system mode and all information generated in the validation of the trigger event information.

• Documentation on the error detection and recovery system in the design: Particularly any new error conditions resulting from the task state machines, potential communications problems, and general data corruption possibilities.

One final note on documentation of the system-level design: of all the design decisions made at this level, some will require back annotation to earlier design notes and even to the requirements document for the system. As a designer, please do not leave this to the last moment; there will always be something missed in the rush to release the documentation to the next level of the design.

As a general rule, keep a text editor open on the computer desktop to make notes concerning the design. A second instance holding the requirements document is also handy.

Bookmarks for tagging the main points of the design, such as task definition, communications, priorities, and timing make accessing the documents quick and help to organize the notes. If the notes are made as the information is found, then the information is fresh in the mind of the designer, and the notes will be more complete.

Conclusion: The importance of documentation
I know this sounds like a broken record, but remember that good documentation allows support designers to more readily take up the design with only minimal explanation from the designer. Good documentation also aids designers if they ever have to pick up the design in the future and rework all or part of it.

And, good documentation will help the technical writers in the development of the manuals and troubleshooting guides for the system. So, there is a wealth of reasons for being accurate and verbose in the documentation of the design, both for the designers themselves and for any other engineers that may have to pick up the design in the future.

At this point in the design, it is also a good idea to go back through the design notes and organize the information into four main areas: task, communications, timing, and priorities.

The information in the design notes will be the basis for all of the design work, so spending a few hours at this point to clean it up and organize the data will be time well spent. Do save the original document under a different name in case information is lost in the translation and cleanup.

To read Part 1, go to Dissecting the requirements document
To read Part 2, go to Extracting communications pathway requirements
To read Part 3, go to Determining system timing requirements
To read Part 4, go to Defining the system level design
To read Part 5, go to Managing communications between tasks

Keith Curtis is Principal Applications Engineer in the Security, Microcontroller and Technology Development Division at Microchip Technology Inc. In this role, Keith develops training and reference designs for incorporating microcontrollers in intelligent power supply designs. Keith also sits on the PMBus development committee, and is chair of the PMBus development tools subcommittee.

This article by Keith Curtis is based on material from “Embedded Systems: World Class Design” edited by Jack Ganssle, used with permission from Newnes, a division of Elsevier. Copyright 2008. For more information about this title and other similar books, please visit www.elsevierdirect.com.
