A SystemC-Based RTOS Model for Multiprocessor Systems-on-Chips: Part 3 - Multiprocessor Systems - Embedded.com

A SystemC-Based RTOS Model for Multiprocessor Systems-on-Chips: Part 3 – Multiprocessor Systems

In this final part in this series, we extend our model for uniprocessorscheduling, described in Part 1and Part 2, to handlemultiprocessor scheduling. As for uniprocessor scheduling, the primaryobjective is to ensure that all deadlines are met. However, in the caseof multiprocessor systems, validatingthat the hard timing constraints are met is extremely difficult.

In the following discussion, we assume that each processor has itsown scheduler, which uses a uniprocessor scheduling algorithm, and thatthe schedulers on different processors may use different schedulingalgorithms. Based on this assumption, we identify the following twoimportant issues in multiprocessor scheduling:

1) Taskassignment. Most real-time systems are static in the sense thattasks are partitioned and statically bound to processors.

2)Interprocessor synchronization . This ensures that the precedencerelations between the tasks running on different processors are alwayssatisfied. Task assignment is often done off-line, i.e., at compiletime, and is based on a detailed analysis of the application and thetarget architecture.

The assignment may be based on one or more criteria, such asexecution time, resource requirements, data dependencies, timingconstraints, communication cost, and so on. In our model, all resourceallocations are local, and remote resource requests are handled by theproper application of task assignment and interprocessorsynchronization policies.


Example 3. In this example, we illustrate the capabilitiesof our framework by analyzing a small multiprocessor example. Figure 10-11a above shows a systemwith two processors, running two tasks each. A single dependency existsbetween the tasks t2 and t3 , i.e., t2 = t3 . As seen in Figure10-11, itis an interprocessor dependency. The figure also shows thecharacterization of each of the tasks.

Figure 10-11b shows how we model the example using our abstractRTOS model.

Note that, in this example, we have omitted the resourceallocator.In the first approach, we use RMS as the scheduling policy for theRTOS on both processors. Figure10-12  below shows the behavior of each task in thesystem, giving information about the release times and the deadlines.


As t1 has a shorter period than t2 ,it has,according to the RMS policy, a higher priority and, hence, startsexecuting at time 0. After 2 time units, t1 has completedits execution well ahead of its deadline. t2 can then rununtil it completes after 2 time units at time 4.

Due to the dependency between t2 and t3 ,t3 hasto waituntil time 4, which is the same time that task t4 is released. As t3has higher priority than t4 , t3 startsexecuting at time 4. Having a period of 6 time units and a delay of 2time units, t3 finishes just in time.

At time 6, t3 finishes and t4 starts its execution, which should run for 3 time units and end beforeits deadline. However, at time 6, t2 starts itssecond iteration, which ends at time 8, immediately releasing t3 .Since t3 has higherpriority than t4 ,t4 ispreempted at time 8 and cannot resume execution before t3 has finishedat time 10, effectively missing its deadline by 1 time unit.


If the scheduling policy of the RTOS on processor PEb is changed toa non-preemptive earliest deadline first (EDF) scheduling, t4 willcomplete its first execution at time 9, as shown in Figure 10-13 above, delaying thesecond execution of t3 by1 timeunit. In this scenario, all four tasks meet their deadlines. Althoughthis is a very simple example, it demonstrates the modelingcapabilities of our framework.

Multiprocessing Anomalies
An interesting aspect of multiprocessor scheduling that presentscomplications is that of anomalies. Let us consider a set of tasks thatis optimally assigned and scheduled on a multiprocessor system that hasa fixed number of processors, a fixed execution time for each task, andcontains precedence constraints. Then, implementing any of thefollowing intuitive “improvements”

1) Changing the prioritylist of the scheduler
2) Increasing the number ofprocessors in the architecture
3) Reducing the execution timeof one or more tasks
4) Weakening the precedenceconstraints

may increase the scheduling length! Hence, most on-line schedulingalgorithms are subject to experience anomalies, as tasks may completebefore their WCET.

Example4. Consider a task set of five tasks, { t1 ,t2 ,t3 , t4 , t5 } assigned to two processing elements,{PEa ,PEb } such that

Let the dependency relations be asfollows, t1 < t2 < t3 < t4 < t5 . Furthermore, let t2 and t4 share the sameresource, Dx ,i.e., r(t2 ) = r(t4 ) ={Dx }.

Figure 10-14abelow shows an optimalschedule for the task set. Assume that we are able to find a betterimplementation of t1 suchthat its execution speed is doubled. Intuitively, we would assumebetter system performance; however, as illustrated in Figure 10-14b,the latency is actually extended from 12 to 15 due to the mutualexclusion of t2 andt4 .


Interprocessor Communication
Interprocessor communication overhead is a major factor that limits theperformance of multiprocessor systems. All multiprocessor systems, nomatter how powerful their interprocessor communication mechanism,suffer from this overhead. Communication overhead is exacerbated by theresource contention in the interprocessor communication network, whichcomprises the node and the link contentions.

Node contention arises when a node attempts to transmit or receiveseveral messages simultaneously. Link contention is caused by thesharing of a communication link by two or more messages. The networkresource contention arises in all but the simplest communicationrequirements. It can be minimized or, in some cases, eliminated, byproper scheduling of messages.

However, apart from incurring scheduling overhead, this requiressynchronization among all the processors in a multiprocessor system,thereby incurring synchronization overhead. The mapping of anapplication on a multiprocessor system thus has to be optimized tosatisfy the following conflicting requirements:

1) A completelycontention-free schedule will incur substantial synchronizationoverhead.

2) A completelysynchronization-free schedule will result in heavy contention overhead.

Clearly, there is a need to find a balance between the two types ofoverhead in order to minimize the overall execution time of theapplication running on a multiprocessor system.

In our multiprocessor modeling framework, instead of explicitlyincluding the interprocessor communication costs, we model theinterprocessor communication network as a communication processor onwhich message transmission tasks are scheduled non-preemptively on afixed-priority basis.

In this way, the interprocessor communication costs are implicitlytaken into account by the response times of the message transmissiontasks scheduled on the communication processor.

We model a communication event within an interprocessor network as amessage task ™ that executes on the communication processor. Whenone PE wants to communicate with another PE, a tm is released forexecution on the communication processor.

Each tm represents communication only between a fixed set of twopredetermined PEs. Since an interprocessor communication networksupports concurrent communication, the tms are synchronized, allocatednetwork resources, and scheduled accordingly.

Therefore, in our model, the network allocator reflects thetopology, and the network scheduler reflects the interprocessorcommunication protocol. A resource database, which is unique to eachnetwork, contains information about all its resources (nodes, links,buffers, and so on). The network allocation and scheduling algorithmsmap a tm onto the available network resources.

From the implementation point of view, our interprocessorcommunication model has exactly the same structure as the abstract RTOSmodel described earlier but with some necessary modifications to itsconstituent module blocks.

Multiprocessor Example
Here we demonstrate the capabilities of our system-level multiprocessorSoC modeling framework by presenting the simulation results of apractical example derived from a real-life application of an MP3Decoder.


The preprocessing steps for abstracting the application code, likethe extraction of task graph parameters through code profiling, andmapping the task graphs to the SoC architecture have been performedmanually. The resulting task graph is shown in Figure 10-15 above , and thecharacterization of each task is listed in Table 10-2 below .


Initially, the task graph for the MP3 Decoder is mapped on auniprocessor architecture and simulated using the modeling framework.The simulation results reveal that some tasks miss their deadlines.Therefore, the application task graph is task-partitioned and mapped ona multiprocessor SoC platform.

The partitioned task graph is shown in Figure10-15 above . This partitionedtask graph is mapped on an architecture comprising two general-purposeprocessors (PEa and PEb) interconnected by a bus. The mapping resultsin the exchange of communication messages among the two PEs over thebus. The simulation results are shown in Figure 10-16 below .


In this series we have described an abstract modeling framework basedon SystemC that supports the modeling of multiprocessor-based RTOS. Theaim of the framework is to provide the system designer with auser-friendly and efficient modeling and simulation environment inwhich one can experiment with different RTOS policies and study theconsequences of local decisions on the global system behavior.

An RTOS is modeled as a combination of three interacting modulesdefining basic RTOS services: scheduling, synchronization, and resourceallocation. Although the three services are interdependent, theseparation of concerns allows the system designer to mix differentdesign implementations easily in order to develop an efficientdedicated RTOS.

The model presented supports periodic, aperiodic, and sporadictasks, data dependencies among tasks, task offset, context switching,and dynamically determined execution times (based on a uniformdistribution of execution times between the BCET and WCET limits).

The composition principle of our RTOS model allows for variousmultiprocessor models, including centralized RTOS, distributed RTOS,and task migration in the case of a pool of tasks supported by morethan one RTOS. In addition, the effect of interprocessor communicationcan also be effectively modeled.

To read Part 1, go to: Basicconcepts and terminology
To read Part 2, go to UniprocessorSystems.

This series of articles is based oncopyrighted material submitted by Jan Masden, Kashif Virk and MercuryJair Gonzalez to “MultiprocessorSystems-On-Chips,” edited by Wayne Wolf and Ahmed Amine Jerraya. Itis used with the permission of the publisher, Morgan Kaufmann, animprint of Elsevier. The book can be purchased on-line.

Ahmed Jerraya is researchdirector with CNRS and is currently managing research on multiprocessorsystem-on-chips at TIMA Laboratory in France. Wayne Wolf is currently the GeorgiaResearch Alliance Eminent Scholar holding the Rhesa “Ray” S. Farmer,Jr. Distinguished Chair in Embedded Computer Systems at Georgia Tech'sSchool of Electrical and Computer Engineering (ECE). Previously aprofessor of electrical engineering at Princeton University, he workedat AT&T Bell Laboratories.

JanMadsen is professor of computer based systems at the Departmentof Informatics and Mathematical Modeling at the Technical University ofDenmark where he heads the System-On-Chip Design Group.

KashifVirk works for Jan Masden in preparation for a doctorate. Hehas a B.Sc. in Electrical Engineering from the University ofEngineering and Technology, Lahore, Pakistan and an M.Sc. from theTechnical University of Denmark.

Mercury JairGonzalez is a design engineerat the Semiconductor Technology Center of CINVESTAV in Mexico. Hisresearch interests are VLSI design methods, MP SoC design methods, andthe timing analysis of real-time systems.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.