Eliminate risk in real-time software tasks using system-level digital simulation - Embedded.com



Editor’s Note: The authors describe how to simulate two highly distributed real-time software tasks on a multicore architecture. They provide an executable web version of the model so that readers can view the software process setup and hardware definition, modify the attributes, run simulations, and view the resulting analysis.

Simulating the data flow and control flow of software tasks and threads can improve response time and enhance the reliability of real-time applications. To gain the maximum advantage, the software simulation must be performed during the specification phase and must include power and performance attributes.

Simulating the software process yields a product specification that meets the timing requirements and stays within the resource budgets. To achieve this and deliver quality software systems, the simulation effort must be tied closely to an accurate timing specification and to the hardware architecture specification.

Missing a timing deadline or responding incorrectly to a failure can be catastrophic in an embedded, real-time system. In this article, we simulate two highly distributed applications on a multicore architecture and optimize their performance, reliability, and power consumption.

The growing concern for safety and security requires software tasks to be tested against many forms of data, network, and functional errors. Testing the implemented software is extremely complex because of the high cost of prototypes and the difficulty of replicating errors. Moreover, changes identified after implementation delay the delivery cycle, can introduce new malfunctions, and affect product quality. Early simulation can catch a number of implementation errors and identify the issues that will need to be tested.

Industry challenges addressed
Modeling hardware-enabled software processes is a major consideration in the aerospace, defense, and automotive industries because of the need for safety, reliability, and adequate protection against failures. Hardware-enabled software process modeling is the ability to visualize the operation of software tasks on a target hardware platform before the software code is developed. Current approaches are limited to analytical techniques, which provide either a rough estimate or a worst-case/best-case timing; neither takes realistic system operation into account.

An alternative approach is to use digital simulation to model the software processes as a series of control flows and data flows mapped onto hardware topologies. Representative traffic-stimulus patterns are applied to this model, and the resulting response times, power consumption, and correctness of the output are measured. These reports serve as constraints for downstream implementation. Because the analysis is performed before any implementation flow is defined, the constraints become the validation metrics for software testing.

With this approach, the software thread design can be verified against its performance (latency and throughput) and power deadlines. In this effort, the hardware platform is represented as a combination of the communication topology, the computation technologies, and the scheduling algorithm. The faults introduced for testing include late arrival of sensor data at the interface, modified data at a memory location, an overwritten schedule table, and an incorrect task sequence.

Simulation technology
In our analysis of real-time software at CMR Design Automation, we used the commercial system-level design software VisualSim Architect, a modeling and simulation environment with a graphical block-diagram editor, simulators, and model-construction libraries, including libraries for representing hardware, schedulers, and software processes.

When a simulation completes, our designers receive a series of graphical and textual reports that trace the software tasks, application response times, hardware utilization, power consumption, and cumulative activity. This approach has been used at various companies to architect hardware based on the activity of software processes, threads, memory assignments, and I/O.

In our analysis of the software applications, we define three activities per task: get data, execute task, and write data. Each thread or process is assigned a unique identifier for tracing purposes.

Defining a software task
A single software task or thread has the following sequence:

  • The scheduler triggers the software task using an interrupt, data arrival or a periodic event.
  • This trigger sends the request to a thread queue, which assigns the task to a slot or a core of the processor.
  • When the task gets access to the processor, its first action is to access data. To access the data in memory, the flow sequence is Bus –> DRAM –> Bus –> CPU, with a data size and a data-triggering frequency that can be periodic or event-triggered.
  • When the data becomes available, the processor executes the instruction sequence.
  • When processing has completed, the data is written back to memory. The flow sequence is Bus –> DRAM.
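The five steps above can be turned into a first-cut latency budget for one task instance. This is a minimal, contention-free sketch: `TaskTiming` and every delay value below are hypothetical placeholders rather than figures from the authors' model, and the queue wait is supplied as a fixed number instead of being simulated.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import count

_task_ids = count(1)  # unique identifier per task instance, for tracing


@dataclass
class TaskTiming:
    """Hypothetical per-step delays (seconds) for one software task."""
    queue_wait: float        # time in the thread queue before a core is assigned
    read_hops: list[float]   # get data: Bus -> DRAM -> Bus -> CPU
    exec_cycles: int         # execute task: length of the instruction sequence
    cpu_hz: float            # processor clock frequency
    write_hops: list[float]  # write data: Bus -> DRAM
    task_id: int = field(default_factory=lambda: next(_task_ids))

    def latency(self) -> float:
        get_data = sum(self.read_hops)
        execute = self.exec_cycles / self.cpu_hz
        write_data = sum(self.write_hops)
        return self.queue_wait + get_data + execute + write_data


t = TaskTiming(queue_wait=50e-9,
               read_hops=[10e-9, 40e-9, 10e-9, 5e-9],
               exec_cycles=220, cpu_hz=400e6,
               write_hops=[10e-9, 40e-9])
print(f"task {t.task_id}: {t.latency() * 1e9:.1f} ns")
```

Here execution (220 cycles at 400 MHz, or 550 ns) dominates the 715 ns total; in a full model the queue wait and bus hops would vary with the load imposed by other tasks.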

These attributes define the software process model and constitute the data flow between the devices on the board.

Figure 1: Software Scheduling on multicore architecture

In hardware terms, each resource is defined as a scheduler, a queue, and a processing resource. For more accurate or detailed processing, a memory request cycle and an exact arbitration algorithm can be added. The hardware also has a power definition, which provides instantaneous and average power measurements for each device, for each application, and for the entire system.
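The scheduler + queue + processing-resource abstraction, together with a two-state power definition, can be sketched in a few lines. This is a minimal, hypothetical model (`run_fifo` and its parameters are illustrative, not VisualSim blocks): tasks are served first-come, first-served, the resource draws `p_active` while processing and `p_standby` otherwise, and the function returns both the average power and an instantaneous-power lookup.

```python
def run_fifo(arrivals, service, horizon, p_active, p_standby):
    """Single processing resource with a FIFO queue and two power states.

    arrivals: task arrival times (s); service: processing time per task (s);
    horizon: simulated time (s); p_active / p_standby: power (W) busy / idle.
    Returns the busy intervals, the average power, and an instantaneous-power function.
    """
    t_free = 0.0      # when the resource next becomes idle
    intervals = []    # (start, end) of each task's processing
    for a in sorted(arrivals):
        start = max(a, t_free)   # task waits in the queue while the resource is busy
        t_free = start + service
        intervals.append((start, t_free))
    if t_free > horizon:
        raise ValueError("horizon must cover the last completion")
    busy = sum(end - begin for begin, end in intervals)
    average = (p_active * busy + p_standby * (horizon - busy)) / horizon

    def instant(t):
        busy_now = any(begin <= t < end for begin, end in intervals)
        return p_active if busy_now else p_standby

    return intervals, average, instant


intervals, avg_w, power_at = run_fifo([0.0, 1.0, 1.5], service=1.0, horizon=5.0,
                                      p_active=2.0, p_standby=0.5)
print(intervals, avg_w, power_at(2.5), power_at(4.0))
```

With arrivals at 0, 1, and 1.5 and a service time of 1, the resource is busy from 0 to 3, so over a 5-unit horizon the average power is (2.0 × 3 + 0.5 × 2) / 5 = 1.4.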

System definition: hardware platform and software process
If there is only one application, or if the application is fully pipelined with no contention for resources, a spreadsheet can be used to compute throughput and latency. The application can be statically scheduled, and arbitration is straightforward.
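For the single-application, contention-free case, the spreadsheet arithmetic reduces to two formulas: end-to-end latency is the sum of the stage times, and steady-state throughput is limited by the slowest stage. A minimal sketch with hypothetical stage times:

```python
def pipeline_metrics(stage_times):
    """Latency (s) and throughput (tasks/s) of a fully pipelined, contention-free flow."""
    latency = sum(stage_times)            # one task passes through every stage in turn
    throughput = 1.0 / max(stage_times)   # the slowest stage sets the issue rate
    return latency, throughput


# Hypothetical stage times (s) for I/O in, CPU, DSP, I/O out.
latency, throughput = pipeline_metrics([100e-9, 550e-9, 170e-9, 100e-9])
print(f"latency {latency * 1e9:.0f} ns, throughput {throughput / 1e6:.2f} M tasks/s")
```

Both formulas break down as soon as two flows compete for the same stage, which is the case the rest of the article addresses.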

However, most of today’s systems contain multiple applications and multiple data flows, creating contention for resources such as processors, memories, and I/O. It is difficult to predict system performance without constructing a model that combines hardware and software with all of the flows.

Figure 2: Mapping of threads on hardware platform

Figure 2 shows our hardware platform, a multiprocessor embedded system with two CPUs, a DSP, and an I/O. The tasks at the top represent one application, with a data flow of I/O –> CPU1 –> DSP –> I/O, while the application at the bottom has a data flow of I/O –> CPU2 –> DSP –> I/O. In this model, the I/O and DSP are processing resources common to both applications. Bus1, Bus2, and the Bridge are used by both tasks to transfer data between the executing resources. The rate of data arrival at the I/O, the resource processing times, the data size, and the task priority can be varied for each simulation.
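Contention is what a per-application spreadsheet misses. A minimal sketch of jobs from two applications sharing one first-come, first-served resource, such as the DSP, shows queuing delay appearing even though each application alone would never wait. The jobs and time units are hypothetical, and `fifo_waits` is an illustrative helper.

```python
def fifo_waits(jobs):
    """Queue wait of each job on one shared first-come, first-served resource.

    jobs: list of (arrival_time, service_time, app_name) tuples.
    Returns {app_name: [wait, ...]} in service order.
    """
    t_free = 0.0                       # when the shared resource next becomes idle
    waits = {}
    for arrival, service, app in sorted(jobs):
        start = max(arrival, t_free)   # wait while another job holds the resource
        waits.setdefault(app, []).append(start - arrival)
        t_free = start + service
    return waits


jobs = [(0, 3, "App1"), (4, 3, "App1"),   # App1 work on the shared resource
        (1, 2, "App2"), (4, 2, "App2")]   # App2 work on the same resource
waits = fifo_waits(jobs)
print(waits)
```

Every nonzero wait here comes from sharing: run either application's jobs alone and all of its waits are zero.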

Figure 3: VisualSim Architect model of the hardware platform

The hardware platform model is shown in Figure 3. All components of the system are modeled using schedulers for fast simulation performance. Schedulers represent processing resources and include the definitions for buffering, scheduling, preemption, priority reordering, processing delay, and context-switching times. Each scheduler has built-in statistics and multiple power modes. Here the two CPUs execute at 400 MHz, the DSP at 600 MHz, and the I/O at 120 MHz, while the two intercommunication buses, Bus1 and Bus2, run at 400 MHz and 200 MHz, respectively.
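Priority reordering and context-switch overhead, two of the scheduler features listed above, can be illustrated with a small non-preemptive priority scheduler. This is a hypothetical sketch (`priority_schedule` is not a VisualSim API), and it deliberately omits preemption and buffer limits.

```python
import heapq


def priority_schedule(jobs, ctx_switch):
    """Non-preemptive priority scheduling on one processing resource.

    jobs: list of (arrival, priority, service, name); a lower priority value is more urgent.
    A context-switch delay is charged whenever the resource moves to a different task.
    Returns {name: completion_time}.
    """
    pending = sorted(jobs)    # earliest arrival first
    ready = []                # min-heap keyed on (priority, arrival)
    t, i, last, done = 0.0, 0, None, {}
    while i < len(pending) or ready:
        # Admit every job that has arrived by the current time.
        while i < len(pending) and pending[i][0] <= t:
            arrival, prio, service, name = pending[i]
            heapq.heappush(ready, (prio, arrival, service, name))
            i += 1
        if not ready:
            t = pending[i][0]  # resource idle: advance to the next arrival
            continue
        prio, arrival, service, name = heapq.heappop(ready)
        if last is not None and name != last:
            t += ctx_switch    # switching to a different task costs a context switch
        t += service           # run to completion (no preemption)
        done[name] = t
        last = name
    return done


result = priority_schedule([(0, 2, 5, "low"), (1, 1, 3, "high"), (1, 3, 2, "bg")],
                           ctx_switch=1)
print(result)
```

The high-priority job arrives at t = 1 but must wait until the low-priority job finishes at t = 5, because this sketch does not preempt. It then runs ahead of the background job that arrived at the same time, and each switch between tasks adds one unit of context-switch time.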

Figure 4: Software Task flow in VisualSim Architect

Software task distribution across the hardware architecture is shown in Figure 4. Here we modeled two application scenarios on the hardware architecture – App1 and App2. The application sequence of App1 is:

I/O –> Bus1 –> CPU1 –> Bus1 –> Bridge –> Bus2 –> DSP –> Bus2 –> Bridge –> Bus1 –> I/O

The application sequence of App2 is:

I/O –> Bus1 –> Bridge –> Bus2 –> CPU2 –> Bus2 –> DSP –> Bus2 –> Bridge –> Bus1 –> I/O
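Writing the two hop sequences down as data makes the shared resources explicit. A minimal sketch; the route lists are transcribed from the sequences above, assuming that the final "Bus" hop before the I/O is Bus1, the bus both sequences start on.

```python
from collections import Counter

# Hop sequences for App1 and App2.
# Assumption: the last bus hop before I/O is Bus1 (the bus the I/O sits on).
APP1_ROUTE = ["I/O", "Bus1", "CPU1", "Bus1", "Bridge", "Bus2", "DSP",
              "Bus2", "Bridge", "Bus1", "I/O"]
APP2_ROUTE = ["I/O", "Bus1", "Bridge", "Bus2", "CPU2", "Bus2", "DSP",
              "Bus2", "Bridge", "Bus1", "I/O"]


def shared_resources(*routes):
    """Resources visited by every route, i.e. the candidates for contention."""
    return set.intersection(*(set(r) for r in routes))


def visits(route):
    """How many times a single task of this application occupies each resource."""
    return Counter(route)


print(sorted(shared_resources(APP1_ROUTE, APP2_ROUTE)))
print(visits(APP1_ROUTE)["Bus1"], visits(APP2_ROUTE)["Bus2"])
```

Only CPU1 and CPU2 are private; every bus, the Bridge, the DSP, and the I/O are contended, and each application crosses its "home" bus three times per task.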

The application sequence is modeled using mappers in VisualSim, which define the behavior flow of the system. Mappers collect information about a task and communicate it to the scheduler in the hardware architecture platform. Each mapper is virtually connected to its scheduler; it takes the incoming data structure and sends it to the scheduler along with the information in the mapper's parameter fields.
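The mapper idea can be approximated in Python as a function bound to one scheduler that merges its own parameter fields into each incoming task record and enqueues the result. This is an illustrative analogue only; `make_mapper`, the `schedulers` dictionary, and the field names (`cycles`, `priority`, `task_id`) are hypothetical and do not reflect VisualSim's actual mapper interface.

```python
from collections import defaultdict, deque

# Stand-in for the hardware schedulers: resource name -> queue of pending requests.
schedulers = defaultdict(deque)


def make_mapper(resource, **params):
    """Return a mapper virtually connected to one scheduler, with fixed parameter fields."""
    def mapper(token):
        # Mapper parameters override same-named fields in the incoming data structure.
        request = {**token, **params, "resource": resource}
        schedulers[resource].append(request)
        return request
    return mapper


cpu1_exec = make_mapper("CPU1", cycles=220, priority=1)  # hypothetical App1 CPU1 step
dsp_exec = make_mapper("DSP", cycles=100, priority=2)    # hypothetical App1 DSP step

req = cpu1_exec({"task_id": 7, "app": "App1", "bytes": 64})
dsp_exec(req)
print(schedulers["CPU1"][0]["cycles"], schedulers["DSP"][0]["cycles"])
```

Chaining mappers this way keeps the task's identity fields (here `task_id`) intact from hop to hop, while each step supplies its own resource-specific parameters.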

The arbitration algorithms are defined using a C-like scripting language. The stimulus to the software process flow is provided by a constant-rate and a variable-rate traffic generator.

Experiments on our model
As part of the model, we parameterized a number of variables, including the clock speed of each processing device, the processing time for each task in the process, and the scheduling rate of the two applications. The parameters are listed below.

Hardware Platform (all values in MHz)
   DSP_Clk_Speed : 600.0
   CPU2_Clk_Speed : 400.0
   Bus2_Clk_Speed : 400.0
   Bridge_Clk_Speed : 200.0
   Bus1_Clk_Speed : 400.0
   IO_Clk_Speed : 120.0
   CPU1_Clk_Speed : 400.0

Software Process
   App1_Exec : {"IO","CPU1","DSP","IO"}
   App2_Exec : {"IO","DSP","CPU2","IO"}
   App1_Time : {14,220,100,12}
   App2_Time : {17,67,120,12}

Scheduling Time
   Value1 in the Source block: 1.0/1.70e6 (inter-arrival time of App1, i.e., 1.70e6 tasks per second)
   Value1 in the Source2 block: 1.0/1.850e6 (inter-arrival time of App2, i.e., 1.85e6 tasks per second)

Power Metrics (as a multiple of the hardware clock speed)
   Power_Active : {CPU1_Clk_Speed*0.25, CPU2_Clk_Speed*0.25, DSP_Clk_Speed*0.5, Bus1_Clk_Speed*0.1, Bus2_Clk_Speed*0.1}
   Power_Standby : {CPU1_Clk_Speed*0.1, CPU2_Clk_Speed*0.1, DSP_Clk_Speed*0.05, Bus1_Clk_Speed*0.01, Bus2_Clk_Speed*0.01}
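The parameter list supports a quick sanity check before any simulation is run. Because the article does not state units, the sketch below assumes that each `AppN_Time` entry is a cycle count on the named device, that `Value1` is the time in seconds between task arrivals, and that power values are in the model's own (unstated) units. Bus and bridge transfers are ignored because they have no entries in `AppN_Time`; the helper names are hypothetical.

```python
# Values copied from the parameter list; clock speeds in MHz.
CLK_MHZ = {"IO": 120.0, "CPU1": 400.0, "CPU2": 400.0, "DSP": 600.0,
           "Bus1": 400.0, "Bus2": 400.0, "Bridge": 200.0}
APP1_EXEC, APP1_TIME = ["IO", "CPU1", "DSP", "IO"], [14, 220, 100, 12]
APP2_EXEC, APP2_TIME = ["IO", "DSP", "CPU2", "IO"], [17, 67, 120, 12]
APP1_PERIOD = 1.0 / 1.70e6    # Source block Value1, read as seconds between App1 tasks
APP2_PERIOD = 1.0 / 1.850e6   # Source2 block Value1, read as seconds between App2 tasks
ACTIVE = {"CPU1": 0.25, "CPU2": 0.25, "DSP": 0.5, "Bus1": 0.1, "Bus2": 0.1}
STANDBY = {"CPU1": 0.1, "CPU2": 0.1, "DSP": 0.05, "Bus1": 0.01, "Bus2": 0.01}


def stage_times(exec_list, cycles):
    """Seconds per stage. ASSUMPTION: AppN_Time entries are cycle counts on each device."""
    return [c / (CLK_MHZ[dev] * 1e6) for dev, c in zip(exec_list, cycles)]


def offered_load(exec_list, cycles, period):
    """Busy fraction one application demands of each device (no bus hops, no contention)."""
    load = {}
    for dev, t in zip(exec_list, stage_times(exec_list, cycles)):
        load[dev] = load.get(dev, 0.0) + t / period
    return load


def average_power(dev, utilization):
    """Power_Active while busy, Power_Standby while idle; units as in the model."""
    clk = CLK_MHZ[dev]
    return utilization * clk * ACTIVE[dev] + (1.0 - utilization) * clk * STANDBY[dev]


app1 = offered_load(APP1_EXEC, APP1_TIME, APP1_PERIOD)
app2 = offered_load(APP2_EXEC, APP2_TIME, APP2_PERIOD)
dsp_load = app1["DSP"] + app2["DSP"]   # the DSP is shared by both applications
print(f"App1 compute time: {sum(stage_times(APP1_EXEC, APP1_TIME)) * 1e9:.1f} ns")
print(f"CPU1 load {app1['CPU1']:.3f}, shared DSP load {dsp_load:.3f}")
print(f"DSP average power at that load: {average_power('DSP', dsp_load):.1f}")
```

Under these assumptions, App1 alone keeps CPU1 about 94% busy (550 ns of work every 588 ns), while the two applications together load the shared DSP to roughly 49%. Only the simulation can show how these loads interact once bus transfers and arbitration are included.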

Analysis and results
Using the model, we studied the response time of each application and the instantaneous power consumed by the hardware platform. The simulation was run for 100 ms, after we determined that a longer run was not required. We found that changing the bus speed had the largest impact, so we ran the buses at 200 MHz and at 400 MHz and compared the results of the two runs. The two most interesting plots were the App2 response time and the power consumption.

The App2 response time is displayed in Figure 5. The response time is measured from when the task enters the I/O device, through the bus transfers and execution, until it returns to the I/O device. The red dots indicate the latency when the buses run at 200 MHz, and the blue dots indicate the latency when the buses run at 400 MHz.

You will notice that the 400 MHz response time was mostly in the range of 1.0 to 1.2 µs (the Y-axis is scaled by 1.0e-6). The corresponding range for 200 MHz is 1.3 to 1.45 µs. Doubling the bus speed therefore does not produce a 50% reduction in response time, but a much more modest 7%. The striking finding is that at times during the run, the 400 MHz bus produced a longer response time than the 200 MHz bus.

Such unpredictable response times are characteristic of systems with shared resources, and system-level simulation exposes them. This behavior can delay execution and increase queuing at processing resources, and it would be hard to detect in a physical system. Because the condition has been identified through software process modeling, it can be accommodated by providing additional timing margin for the task to complete execution.
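One way to act on this is to size the timing budget from the worst observed latency rather than from the typical range. The sketch below assumes a hypothetical sizing rule (worst case plus a fixed fractional margin) and uses made-up samples; it does not reproduce the data in Figure 5.

```python
def timing_budget(latencies, margin):
    """Worst observed latency plus a fractional safety margin (hypothetical sizing rule)."""
    return max(latencies) * (1.0 + margin)


def deadline_misses(latencies, deadline):
    """Samples that would violate a deadline sized from the typical range alone."""
    return [x for x in latencies if x > deadline]


# Made-up response-time samples (s): mostly 1.0-1.2 us with one outlier.
observed = [1.05e-6, 1.12e-6, 1.48e-6, 1.10e-6]
print(deadline_misses(observed, deadline=1.2e-6))  # the outlier breaks a typical-range deadline
print(f"budget with 10% margin: {timing_budget(observed, 0.10) * 1e6:.3f} us")
```

A percentile-based rule is an alternative when runs are long, but the worst case is the safer choice for hard real-time tasks.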

Figure 5: Software task latency output from VisualSim Architect

In the case of power, the difference was minimal. The faster bus caused power to spike at certain execution points. Interestingly, it does not spike all the time, which indicates that the cooling design can focus on the lower 2.5 W level. Figure 6 shows the power for the 200 MHz buses and Figure 7 shows the power for the 400 MHz buses.
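The same kind of post-processing applies to a power trace: separating the average from occasional peaks shows how much of the time the system actually spends above a given level. The samples below are hypothetical and only loosely mimic the behavior described; they are not data from Figures 6 or 7.

```python
def power_summary(samples, threshold):
    """Average, peak, and fraction of samples above `threshold` for an evenly sampled trace."""
    average = sum(samples) / len(samples)
    peak = max(samples)
    above = sum(1 for p in samples if p > threshold) / len(samples)
    return average, peak, above


# Hypothetical instantaneous-power samples (W): mostly near 2.5 W with occasional spikes.
trace = [2.4, 2.5, 2.45, 3.1, 2.5, 2.4, 2.5, 3.0]
avg, peak, frac = power_summary(trace, threshold=2.5)
print(f"average {avg:.2f} W, peak {peak:.2f} W, above 2.5 W {frac:.0%} of the time")
```

Whether a thermal design can target the lower level depends on how long the spikes last relative to the system's thermal time constant, which this sketch does not model.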

Figure 6: Power consumption chart for the 200 MHz Buses from VisualSim Architect

Figure 7: Power consumption chart for the 400 MHz Buses from VisualSim Architect

Modeling and simulating software threads in the presence of the hardware architecture provides a 360-degree view of the software's operation. Unlike analytical methods, digital simulation provides a realistic view of the expected performance and power metrics. This research can be expanded to consider the impact of cache and memory accesses and of power management algorithms. As this analysis shows, a number of unpredictable behaviors can be easily identified with system-level modeling of the software processes.

For best results, conduct these studies alongside the specification process. You can see this for yourself by running your own experiments on the modeling page we have made available online.

Ranjith K R is an EDA application engineer specializing in VisualSim system-level products at CMR Design Automation in Bengaluru, India. He has many years of expertise in system-level modeling, simulation, and development, and has been involved in various system-level model development projects with the defense sector, aerospace corporations, and multinational semiconductor companies in India. Mr. Ranjith holds an MS in Electronics from Kuvempu University and a Diploma in FPGA design and verification.

Deepak Shankar is the founder of Mirabilis Design, a systems engineering software solutions provider. Mr. Shankar has been involved with architecture exploration of embedded systems, semiconductors, and real-time software for over 20 years. At Mirabilis Design, he has developed new methodologies and solutions to streamline the validation of system specifications, make architecture exploration extremely accurate, and accelerate the systems engineering process. Before Mirabilis Design, he worked at Cadence, Spincircuit, and Memcall in technical, marketing, and executive management roles. Mr. Shankar has published over 40 articles in technical journals around the world and has been the lead speaker at events of the IEEE and other organizations. He has an MS in Electronics from Clemson University, an MBA from the University of California, Berkeley, and a BS in Electronics and Communication from Coimbatore Institute of Technology.
