Using virtual system prototyping to evaluate VME hardware platform alternatives

Deepak Shankar, Mirabilis Design

May 15, 2007


Early performance analysis and virtual system prototyping provide a methodology and platform for evaluating architectures, processing requirements and module functionality within large systems such as industrial controllers, unmanned aerial vehicles and navigation systems.

This methodology allows the system engineer to start with an abstract concept and increase model fidelity and accuracy through successive levels of decomposition. Establishing a simulation platform allows for quick and accurate trade studies and spiral engineering enhancements over the program's lifecycle.

It also allows investigations into interoperability, which lowers the total overall cost of the electronic systems. This is especially important for electronics because this system element is typically the first to face obsolescence or replacement by an enhancement.

The example in this article uses VisualSim performance modeling software to conduct trade-off analysis on the architecture of the processing hardware platform and to select the best bus backplane, but many of the principles involved can be applied to almost any project.

In this case, the system prototype combines existing components available in the VisualSim model library to assemble the sensors, on-board multi-blade processing units, wireless channels and the operation of a ground vehicle.

These processing platforms and the wireless channels are connected together over a VME bus. Each hardware platform processes messages from a number of sensors and transmits the results across the VME backbone, through a common set of transmitters, to ground vehicles.

Based on the analysis conducted by constructing a model of the proposed architecture and exercising it under the different use scenarios, the optimal architecture was identified as a hardware platform with six 30 MHz processor boards, a 66 MHz shared cache and a 1 Mbps, four-link downlink.

Output Results
The simulation model evaluates the maximum handling capacity of the processing units, the impact of channel errors and speed on the latency, and the performance of the VME bus. The following metrics are evaluated:

1. End-to-end latency from the sensors to the ground vehicles. This is the time taken to retrieve data from the sensors, process the data, transmit it across the VME bus and over the wireless channel, and terminate at the ground vehicle (DCGS). (A sketch showing how these metrics can be derived from message timestamps follows this list.)

2. VME Bus Latency Histogram shows the variation in latency from the Hardware Platforms that are Slaves on the VME bus (VME and VME2) to the Wireless Transmitter Slave (VME3) in Figure 1 below.

3. Packet Histogram displays packet sizes transmitted across the VME bus.

4. VME Bus Throughput displays the Peak and Mean throughput on the VME bus.

5. Display Rejects plots the times when sensor messages were dropped at the Hardware Platform because of buffer overflow or lack of processing power.

6. Hardware Platform Statistics captures the statistics of the buffer occupancy, utilization, processing time at each processor, buffers, flash memory and disk.

7. VME Bus Statistics captures (a) the buffer occupancy and waiting time at the Slave and (b) the utilization of the bus controller.
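All of the metrics above are derived from timestamps and sizes recorded on each message as it moves through the model. The following is a minimal sketch, in Python rather than VisualSim, of how the end-to-end latency, its histogram and the bus throughput could be computed from such records; the field names (t_sensor, t_bus, t_dcgs, bits), the bin width and the one-second throughput window are assumptions for illustration only.

```python
# Sketch only: derive latency and throughput metrics from per-message records.
from collections import Counter

def latency_stats(records, bin_width=0.25):
    """records: dicts with 't_sensor' and 't_dcgs' timestamps in seconds."""
    latencies = [r["t_dcgs"] - r["t_sensor"] for r in records]
    histogram = Counter(round(lat / bin_width) * bin_width for lat in latencies)
    return min(latencies), max(latencies), sum(latencies) / len(latencies), histogram

def bus_throughput(records, window=1.0):
    """Peak and mean throughput (bits/s) over fixed windows on the VME bus."""
    bits_per_window = Counter()
    for r in records:
        bits_per_window[int(r["t_bus"] // window)] += r["bits"]
    rates = [bits / window for bits in bits_per_window.values()]
    return max(rates), sum(rates) / len(rates)

# Example: two messages with 1.3 s and 1.5 s end-to-end latency
sample = [
    {"t_sensor": 0.0, "t_bus": 0.6, "t_dcgs": 1.3, "bits": 8192},
    {"t_sensor": 0.1, "t_bus": 0.8, "t_dcgs": 1.6, "bits": 4096},
]
print(latency_stats(sample))
print(bus_throughput(sample))
```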

Architecture
The system model consists of sensor generators, Hardware Platform, VME Slave, VME bus controller, wireless channel, error checking and DCGS ground vehicles. All of these are connected together on a single VME bus with a single controller.

Figure 1 Top-level VisualSim block diagram of the Unmanned Aerial Vehicle

Sensor Generator. This emulates the sensor data, including acquisition rate, size, header information and sensor-to-Hardware Platform distance. A sensor generator template was created and each sensor had different parameter values. The parameter values being modified include size, inter-arrival time, processing cycles, delayed start and distance to the Hardware Platform.

One Hardware Platform handles many sensors. In this model, there are four sensors in a network feeding into a single Hardware Platform. The Hardware Platform contains an RTOS that feeds in parallel to both a processor array and a flash memory.

The flash memory is a temporary buffer that writes into an archiving system. The processor array contains a number of processors running at a fixed speed. The results from the processor are written into a shared cache. The resulting messages are transmitted by the RTOS to the VME Slave.
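To make the parameterization concrete, here is a hedged sketch of a sensor-generator template with the parameters named above and four per-sensor overrides feeding one Hardware Platform. The parameter names and default values are illustrative assumptions, not the actual VisualSim library settings.

```python
# Illustrative sketch of a parameterized sensor-generator template.
import random
from dataclasses import dataclass

@dataclass
class SensorGenerator:
    size_bytes: int = 1024           # message size
    inter_arrival_s: float = 0.0008  # mean time between messages
    jitter: float = 0.30             # +/- 30% variation on the inter-arrival time
    processing_cycles: int = 20000   # cycles the Hardware Platform spends per message
    delayed_start_s: float = 0.0     # offset before the sensor begins sending
    distance_m: float = 1.0          # sensor-to-Hardware-Platform distance

    def next_arrival(self, now: float) -> float:
        """Return the timestamp of the next message after 'now'."""
        spread = self.inter_arrival_s * self.jitter
        return now + random.uniform(self.inter_arrival_s - spread,
                                     self.inter_arrival_s + spread)

# Four sensors, each with different parameter values, feeding one platform
sensors = [
    SensorGenerator(size_bytes=512),
    SensorGenerator(size_bytes=2048, delayed_start_s=0.002),
    SensorGenerator(inter_arrival_s=0.001),
    SensorGenerator(processing_cycles=40000, distance_m=3.0),
]
```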

Wireless Channel. The wireless channel is modeled as a multi-link channel with variable error probability. The channel also has an Ack-Nack mechanism to support retransmission (see the sketch after these component descriptions).

DCGS. The ground vehicle is a sink that receives a message and computes the latency.

VME Slave. This models the queuing, requests for the bus resource, cable propagation delay, response time and the ability to broadcast.

VME Bus Controller. This does a simple arbitration according to the VME bus standard. Also, the latency across the controller is specified here.
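The retransmission behavior of the wireless channel can be sketched as follows. This is an illustrative approximation only: the link count, per-link rate, error probability, retry limit and the assumption that a message is striped evenly across the links are stand-in values, not figures from the model.

```python
# Sketch of a multi-link channel with error probability and Ack-Nack retransmission.
import random

def send_over_channel(bits, links=4, link_rate_bps=1_000_000,
                      error_prob=0.01, max_retries=5):
    """Return total channel time for one message, including retransmissions."""
    per_attempt = bits / (links * link_rate_bps)  # message striped across the links
    elapsed = 0.0
    for _ in range(max_retries + 1):
        elapsed += per_attempt
        if random.random() >= error_prob:  # Ack received
            return elapsed
        # Nack received: fall through and retransmit
    return elapsed  # retries exhausted; the caller may count this as a loss

print(send_over_channel(bits=8192))
```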

Figure 2. Networks of Sensors and the Processing Array on the UAV

Analysis

The initial analysis (Figure 2, above) is performed to characterize a rough architecture that will support the processing requirements for a fixed arrival of messages from the sensors. The Hardware Platform architecture is validated and adjusted for various arrival rates of the sensor traffic.

The inter-arrival time of messages is set to an initial range of 0.0008 +/- 30% for each sensor. The rough architecture has four 20 MHz Sun SPARC processor boards with a 66 MHz cache and a 1 Mbps channel.

Figure 3 Analysis Plots from UAV Simulation Performance Evaluation

In Figure 3, above, you will see that the Hardware Platform starts dropping messages after 1.37 seconds of simulated operation and continues rejecting messages until the end of the simulation. Also, the end-to-end delay spans a wide range, from 1.25 seconds to 2.57 seconds.

The cache statistics indicate that there is no buffer overflow and the utilization is quite low. Thus, the cache is not a bottleneck. The individual statistics for the four processor boards show buffer overflow, indicating that the processing speed is too low.

The sensor messages are unevenly distributed to the different processor boards, with usage ranging from 23% to 100%. Also, the rejection of messages at the processor boards leaves the VME bus underutilized.

To refine this architecture, a number of alternatives exist: increase the number of processors, speed up the processors, modify the scheduling algorithm, increase the cache speed, or pipeline, rather than parallelize, the execution of the four processor boards.

Evaluating the alternatives
In this model, we tried the following: faster processors, more processors at a lower speed, a higher cache speed and a higher channel speed.

Case 1: We shall first increase the processor speed from 20 MHz to 50 MHz. This is done by changing a single parameter at the top-level of the model. This single parameter is linked to all the 4 processor boards. The rest of the parameters remain unchanged. The new latency histogram shows a narrower range of latency values at 1.25, 1.42 and 1.67 seconds.

All the sensor messages are processed and transmitted across the VME bus without any message being rejected. The processor and cache statistics indicate no buffer overflow. The processor utilization is now uniform across all the processor boards at around 43%. The VME controller utilization has increased from 10% to 36%.

There is a small buffering at the VME-2 (VME_Slave), thus indicating that data is arriving at a faster rate than the VME controller can handle. The buffers at the Slave prevent any loss of data but add some latency.
The mean VME bus throughput has more than doubled for the same traffic, from 0.16 Mbps to 0.41 Mbps. The peak throughput on the VME bus with this architecture is 0.72 Mbps, out of an available 1 Mbps, because of the protocol overhead and the controller latency.

Case 2: The next experiment is to increase the number of processors to six and reduce the speed to 30 MHz. The results indicate no significant performance improvement over the previous experiment. On the other hand, there is a small increase in the buffering at the cache.

The shared cache is receiving data at a higher rate. Also, the average processor utilization is slightly higher than in Case 1 but it peaks out at 50%. The same volume of data is received at the VME bus controller and the mean utilization remains at 36%.

Case 3: Now let us reduce the number of processors to four and increase each processor's speed to 30 MHz (relative to the 20 MHz baseline). There is still buffer overflow, and the peak latency increases to 3.8 seconds. This is not a viable option.

Case 4: Additional experiments can be performed by simply changing the parameters at the top level of this model. Experiments include (1) increasing the channel speed, (2) increasing the number of channel links and (3) changing the cache speed from 66 MHz to 133/288 MHz. Simulations show that these do not contribute to a reduction of the end-to-end latency. At this point the bottleneck is the VME bus, not the processing architecture.
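All of these cases amount to sweeping a handful of top-level parameters and rerunning the same model. The sketch below shows that pattern; run_simulation is a placeholder for executing the VisualSim model, and the Case 4 values are assumed from the experiments described above rather than taken from the original study.

```python
# Sketch of a top-level parameter sweep over the evaluated cases.
cases = {
    "baseline": {"boards": 4, "cpu_mhz": 20, "cache_mhz": 66, "channel_mbps": 1},
    "case_1":   {"boards": 4, "cpu_mhz": 50, "cache_mhz": 66, "channel_mbps": 1},
    "case_2":   {"boards": 6, "cpu_mhz": 30, "cache_mhz": 66, "channel_mbps": 1},
    "case_3":   {"boards": 4, "cpu_mhz": 30, "cache_mhz": 66, "channel_mbps": 1},
    "case_4":   {"boards": 4, "cpu_mhz": 50, "cache_mhz": 133, "channel_mbps": 2},
}

def run_simulation(params):
    """Placeholder for running the model with one parameter set."""
    raise NotImplementedError

for name, params in cases.items():
    print(name, params)  # metrics = run_simulation(params)
```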

Cost Comparison
The cost of a processor board is a function of the processor speed. For the sake of this analysis, we shall assume that the price is $1 per MHz.  Table 1 below shows the comparison:

Table 1 Cost Comparison of different architectures
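As a quick check of the $1-per-MHz rule, the board-only cost of each configuration can be computed directly; this small calculation ignores any cache, channel or chassis costs that a fuller comparison would include.

```python
# Board cost under the assumed $1-per-MHz rule for each evaluated configuration.
COST_PER_MHZ = 1.0  # dollars

configs = {
    "Baseline (4 x 20 MHz)": (4, 20),
    "Case 1 (4 x 50 MHz)":   (4, 50),
    "Case 2 (6 x 30 MHz)":   (6, 30),
    "Case 3 (4 x 30 MHz)":   (4, 30),
}

for name, (boards, mhz) in configs.items():
    print(f"{name}: ${boards * mhz * COST_PER_MHZ:.0f}")
# Baseline: $80, Case 1: $200, Case 2: $180, Case 3: $120
```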

Functional Analysis
After running this complex system model, a very interesting observation was made. The bursty nature of the sensor traffic handed to the RTOS increases the buffering on the processor boards without increasing the utilization of the processors.

Even though the processors do not achieve 100% utilization, a number of messages are still being rejected. There are a number of factors affecting overall performance, including the RTOS scheduling, the distribution of work across the parallel processor boards and the bursty data arrival.

There are periods of inactivity followed by a burst of traffic that fills up the buffer and then starts to overflow. This can be modified by altering the sensor acquisition mechanism, which was beyond the scope of this evaluation. This could be easily added to this experiment as a future extension.
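The effect is easy to reproduce with a toy calculation: when arrivals come in bursts larger than the buffer, messages are dropped even though the processor is idle much of the time. All the numbers in this sketch (burst size, buffer depth, service rate) are made up purely to illustrate the point.

```python
# Toy model: bursty arrivals overflow a finite buffer at modest average utilization.
def simulate_bursts(bursts=20, burst_size=24, idle_slots=16,
                    buffer_depth=16, served_per_slot=2):
    queue, rejected, busy_slots, total_slots = 0, 0, 0, 0
    for _ in range(bursts):
        # A burst arrives all at once; anything beyond the buffer is dropped.
        accepted = min(burst_size, buffer_depth - queue)
        rejected += burst_size - accepted
        queue += accepted
        # Quiet period: the processor drains the queue.
        for _ in range(idle_slots):
            total_slots += 1
            if queue:
                busy_slots += 1
                queue = max(0, queue - served_per_slot)
    return rejected, busy_slots / total_slots

drops, utilization = simulate_bursts()
print(f"dropped {drops} messages at about {utilization:.0%} processor utilization")
```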

Summary
Trade-off analyses of this kind can help in analysis of the system performance for a variety of operating conditions. Early understanding of optimal performance can save significant prototype testing and deliver much more robust system operation.

The model for this presentation was built in a few hours using existing library models. A more advanced model could consider the effects of redundant processing and the consolidation of functional modules on a single processing board.

Trade-offs between partitioning the functional nodes from the UML diagram onto different boards and onto separate VME bus structures can also be evaluated. Finally, channel interference and jamming can easily be included as refinements to further explore operational effects on performance.

The model has been constructed using highly modular components creating a design platform. The bus in this model can be easily replaced with PCI or Ethernet bus architecture to evaluate the performance on a different backplane. The channel could be modified to try a more unreliable channel or the use of a cellular standard. Future spiral engineering possibilities can be tried and quickly determined to be feasible or not.

Deepak Shankar is chief executive officer at Mirabilis Design and has over 15 years of experience in the development, sales and marketing of system-level design tools. Prior to Mirabilis Design, Mr. Shankar was VP of Business Development at both MemCall, a fabless semiconductor company, and SpinCircuit, a supply chain joint venture of HP, Cadence and Flextronics. Prior to that, Deepak spent many years in product marketing at Cadence Design Systems.
