Early performance analysis and virtual system prototyping provide a methodology and platform for evaluating architectures, processing requirements and module functionality within large systems such as industrial controllers, unmanned aerial vehicles and navigation systems.
This methodology allows the system engineer to start with an abstract concept and increase model fidelity and accuracy through successive levels of decomposition. Establishing a simulation platform allows for quick and accurate trade studies and spiral engineering enhancements over the program's lifecycle.
It also allows investigations into interoperability, which lessens the Total Overall Cost of the electronic systems. This is especially important for electronics because this system element is the first to face obsolescence or replacement by an enhanced version.
The example in this article illustrates the power of this approach by using VisualSim performance modeling software to conduct trade-off analysis on the architecture of the processing hardware platform and to select the best bus backplane, but many of the principles involved can be applied to almost any project.
In this case, the system prototype combines existing components available in the VisualSim model library to assemble the sensors, on-board multi-blade processing units, wireless channels and the operation of a ground vehicle.
These processing platforms and the wireless channels are connected together over a VME bus. Each of the hardware platforms processes messages from a number of sensors and transmits the results across the VME backbone through a common set of transmitters to ground vehicles.
Based on the analysis conducted by constructing a model of the proposed architecture and support for the different use scenarios, the optimal architecture was identified as a hardware platform with six 30 MHz processor boards, a 66 MHz shared cache and a 1 Mbps, four-link downlink.
The simulation model evaluates the maximum handling capacity of the processing units, the impact of channel errors and speed on latency, and the performance of the VME bus. The following metrics are evaluated:
1. End-to-end latency from the sensors to the ground vehicles. This is the time taken to retrieve data from the sensors, process the data, transmit it across the VME bus and over the Wireless channel, and terminate at the ground vehicle (DCGS).
2. VME Bus Latency Histogram shows the variation in latency from the Hardware Platforms that are Slaves on the VME bus (VME and VME2) to the Wireless Transmitter Slave (VME3) in Figure 1 below.
3. Packet Histogram displays packet sizes transmitted across the VME bus.
4. VME Bus Throughput displays the peak and mean throughput on the VME bus.
5. Display Rejects plots the times when sensor messages were dropped at the Hardware Platform because of buffer overflow or lack of processing power.
6. Hardware Platform Statistics captures the buffer occupancy, utilization and processing time at each processor, buffer, flash memory and disk.
7. VME Bus Statistics captures (a) the buffer occupancy and waiting time at the Slave and (b) the utilization of the bus controller.
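As a rough illustration of how the first and fourth metrics can be derived from a simulation trace, the following minimal Python sketch computes per-message end-to-end latency and the mean and peak throughput over fixed time windows. The `Message` record, its field names and the one-second window are assumptions made for this example; they are not part of the VisualSim model or its API.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Message:
    """Hypothetical trace record for one sensor message."""
    created_s: float    # time the sensor generated the message
    delivered_s: float  # time the ground vehicle received it
    size_bits: int      # payload size in bits


def end_to_end_latencies(messages):
    """Latency of each message from sensor to ground vehicle, in seconds."""
    return [m.delivered_s - m.created_s for m in messages]


def windowed_throughput_mbps(messages, window_s=1.0):
    """Return (mean, peak) throughput in Mbps over fixed-size time windows."""
    if not messages:
        return 0.0, 0.0
    last_delivery = max(m.delivered_s for m in messages)
    window_count = int(last_delivery // window_s) + 1
    bits_per_window = [0] * window_count
    for m in messages:
        # Credit each message to the window in which it was delivered.
        bits_per_window[int(m.delivered_s // window_s)] += m.size_bits
    rates = [bits / window_s / 1e6 for bits in bits_per_window]
    return mean(rates), max(rates)


# Illustrative values only; these are not results from the article.
sample = [Message(0.0, 0.5, 100_000), Message(0.2, 1.5, 300_000)]
latencies = end_to_end_latencies(sample)
mean_mbps, peak_mbps = windowed_throughput_mbps(sample, window_s=1.0)
print(latencies, mean_mbps, peak_mbps)
```

A histogram of `latencies` would correspond to the latency plot, while the mean and peak values correspond to the two figures reported for bus throughput.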
The system model consists of sensor generators, Hardware Platform, VME Slave, VME bus controller, Wireless channel, error checking and DCGS-Ground Vehicles. All of these are connected together on a single VME bus with a single controller.
Figure 1. Top-level VisualSim block diagram of the Unmanned Aerial Vehicle
Sensor Generator. This emulates the sensor data, including acquisition rate, size, header information and sensor-to-Hardware Platform distance. A sensor generator template was created, and each sensor was given different parameter values. The parameters modified include size, inter-arrival time, processing cycles, delayed start and distance to the Hardware Platform.
One Hardware Platform handles many sensors. In this model, there are 4 sensors in a network feeding into a single Hardware Platform. The Hardware Platform contains an RTOS that feeds in parallel to both a processor array and a flash memory.
The flash memory is a temporary buffer that writes into an archiving system. The processor array contains a number of processors running at a fixed speed. The results from the processors are written into a shared cache. The resulting messages are transmitted by the RTOS to the VME Slave.
Wireless Channel. The wireless channel is modeled as a multi-link channel with variable error probability. The channel also has an Ack-Nack mechanism to support retransmission.
DCGS. The ground vehicle is a sink that receives a message and computes the latency.
VME Bus Slave. This models the queuing, request for bus resource, cable propagation delay, response time and the ability to broadcast.
VME Bus Controller. This performs a simple arbitration according to the VME bus standard. The latency across the controller is also specified here.
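The Ack-Nack behavior of the wireless channel described above can be sketched in a few lines of Python. This is a minimal illustration, not the VisualSim block: it assumes that each transmission attempt fails independently with a fixed error probability and that the sender gives up after a configurable number of attempts. The function names and the retry cap are hypothetical.

```python
import random


def send_with_retransmit(error_probability, rng, max_attempts=10):
    """Return the number of attempts needed for an Ack, or None if all fail.

    Assumes independent errors per attempt (an assumption of this sketch).
    """
    for attempt in range(1, max_attempts + 1):
        # rng.random() is in [0, 1); the attempt succeeds with
        # probability 1 - error_probability and is Ack'ed.
        if rng.random() >= error_probability:
            return attempt
        # Otherwise the receiver Nacks and the message is sent again.
    return None


def expected_transmissions(error_probability):
    """Mean attempts per message with unlimited retries and independent errors."""
    return 1.0 / (1.0 - error_probability)


rng = random.Random(1)  # fixed seed so that runs are repeatable
attempts = [send_with_retransmit(0.2, rng) for _ in range(1000)]
print(expected_transmissions(0.2))
```

Each retransmission adds one more channel transfer time to the end-to-end latency, which is why the channel error probability appears among the factors the simulation evaluates.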
Figure 2. Networks of Sensors and the Processing Array on the UAV Analysis
The initial analysis (Figure 2, above) is performed to characterize a rough architecture that will support the processing requirements for a fixed arrival rate of messages from the sensors. The Hardware Platform architecture is then validated and adjusted for various arrival rates of the sensor traffic.
The inter-arrival time of messages is set to an initial range of 0.0008 +/- 30% for each sensor. The rough architecture has 4 Sun SPARC processor boards running at 20 MHz, a 66 MHz cache and a 1 Mbps channel.
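To make the "0.0008 +/- 30%" setting concrete, the sketch below draws message spacings for one sensor. It assumes the spacing is uniformly distributed between the lower and upper bounds, which the article does not state, and it leaves the time unit as whatever the model uses. The function and parameter names are hypothetical.

```python
import random


def inter_arrival_times(nominal, spread, count, seed=0):
    """Draw `count` spacings uniformly from nominal * (1 - spread) to
    nominal * (1 + spread). The uniform distribution is an assumption."""
    rng = random.Random(seed)
    low = nominal * (1.0 - spread)
    high = nominal * (1.0 + spread)
    return [rng.uniform(low, high) for _ in range(count)]


NOMINAL = 0.0008   # nominal spacing from the article, in model time units
SPREAD = 0.30      # +/- 30%
SENSORS = 4        # sensors feeding one Hardware Platform

samples = inter_arrival_times(NOMINAL, SPREAD, count=1000)
rate_per_sensor = 1.0 / NOMINAL           # mean messages per time unit
offered_rate = SENSORS * rate_per_sensor  # load on one Hardware Platform
print(min(samples), max(samples), offered_rate)
```

With four sensors per Hardware Platform, the mean offered load is four times the per-sensor rate, which is the figure the processor array has to sustain.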
Figure 3. Analysis Plots from UAV Simulation Performance Evaluation
In Figure 3, above, you will see that the Hardware Platform starts dropping messages after 1.37 seconds of simulated operation and continues rejecting messages until the end of the simulation. Also, the end-to-end delay varies over a wide range, from 1.25 seconds to 2.57 seconds.
The cache statistics indicate that there is no buffer overflow and the utilization is quite low. Thus, the cache is not a bottleneck. The individual statistics for the 4 processor boards show a buffer overflow, indicating that the processing speed is extremely low.
The sensor messages are unevenly distributed across the processor boards, with utilization ranging from 23% to 100%. Also, the rejection of messages at the processor boards leaves the VME bus underutilized.
To refine this architecture, a number of alternatives exist: increase the number of processors, speed up the processors, modify the scheduling algorithm, increase the cache speed, or pipeline the four processor boards instead of executing them in parallel.
Evaluating the alternatives
In this model, we tried the following: a larger number of faster processors, a larger number of processors at the same speed, a faster cache and a faster channel.
Case 1: We first increase the processor speed from 20 MHz to 50 MHz. This is done by changing a single parameter at the top level of the model. This single parameter is linked to all 4 processor boards. The rest of the parameters remain unchanged. The new latency histogram shows a narrower range of latency values, at 1.25, 1.42 and 1.67 seconds.
All the sensor messages are processed and transmitted across the VME bus without any message being rejected. The processor and cache statistics indicate no buffer overflow. The processor utilization is now uniform across all the processor boards at around 43%. The VME controller utilization has increased from 10% to 36%.
There is a small amount of buffering at VME-2 (VME_Slave), indicating that data is arriving at a faster rate than the VME controller can handle. The buffers at the Slave prevent any loss of data but add some latency.
The mean VME bus throughput has now more than doubled for the same traffic, from 0.16 Mbps to 0.41 Mbps. The peak throughput on the VME bus with this architecture is 0.72 Mbps, out of an available 1 Mbps, because of the protocol overhead and the controller latency.
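The throughput figures quoted for Case 1 can be put in proportion with a short calculation. The sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
AVAILABLE_MBPS = 1.0        # available capacity quoted in the text

baseline_mean_mbps = 0.16   # mean throughput with 20 MHz processors
case1_mean_mbps = 0.41      # mean throughput with 50 MHz processors
case1_peak_mbps = 0.72      # peak throughput in Case 1

# Growth in mean throughput once messages are no longer rejected upstream.
increase_factor = case1_mean_mbps / baseline_mean_mbps

# Share of the available capacity used on average and at peak.
mean_share = case1_mean_mbps / AVAILABLE_MBPS
peak_share = case1_peak_mbps / AVAILABLE_MBPS

print(f"increase: {increase_factor:.2f}x")
print(f"mean share: {mean_share:.0%}, peak share: {peak_share:.0%}")
```

The mean throughput grows by a factor of about 2.6, and even at peak roughly 28% of the nominal capacity is not available to payload traffic, which the article attributes to protocol overhead and controller latency.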
Case 2: The next experiment is to increase the number of processors to 6 and reduce the speed to 30 MHz. The results indicate no significant performance improvement over the previous experiment. On the other hand, there is a small increase in the buffering at the cache.
The shared cache is receiving data at a higher rate. Also, the average processor utilization is slightly higher than in Case 1, but it peaks at 50%. The same volume of data is received at the VME bus controller, and the mean utilization remains at 36%.
Case 3: Now let us reduce the number of processors to 4 and set each processor speed to 30 MHz. There is still a buffer overflow, and the peak latency increases to 3.8 seconds. This is not a viable option.
Case 4: Additional experiments can be performed by simply changing the parameters at the top level of this model. Experiments include (1) increasing the channel speed, (2) increasing the number of channel links and (3) changing the cache speed from 66 MHz to 133/288 MHz. Simulations show that these do not contribute to a reduction of the end-to-end latency. At this point the bottleneck is the VME bus and not the processing architecture.
The cost of a processor board is a function of the processor speed. For the sake of this analysis, we shall assume that the price is $1 per MHz. Table 1 below shows the comparison:
Table 1. Cost Comparison of different architectures
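The pricing assumption can be applied directly to the processor configurations discussed in the text. The following sketch is plain arithmetic under that $1 per MHz assumption, with each board priced by its clock speed; it does not reproduce the contents of Table 1, and the dictionary and function names are illustrative.

```python
PRICE_PER_MHZ = 1.0  # dollars per MHz, the article's stated assumption

# (number of processor boards, speed in MHz) as described in the text
configs = {
    "Baseline": (4, 20),
    "Case 1": (4, 50),
    "Case 2": (6, 30),
    "Case 3": (4, 30),
}


def processor_cost(boards, speed_mhz, price_per_mhz=PRICE_PER_MHZ):
    """Total processor cost when each board is priced by its clock speed."""
    return boards * speed_mhz * price_per_mhz


costs = {name: processor_cost(b, s) for name, (b, s) in configs.items()}
for name, cost in costs.items():
    print(f"{name}: ${cost:.0f}")
```

Under this assumption the six-board, 30 MHz configuration comes out cheaper than four 50 MHz boards, and both are reported to process every sensor message without rejection.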
After running this complex system, a very interesting observation was made. The bursty nature of the message handling from the sensors to the RTOS increases the buffering on the processor boards without increasing the utilization of the processors.
Even though the processors do not reach 100% utilization, a number of messages are still rejected. A number of factors affect the overall performance, including the RTOS scheduling, the distribution between parallel executions and bursty data arrival.
There are periods of inactivity followed by a burst of traffic that fills up the buffer and then starts to overflow. This could be changed by altering the sensor acquisition mechanism, which was beyond the scope of this evaluation but could easily be added to this experiment as a future extension.
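The effect described above, message loss at well under 100% utilization, can be reproduced with a toy queue. The sketch below is a simplified time-stepped model and not the VisualSim simulation: a single processor serves a fixed number of messages per tick from a finite buffer, and the buffer size, service rate and arrival patterns are invented for illustration.

```python
def simulate(arrivals, capacity, service_per_tick):
    """Time-stepped finite-buffer queue.

    arrivals: number of messages arriving in each tick.
    Returns (dropped messages, processor utilization).
    """
    queue = 0
    dropped = 0
    processed = 0
    for arriving in arrivals:
        # Accept only what fits in the buffer; the rest is rejected.
        accepted = min(arriving, capacity - queue)
        dropped += arriving - accepted
        queue += accepted
        # Serve up to the per-tick processing capacity.
        served = min(queue, service_per_tick)
        queue -= served
        processed += served
    utilization = processed / (len(arrivals) * service_per_tick)
    return dropped, utilization


# Both patterns deliver 20 messages over 20 ticks.
smooth = [1] * 20
bursty = ([10] + [0] * 9) * 2

smooth_result = simulate(smooth, capacity=4, service_per_tick=2)
bursty_result = simulate(bursty, capacity=4, service_per_tick=2)
print(smooth_result)  # (0, 0.5)
print(bursty_result)  # (12, 0.2)
```

With the same total traffic, the smooth pattern loses nothing at 50% utilization, while the bursty pattern drops 12 of 20 messages and leaves the processor only 20% utilized, because each burst overflows the buffer and the processor then sits idle until the next one.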
Trade-off analyses of this kind help in analyzing system performance under a variety of operating conditions. An early understanding of optimal performance can save significant prototype testing and deliver much more robust system operation.
The model for this presentation was built in a few hours using existing library models. A more advanced model could consider the effects of redundant processing and the consolidation of functional modules on a single processing board.
The trade-off between partitioning the functional nodes from the UML diagram onto different boards and separate VME or VME bus structures can also be evaluated. Finally, channel interference and jamming can easily be included as refinements to further explore operational effects on performance.
The model has been constructed using highly modular components, creating a design platform. The bus in this model can easily be replaced with a PCI or Ethernet bus architecture to evaluate the performance on a different backplane. The channel could be modified to try a less reliable channel or a cellular standard. Future spiral engineering possibilities can be tried out and quickly assessed for feasibility.
Deepak Shankar is chief executive officer at Mirabilis Design and has over 15 years of experience in development, sales and marketing of system-level design tools. Prior to Mirabilis Design, Mr. Shankar was VP of Business Development at both MemCall, a fabless semiconductor company, and SpinCircuit, a supply chain joint venture of HP, Cadence and Flextronics. Prior to that, he spent many years in product marketing at Cadence Design Systems.