Every time the conversation on performance analysis and architecture exploration crops up, the question turns to the ISS, or Instruction Set Simulator: ‘Do you have the ISS for XYZ processor?’ This leads to a discussion on what an ISS is suitable for.
Many EDA companies have developed ISSs, with the false promise of solving everything from software debugging and hardware verification to auto-generating a board with all the peripherals pre-loaded. This creates the impression that the ISS is the solution for all your system development needs.
In reality, architecture exploration is an innovative way to obtain quality results faster. An Instruction Set Simulator provides the user with the ability to load the operating system and execute the compiled code.
This is a good solution for early software debugging. It is not a good solution when you are experimenting with new architectures, such as a new bus topology, a different memory hierarchy, or processor clock speed sizing. Moreover, the OS and the executable are tied to one processor family. If you want to evaluate another processor family, or a processor with a different set of peripherals, you need to get a new ISS and recompile the entire code. In addition, there is a significant lag between the processor release and the ISS availability. An alternative to an ISS is imperative code execution for architecture exploration.
Code execution presents a very specific sequence of instructions executing in a mostly repetitive fashion. Table 1 shows the instructions that execute for an inverse Fast Fourier transform in OFDM communication software. Notice how it is made up of a series of floating-point adds and a few branch instructions. If one is evaluating the architecture of a system, the first 20 instructions of this sequence are typically sufficient. The code has been written by one software engineer based on one standard. If the second part of this code is written to another standard with an entirely different structure, the instruction sequence will be slightly different.
Table 1: Instruction sequence for an inverse Fast Fourier transform
ISS and executable software have been demonstrated to be ineffective in evaluating the metrics of an optimal architecture. A thousand lines of code will simply repeat a small set of instructions, thus providing very little variability. Table 1 shows the instruction sequence for 1,000 lines of code. This is especially true for DSP and data flow code. Control logic has a little more diversity, but the sequence is still not very different.
A number of good alternatives exist to emulate software operation for architecture exploration. System modeling experience shows that three types of software modeling work quite well. At the statistical level, a delay value for each function is sufficient to trigger the traffic on the bus and the memory devices.
At the hardware level, an application-specific instruction allocation called an instruction-mix table provides an extremely accurate representation of a software task. The last method is to annotate performance-intensive portions of the code and generate an instruction trace during execution. This last technique is good for testing the architecture's behavior for a benchmark or set of benchmarks. It is also good for evaluating how a piece of code will behave in a multi-core environment.
The first approach requires a table with the name of each task and its associated delay. During execution, the processor model does a table lookup and, based on the task (A_Task_Name in Table 2) from the RTOS, delays the processor based on the number and type of instructions in the task.
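The table lookup described above can be pictured with a minimal Python sketch. It assumes a delay expressed in processor cycles; the task names other than A_Task_Name, the cycle counts, the clock frequency and the function name are all hypothetical and are not taken from any particular modeling tool.

```python
# Hypothetical task-delay table: task name -> delay in processor cycles.
# The cycle counts are made-up values for illustration only.
TASK_DELAY_CYCLES = {
    "A_Task_Name": 1200,
    "B_Task_Name": 450,
}


def run_tasks(schedule, clock_hz):
    """Return (task, start_s, end_s) tuples for a sequence of task requests."""
    timeline = []
    now = 0.0
    for task in schedule:
        # Table lookup: the processor model is simply held busy for the
        # delay associated with the requested task.
        delay_s = TASK_DELAY_CYCLES[task] / clock_hz
        timeline.append((task, now, now + delay_s))
        now += delay_s
    return timeline


# Assumed 1 MHz clock, chosen only to keep the numbers easy to read.
timeline = run_tasks(["A_Task_Name", "B_Task_Name", "A_Task_Name"], clock_hz=1_000_000)
for task, start, end in timeline:
    print(f"{task}: {start * 1e6:.0f} us -> {end * 1e6:.0f} us")
```

In a full system model, each busy interval would also trigger bus and memory traffic; this sketch only tracks when the processor is occupied.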
Table 2: Instruction mix table for a software task
The application-specific instruction allocation technique is the most versatile and can be used for software testing, hardware verification and architecture optimization. As shown in Table 2, each software task or thread has a number of instructions and a percentage for each type of instruction. In the case of My_Task_1, we have 10% integer, 48% floating point, 10% logical, 7% load-store, and 25% branch instructions. This table is fed into a software generator block that generates the instruction sequence based on an intelligent algorithm. This sequence is used for the hardware testing, thus providing a more realistic test of the platform architecture.
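As a rough illustration of how a percentage table can drive a software generator, the Python sketch below draws instruction types by weighted random sampling. This is a deliberately simple stand-in, not the generator algorithm of any specific tool. The percentages for My_Task_1 come from the text; the instruction count of 1000, the dictionary layout and the function name are assumptions.

```python
import random

# Percentages for My_Task_1 are those quoted in the text. The instruction
# count (1000) and this table layout are assumptions for illustration.
INSTRUCTION_MIX = {
    "My_Task_1": {
        "count": 1000,
        "mix": {
            "integer": 10,
            "floating_point": 48,
            "logical": 10,
            "load_store": 7,
            "branch": 25,
        },
    },
}


def generate_sequence(task_name, seed=None):
    """Generate a list of instruction types that follows the task's mix."""
    entry = INSTRUCTION_MIX[task_name]
    mix = entry["mix"]
    if sum(mix.values()) != 100:
        raise ValueError("instruction percentages must sum to 100")
    rng = random.Random(seed)  # seeded so a run can be reproduced
    # Weighted sampling: each instruction type is drawn in proportion
    # to its percentage in the table.
    return rng.choices(list(mix), weights=list(mix.values()), k=entry["count"])


sequence = generate_sequence("My_Task_1", seed=1)
print(sequence[:10])
```

Changing a percentage in the table and regenerating gives a different stream for the same task, with no recompilation involved.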
Table 3 shows the output for My_Task_1. To get an accurate distribution of the instruction types within a code structure, use a good decompiler or profiler such as Hex-Rays, Intel VTune or Boomerang. The number of tasks or threads will differ based on the application. Getting this much flexibility in the simulated instruction sequence is hard to achieve using an ISS but fairly easy using a good software generator.
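Once a disassembly or trace is available, turning it into percentages is a small counting exercise. The Python sketch below assumes the trace has already been reduced to a list of mnemonics; the mnemonic-to-class mapping and the sample trace are hypothetical, and a real mapping depends on the processor's instruction set.

```python
from collections import Counter

# Hypothetical mnemonic-to-class mapping; a real one depends on the ISA.
CLASS_OF = {
    "add": "integer", "sub": "integer",
    "fadd": "floating_point", "fmul": "floating_point",
    "and": "logical", "or": "logical",
    "ld": "load_store", "st": "load_store",
    "beq": "branch", "bne": "branch",
}


def instruction_mix(mnemonics):
    """Return the percentage of each instruction class in a trace."""
    # Unknown mnemonics are grouped under "other" rather than dropped.
    counts = Counter(CLASS_OF.get(m, "other") for m in mnemonics)
    total = sum(counts.values())
    return {cls: 100.0 * n / total for cls, n in counts.items()}


# Made-up ten-instruction trace, for illustration only.
trace = ["ld", "fadd", "fadd", "fmul", "st", "add", "bne", "fadd", "and", "beq"]
mix = instruction_mix(trace)
for cls, pct in sorted(mix.items()):
    print(f"{cls}: {pct:.0f}%")
```

The resulting dictionary has the same shape as one row of an instruction-mix table, so it can be used as a starting point and then adjusted by hand.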
Moreover, you can run the tasks in order, in random order or based on the input request. This mechanism can provide a lot more variety in terms of cache access, hit-miss ratio, bus activity and pipeline flushes. You can modify the task instruction mix and study the impact on your architecture by simply modifying the percentage table. This is quick to do and is not locked to a specific code implementation. If you look at the generated output for My_Task_1, you can see diversity in the instruction sequence, allowing for a much larger level of architecture testing.
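The three ordering options and the what-if change to the percentage table can be sketched in a few lines of Python. The policy names, function names and the tasks My_Task_2 and My_Task_3 are hypothetical; only My_Task_1 and its percentages come from the text.

```python
import random


def task_order(tasks, policy, requests=None, seed=None):
    """Return the order in which tasks run under one of three policies."""
    if policy == "in_order":
        return list(tasks)
    if policy == "random":
        shuffled = list(tasks)
        random.Random(seed).shuffle(shuffled)
        return shuffled
    if policy == "on_request":
        # Run tasks as input requests arrive, ignoring unknown task names.
        known = set(tasks)
        return [t for t in (requests or []) if t in known]
    raise ValueError(f"unknown policy: {policy}")


def shift_mix(mix, from_class, to_class, points):
    """Move percentage points between two instruction classes."""
    if points < 0 or mix[from_class] < points:
        raise ValueError("cannot shift that many percentage points")
    updated = dict(mix)  # leave the original table untouched
    updated[from_class] -= points
    updated[to_class] += points
    return updated


tasks = ["My_Task_1", "My_Task_2", "My_Task_3"]  # last two are hypothetical
print(task_order(tasks, "random", seed=7))
print(task_order(tasks, "on_request", requests=["My_Task_3", "My_Task_1", "My_Task_3"]))

base_mix = {"integer": 10, "floating_point": 48, "logical": 10,
            "load_store": 7, "branch": 25}
# What-if: make the task more memory-bound by moving 10 points
# from floating point to load-store.
print(shift_mix(base_mix, "floating_point", "load_store", 10))
```

Because the total stays at 100%, the modified table can be fed to the same generator and the architecture results compared run to run.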
Table 3: Instruction sequence output associated with the first line of the instruction mix table
To view and simulate a model that uses this application-specific instruction mix table, go to http://www.mirabilisdesign.com/new/software/demo/Partitioning/SoC/Power_Perf.htm. Accept all security warnings and the model will load in the web page as an applet. Click on GO to run the simulation. Similarly, you can change a parameter in the model view and click on GO; you will see the changes in the reports.
A recent TechTalk at http://youtu.be/_csv53LlXp8 by Robert Juliano Ph.D., Sr. Director of Applications, Mirabilis Design covers a similar topic.
The instruction-mix table method of software emulation offers the most advantages for architecture exploration. Using this approach, the designer can view the depth of the pipeline, identify the cause of a stall, and examine the power management algorithm's impact, memory hierarchy operation, the performance slowdown of load/store requests, and the quality of the cache coherency algorithm. The simulation reports provide significant visibility into the architecture operation and allow for great optimization of the system throughput.
A number of other approaches can also be used for architecture exploration, but they are extremely hard to generate. These include hand-annotating specific sections of the code, generating a bus trace with a list of instructions, and tapping the operating system for cache accesses. These approaches are implementation-specific but can be targeted at a timing-intensive function. So, the next time you are doing architecture exploration, look at your options for software emulation to test the architecture. Look beyond the ISS. Look at the instruction-mix table.
Deepak Shankar is the Founder of Mirabilis Design, a systems engineering software solutions provider. He has been involved with architecture exploration of embedded systems, semiconductors and real-time software for over 20 years. While at Mirabilis Design, he has developed new methodologies and solutions to streamline the validation of system specification, make architecture exploration extremely accurate and accelerate the systems engineering process. Prior to Mirabilis Design, Deepak Shankar worked at Cadence, Spincircuit and Memcall in technical, marketing and executive management roles. Mr. Shankar has published over 30 articles in technical journals around the world and has been the lead speaker at various IEEE and other organizations. He has an MS in Electronics from Clemson University, an MBA from the University of California, Berkeley, and a BS in Electronics and Communication from Coimbatore Institute of Technology.
This article was previously published on the EETimes SoC DesignLine Page.