Performance modeling plays a critical role in the design, evaluation, and development of computing architectures in every segment, from embedded to high-performance processors.
Simulation has historically been the primary vehicle for performance modeling, since it makes it easy to create and test new designs several months before a physical prototype exists. Performance modeling and analysis are now integral to the design flow of modern computing systems, as they provide several significant advantages:
i) they accelerate time-to-market, by allowing software to be developed before the actual hardware exists;
ii) they reduce development costs and risks, by allowing new technology to be tested earlier in the design process;
iii) they allow exhaustive design space exploration, by running hundreds of simulations in parallel.
High-end embedded processor vendors have embraced the heterogeneous architecture template for their designs, as it represents the most flexible and efficient design paradigm in the embedded computing domain.
Parallel architectures and heterogeneity provide a wider power/performance scaling range, combining high-performance and power-efficient general-purpose cores with massively parallel many-core accelerators. Examples of this evolution are AMD Fusion, NVIDIA Tegra, and Qualcomm Snapdragon. Besides the complex hardware, these platforms generally also host an advanced software ecosystem, composed of an operating system, several communication protocol stacks, and various computationally demanding user applications.
Unfortunately, as processor architectures become more heterogeneous and complex, it becomes increasingly difficult to develop simulators that are both fast and accurate. Cycle-accurate simulation tools can reach an accuracy error below 1-2%, but they typically run at a few millions of instruction
In this paper we present VirtualSoC, a new virtual platform prototyping framework based on QEMU and SystemC. It targets the full-system simulation of massively parallel heterogeneous systems-on-chip composed of a general-purpose processor (intended as the platform coordinator and in charge of running an operating system) and a many-core hardware accelerator (used to speed up the execution of compute-intensive applications or parts of them).
The architecture targeted by this work is representative of the above-mentioned platforms and is composed of a many-core accelerator and an ARM-based host processor. The host is emulated by QEMU, which models an ARM926 processor featuring an ARMv5 ISA, interfaced with the group of peripherals needed to run a full-fledged operating system (ARM Versatile Express baseboard).
The many-core accelerator is modeled by a cycle-accurate SystemC MPSoC simulator. The ARM processor and the accelerator share the main memory, which is used as the communication medium between the two. The accelerator target architecture features a configurable number of simple RISC cores, with a private or shared I-cache architecture, all sharing a Tightly Coupled Data Memory (TCDM) accessible via a local interconnection
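The configurable parameters named above (core count, private or shared I-cache, shared TCDM) can be pictured as a small configuration record. The following is only an illustrative Python sketch: `AcceleratorConfig`, `ICacheMode`, the field names, and the example sizes are hypothetical and are not taken from VirtualSoC itself.

```python
from dataclasses import dataclass
from enum import Enum


class ICacheMode(Enum):
    PRIVATE = "private"  # one instruction cache per core
    SHARED = "shared"    # a single instruction cache shared by all cores


@dataclass(frozen=True)
class AcceleratorConfig:
    """Hypothetical description of one accelerator design point."""
    num_cores: int            # number of simple RISC cores
    icache_mode: ICacheMode   # private or shared I-cache architecture
    tcdm_bytes: int           # size of the shared TCDM (assumed unit: bytes)

    def __post_init__(self) -> None:
        # Reject design points that cannot describe a real platform.
        if self.num_cores < 1:
            raise ValueError("num_cores must be at least 1")
        if self.tcdm_bytes <= 0:
            raise ValueError("tcdm_bytes must be positive")

    def icache_count(self) -> int:
        # A private organization needs one cache per core, a shared one needs one.
        return self.num_cores if self.icache_mode is ICacheMode.PRIVATE else 1


# Two example design points (values chosen arbitrarily for illustration).
shared_cfg = AcceleratorConfig(16, ICacheMode.SHARED, 256 * 1024)
private_cfg = AcceleratorConfig(16, ICacheMode.PRIVATE, 256 * 1024)
print(shared_cfg.icache_count(), private_cfg.icache_count())
```

Keeping each design point as an immutable record makes it straightforward to enumerate many configurations when exploring the design space.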
VirtualSoC combines the speed and flexibility of QEMU, which allows the execution of a full-fledged Linux operating system, with the accuracy of a SystemC model for many-core accelerators. The specific features of VirtualSoC are:
1) Since it relies on QEMU for host processor emulation, VirtualSoC can boot unmodified operating systems and simulate the execution of unmodified ARM binaries of applications and existing libraries.
2) VirtualSoC enables accurate simulation of many-core accelerators. We designed a full software stack that lets the programmer exploit the hardware accelerator model implemented in SystemC from within a user-space application running on top of QEMU. This software stack comprises a Linux device driver and a user-level programming API.
3) The host processor (emulated by QEMU) and the SystemC accelerator model can run asynchronously: a non-blocking communication interface enables parallel execution of the QEMU and SystemC environments.
4) Besides the interface between QEMU and the SystemC model, we also implemented a synchronization protocol that provides a good approximation of the global system time.
5) VirtualSoC can also be used in stand-alone mode, where only the hardware accelerator is simulated, thus enabling accurate design space exploration.
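Features 3 and 4 can be illustrated with a toy model. The Python sketch below is not the VirtualSoC protocol, whose details are not given in this section. It shows quantum-based synchronization, one common way to approximate a global time between two simulators that otherwise run independently, together with a non-blocking mailbox. All names (`Side`, `cosimulate`, the message strings) and all numeric values are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Side:
    """One simulation environment with its own local clock (toy model)."""
    name: str
    step_ns: int                                  # simulated time per step
    local_ns: int = 0
    inbox: deque = field(default_factory=deque)

    def post(self, message: str) -> None:
        # Non-blocking: the sender enqueues the message and returns at once.
        self.inbox.append(message)

    def run_until(self, deadline_ns: int) -> list:
        # Collect pending messages, then advance the local clock in whole
        # steps until it reaches (or slightly overshoots) the deadline.
        handled = list(self.inbox)
        self.inbox.clear()
        while self.local_ns < deadline_ns:
            self.local_ns += self.step_ns
        return handled


def cosimulate(host: Side, accel: Side, quantum_ns: int, total_ns: int):
    """Run both sides freely inside each quantum, then resynchronize."""
    global_ns, max_skew_ns, log = 0, 0, []
    accel.post("offload:job0")                    # host offloads without waiting
    while global_ns < total_ns:
        global_ns += quantum_ns
        host.run_until(global_ns)
        for msg in accel.run_until(global_ns):
            log.append((global_ns, msg))          # when the accelerator saw it
            host.post("done:" + msg.split(":")[1])
        skew = abs(host.local_ns - accel.local_ns)
        max_skew_ns = max(max_skew_ns, skew)
    return global_ns, max_skew_ns, log


host = Side("qemu", step_ns=7)
accel = Side("systemc", step_ns=5)
final_ns, skew_ns, log = cosimulate(host, accel, quantum_ns=100, total_ns=1000)
print(final_ns, skew_ns, log)
```

In this model the two local clocks never drift apart by more than one step of the coarser simulator, because both are brought back to the same boundary at the end of every quantum. A smaller quantum tightens the time approximation at the cost of more frequent synchronization.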