Design and implementation of an ARMv4 tightly coupled multicore in VHDL

Carlos Ariño Alegre, Berlin Technical University

January 10, 2013

One way to make use of the vast logic resources found in modern FPGAs is to implement multicore architectures based on soft IP cores of general-purpose CPUs. The biggest advantage of this approach is the availability of massively parallel computing power that can be programmed in a straightforward way using existing software source code.

Current FPGAs possess enough logic resources for several instances of suitable embedded CPUs. Besides the possibility of using them independently, there are different approaches to using them together (e.g. processor arrays, multicore systems).

These approaches differ in many ways (e.g. hardware complexity, programming model). When a multitude of communicating CPU cores is used in a single system, the problem of interconnecting the cores in a feasible manner needs to be solved. Common architectures for this problem include bus topologies with a single bus instance, partitioned buses, and network-on-chip topologies.

Almost all of these architectures share a common drawback: core-to-core communication usually incurs enough latency to make it unsuitable for tight CPU coupling and for distributed computing problems whose data and control dependencies are only a couple of clock cycles apart.

In our work, we are looking for a lightweight implementation that enables explicit and direct use of multiple core instances. Data should be movable from the datapath of one instance straight to that of another, so the cores have to be coupled very tightly. The ability to use a standard compiler to prepare C programs for this architecture is essential for making this multicore infrastructure usable.

The cores are coupled very tightly, with fast communication between them. With our design we achieve two important objectives. First, we designed a generic infrastructure concept for coupling these cores with minimal impact on the existing control and data paths.

Second, we implemented a synthesizable solution without changing the instruction set architecture. In this way we accomplish the lightweight implementation mentioned above. This design allows us to build less expensive processors with a larger number of cores that achieve the same or even better performance than single-core processors.

To read this external content in full, download the complete paper from the author archives online.
