Implementing a decoupled FPU for ARM-based Cortex-M1 SoCs in FPGAs

Jaume Joven and Jordi Carrabina, CEPHIS-MISE; Per Strid and Akash Bagdia, ARM Ltd.; and Giovanni De Micheli, LSI-EPFL, Lausanne, Switzerland

December 30, 2012


Industrial single-processor and multiprocessor systems now rely on hardware floating-point units (FPUs) to accelerate software and to improve numerical precision in computationally complex applications.

Industrial applications must execute a wide variety of kernels, many of which use floating-point arithmetic, occasionally or intensively, rather than less accurate fixed-point arithmetic.

Examples range from digital multimedia (e.g. audio and video), signal processing (e.g. DCT/iDCT, FFT), 3D graphics for gaming, software-defined radio, and wireless communication to control- or computation-intensive applications in embedded automotive real-time systems. As a consequence of these application requirements, most such systems integrate floating-point (FP) support in hardware.

This paper presents the design of an IEEE-754 compliant FPU targeted at the ARM Cortex-M1 processor on FPGA SoCs. We design the FPU as a decoupled AMBA-based unit, both to avoid changing the Cortex-M1 ARMv6-M architecture and the ARM compiler and to allow the FPU eventually to be shared among different processors in our Cortex-M1 MPSoC design.

Our HW-SW implementation can be easily integrated to enable hardware-assisted floating-point operations that are transparent to the software application.

The first objective of this work is to design an IEEE-754 FPU that can be integrated effortlessly in any FPGA device together with the Cortex-M1 soft-core processor, in order to accelerate FP operations whenever required.

The second objective is a decoupled AMBA-based FPU design that leaves the ARMv6-M architecture and the compiler unchanged. This design opens the possibility of sharing a single FPU among different processors in low-cost Cortex-M1 MPSoCs on FPGAs.

As a consequence, apart from the hardware design, this paper also explores different alternatives for integrating the decoupled Cortex-M1 FPU, taking into account the CPU-FPU communication protocol as well as the software requirements. The main contributions of this paper are:

1) the design of an experimental AMBA-based decoupled hardware FPU for Cortex-M1 FPGA systems.
2) the design of an 8-core Cortex-M1 MPSoC for FPGA platforms.
3) the exploration of synchronization methods between the hardware and the software (i.e. the CPU-FPU communication protocol).
4) the study of the viability of sharing several FPUs among different processors by means of floating-point transactions on AMBA fabrics.
5) the evaluation of the FPU, in terms of area and performance, in a real Cortex-M1 FPGA system.
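
To make the CPU-FPU communication protocol in contribution 3 more concrete, the following is a minimal sketch of how software might drive a decoupled, memory-mapped FPU by polling a status bit. The register layout, opcode values, and all names (`fpu_regs_t`, `fpu_exec`, `fpu_model_execute`) are assumptions made for illustration and are not taken from the paper. So that the sketch runs on a host machine, a small software model stands in for the hardware; on a real system the register block would be a pointer to the FPU's bus address and the model call would disappear.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register block of a memory-mapped FPU. The layout is an
   assumption for illustration, not the register map used in the paper. */
typedef struct {
    volatile uint32_t op_a;    /* operand A as an IEEE-754 single bit pattern */
    volatile uint32_t op_b;    /* operand B as an IEEE-754 single bit pattern */
    volatile uint32_t ctrl;    /* opcode; writing it would start the operation */
    volatile uint32_t status;  /* bit 0 set when the result is valid */
    volatile uint32_t result;  /* result as an IEEE-754 single bit pattern */
} fpu_regs_t;

/* Hypothetical opcodes. */
enum { FPU_OP_ADD = 1, FPU_OP_SUB = 2, FPU_OP_MUL = 3, FPU_OP_DIV = 4 };
#define FPU_STATUS_DONE 0x1u

/* Reinterpret float <-> raw bits without violating aliasing rules. */
static uint32_t float_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_to_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Host-side stand-in for the hardware: computes the result and raises
   the done flag. A real FPU would do this on its own after the ctrl write. */
void fpu_model_execute(fpu_regs_t *r)
{
    float a = bits_to_float(r->op_a);
    float b = bits_to_float(r->op_b);
    float res = 0.0f;

    switch (r->ctrl) {
    case FPU_OP_ADD: res = a + b; break;
    case FPU_OP_SUB: res = a - b; break;
    case FPU_OP_MUL: res = a * b; break;
    case FPU_OP_DIV: res = a / b; break;
    default: assert(!"unknown opcode"); break;
    }
    r->result = float_to_bits(res);
    r->status = FPU_STATUS_DONE;
}

/* Driver side: write operands, write the opcode, poll until done,
   then read the result back. */
float fpu_exec(fpu_regs_t *r, uint32_t op, float a, float b)
{
    r->status = 0;                 /* clear any stale done flag (model only) */
    r->op_a = float_to_bits(a);
    r->op_b = float_to_bits(b);
    r->ctrl = op;

    fpu_model_execute(r);          /* not needed when real hardware is present */

    while ((r->status & FPU_STATUS_DONE) == 0) {
        /* busy-wait; an interrupt would be another possible synchronization method */
    }
    return bits_to_float(r->result);
}
```

Polling keeps the software side simple but occupies the CPU while the FPU works. When one FPU is shared among several processors, some form of arbitration or locking around the write-then-read sequence would also be needed, which this sketch does not attempt.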

This work reports synthesis results for our Cortex-M1 SoC architecture and for our FPU on Altera and Xilinx FPGAs; the results are competitive with those of the equivalent Xilinx FPU IP core.

Single- and double-precision tests were performed under different scenarios, showing best-case speedups between 8.8x and 53.2x, depending on the FP operation, compared with FP software emulation libraries.

To read this external content in full, download the complete paper from the author's archives online at the Swiss Federal Institute of Technology in Lausanne, Switzerland.
