If you have been following my ramblings on the Zynq over the years, you will have noticed that for the last few weeks I have focused on a new tool from Xilinx called SDSoC. This tool has been developed to open up SoC and MPSoC capabilities to embedded system designers in a familiar environment for the first time. What this really means is that we can develop our embedded application in an Eclipse IDE using C, C++ or SystemC.
To me this is important, as the Zynq is a device every embedded system designer should be familiar with and should be considering for their application. At its heart the Zynq is not an FPGA with embedded processors (like previous generations of FPGA with PowerPCs) but a true embedded processor with very flexible interfacing capabilities (DDR, CAN, UART, USB, Gigabit Ethernet, SPI and I2C, to name a few). What separates the Zynq from other embedded processors is the attached programmable logic, and with SDSoC embedded system developers can exploit this pretty simply.
Those familiar with FPGA development may have noticed the trend over recent years towards high-level synthesis (HLS), and may have experimented or developed with tools like Vivado HLS. HLS tools allow us to develop algorithms in C, C++ or SystemC and generate synthesizable RTL from them. This saves significant time in the development life cycle, as it is much faster to write and verify algorithms in C than in HDL.
SDSoC combines the Eclipse front end, Vivado HLS, Vivado and a lot of behind-the-scenes intelligence to seamlessly offer the option of accelerating software functions in the attached programmable logic of the device.
Accelerating the function mmult() into the hardware at the click of a button
So how does it do this? To the user it is really as simple as selecting the function to be accelerated (see the image above) and clicking build. Of course, what happens behind the scenes is much more complex and interesting.
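To give a feel for what gets selected, a candidate function is just ordinary C. The sketch below is an illustrative matrix multiply, not the code shipped with the SDSoC example: the name `mmult_sketch`, the dimension `N` and the flattened row-major layout are all assumptions made here. Fixed loop bounds and plain arrays are the style HLS tools generally cope with well.

```c
#include <stddef.h>

#define N 4  /* assumed matrix dimension, kept small for illustration */

/* Multiply two N x N row-major matrices: out = a * b.
 * Hypothetical stand-in for the kind of function selected for acceleration. */
void mmult_sketch(const float a[N * N], const float b[N * N], float out[N * N])
{
    for (size_t row = 0; row < N; row++) {
        for (size_t col = 0; col < N; col++) {
            float acc = 0.0f;
            /* Dot product of one row of a with one column of b. */
            for (size_t k = 0; k < N; k++) {
                acc += a[row * N + k] * b[k * N + col];
            }
            out[row * N + col] = acc;
        }
    }
}
```

Nothing in the function itself says whether it runs on the processor or in the PL; in the flow described here that choice is made in the tool, not in the source.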
At the heart of the SDSoC tool is the “connectivity framework,” which describes the logical and physical connections between the PL (programmable logic) and PS (processing system) sides of the device. These connections are provided to SDSoC in the form of a hardware definition and a software definition, which together are essentially the board support package for the embedded system, containing:
- Clocks – All clocks used within the SDSoC platform must be from the processor clock
- Resets – The number of resets available
- Interrupts – The number of interrupts available
- AXI – The number of AXI and AXI-Streaming connections available
- Boot files – These depend upon the operating system
- Library packages
- Prebuilt hardware definitions – this saves on compile time.
This board support package allows SDSoC to understand what resources are available to use for acceleration. This means that when we click build, SDSoC will, behind the scenes:
- Employ Vivado HLS to generate logic for the PL side of the Zynq SoC
- Analyze communications to and from the function being accelerated
- Establish an AXI communication network based upon the above analysis and build the hardware
- Generate a software stub function for the function being accelerated
Unsurprisingly, SDSoC includes APIs to allow transfers using the connectivity framework. It is the generated software stub function that is actually linked in place of the original function, so existing calls end up in the stub.
While the software interfaces to the stub function remain identical to the original, its functionality is quite different. The stub function uses the connectivity framework to initialize and send/receive data to and from the PL hardware where the accelerated function now resides.
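The idea of a stub that keeps the caller's interface but changes the body can be sketched in plain C. This is only a mock under stated assumptions, not what SDSoC generates: the PL-side buffers, `mock_send`, `mock_receive` and `mock_pl_mmult` are hypothetical names invented here, and `memcpy` stands in for transfers that would really travel over AXI.

```c
#include <stddef.h>
#include <string.h>

#define DIM 2              /* assumed matrix dimension */
#define ELEMS (DIM * DIM)

/* Hypothetical buffers standing in for memory on the PL side. */
static int pl_in_a[ELEMS], pl_in_b[ELEMS], pl_out[ELEMS];

/* Hypothetical transfer helpers; memcpy stands in for an AXI transfer. */
static void mock_send(int *pl_dst, const int *src)
{
    memcpy(pl_dst, src, sizeof(int) * ELEMS);
}

static void mock_receive(int *dst, const int *pl_src)
{
    memcpy(dst, pl_src, sizeof(int) * ELEMS);
}

/* Mock of the accelerator: it only ever touches the PL-side buffers. */
static void mock_pl_mmult(void)
{
    for (size_t r = 0; r < DIM; r++) {
        for (size_t c = 0; c < DIM; c++) {
            int acc = 0;
            for (size_t k = 0; k < DIM; k++) {
                acc += pl_in_a[r * DIM + k] * pl_in_b[k * DIM + c];
            }
            pl_out[r * DIM + c] = acc;
        }
    }
}

/* Stub: the signature the caller already uses, with a different body.
 * It moves the inputs out, triggers the computation, and fetches the result. */
void mmult(const int a[ELEMS], const int b[ELEMS], int out[ELEMS])
{
    mock_send(pl_in_a, a);
    mock_send(pl_in_b, b);
    mock_pl_mmult();
    mock_receive(out, pl_out);
}
```

Code that calls `mmult(a, b, out)` does not need to change, which is the property that lets the stub be swapped in at link time.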
How the connectivity framework works is really exciting: it uses implementation-independent software API calls to synchronize data transfers to and from the PL. When the code is built, these calls are translated to the correct drivers based upon the configuration of the AXI network that was created.
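One way to picture an implementation-independent call that ends up in a specific driver is a small dispatch table. The sketch below is an analogy only: it selects the driver at run time, whereas the translation described above happens when the code is built, and every name in it (`link_kind`, `link_driver`, `transfer_send`, the two mock drivers) is hypothetical rather than part of the SDSoC API.

```c
#include <stddef.h>

/* Hypothetical link types; these are not SDSoC's own driver names. */
typedef enum { LINK_DMA, LINK_FIFO } link_kind;

/* One entry per driver: a label and the function that moves the data. */
typedef struct {
    const char *name;
    size_t (*send)(const void *buf, size_t len);
} link_driver;

/* Byte counters so the mock drivers have an observable effect. */
static size_t dma_bytes_sent, fifo_bytes_sent;

static size_t dma_send(const void *buf, size_t len)
{
    (void)buf;              /* a real driver would program a DMA engine */
    dma_bytes_sent += len;
    return len;
}

static size_t fifo_send(const void *buf, size_t len)
{
    (void)buf;              /* a real driver would write words to a FIFO */
    fifo_bytes_sent += len;
    return len;
}

static const link_driver drivers[] = {
    [LINK_DMA]  = { "dma",  dma_send  },
    [LINK_FIFO] = { "fifo", fifo_send },
};

/* Generic call: the caller names the configured link, never a driver. */
size_t transfer_send(link_kind configured_link, const void *buf, size_t len)
{
    return drivers[configured_link].send(buf, len);
}
```

The calling code stays the same whichever link is configured; only the table entry that gets used differs.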
When I tried this using one of the examples provided with SDSoC, I managed to reduce the execution time by over 70% when I moved a matrix multiply from executing in software to being accelerated within the PL of the device.
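For anyone comparing their own runs, the arithmetic behind a figure like this is simple. The helpers below are a minimal sketch with hypothetical names, and they assume both elapsed times are measured in the same unit (for example, counter ticks).

```c
/* Percentage reduction in execution time when going from the software
 * run (sw_ticks) to the accelerated run (hw_ticks). */
double reduction_percent(double sw_ticks, double hw_ticks)
{
    return 100.0 * (sw_ticks - hw_ticks) / sw_ticks;
}

/* The same comparison expressed as a speedup factor. */
double speedup_factor(double sw_ticks, double hw_ticks)
{
    return sw_ticks / hw_ticks;
}
```

With made-up round numbers (not the measurements from this post), `reduction_percent(1000.0, 250.0)` gives 75.0 and `speedup_factor(1000.0, 250.0)` gives 4.0. By the same arithmetic, a reduction of over 70% means a speedup of more than roughly 3.3 times.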
I am not going to lie and say the build time was as quick as a traditional software compilation. I did have time to get a cup of tea or two while it was building. However, the time invested was well worth it for the result, as I could not have written and verified an RTL module to do the same in the same timescale.
Elapsed time when the function (hardware version) is executed in Software
Elapsed time when the function (hardware version) is accelerated in the PL
Overall I am really impressed with SDSoC and can see it will have a big impact on embedded system designs. You can read more about my journey with the Zynq here