Choosing the right memory for high performance FPGA platforms
High-performance computing is critical for many applications, and developers can often find solutions to their own embedded system design problems in the most competitive of these applications. For example, high-frequency trading (HFT) is a form of algorithmic trading that today accounts for more than 75% of US equity trading volume. HFT involves using machine-learning algorithms to process market data, implement strategy, and execute orders within microseconds. High-frequency traders move in and out of short-term positions at high volumes, aiming to capture sometimes only a fraction of a cent in profit on each trade. Systems using HFT algorithms constantly monitor price fluctuations for short-term trading opportunities. Because the strategy is so short-term, HFT firms do not consume significant amounts of capital, accumulate positions, or hold their portfolios overnight.
At the turn of the 21st century, HFT was focused on superior algorithms and trading strategies. The advantage lay in strategy rather than speed, with the most popular systems having latency on the order of seconds. By 2010, algorithmic improvements alone were no longer sufficient to gain a trading advantage, and participants started reducing tick-to-trade latency, the time between receiving a market data update (a tick) and sending out the resulting order, to gain an edge over each other. This brought trade times down to microseconds.
Driven by sub-millisecond buy and sell orders, HFT platforms are engaged in a highly competitive speed race to cut market data round-trip latency down to the order of microseconds. Since a difference of even a few nanoseconds can create a big competitive advantage in the form of latency arbitrage (sometimes referred to as ‘front running’), trading firms are constantly on the lookout for faster trading servers.
Traditionally, HFT has been performed with software tools running on high-performance computing systems that are efficient at executing complex trading strategies (Figure 1). The OS kernel on these systems controls access to CPU and memory resources, while the application stack handles all trading strategies. A Network Interface Card (NIC) interfaces the system to the stock exchange.
Figure 1. Order processing in a software based approach (Source: Cypress)
However, this configuration suffers from drawbacks with respect to tick-to-trade latency:
- Standard NICs are not optimized to handle TCP/IP and proprietary trade exchange protocols, and cannot handle market feeds onboard.
- There’s an added delay of a few microseconds on the PCI Express bus between the host system and Ethernet cards.
- The interrupt-based approach of the OS kernel inherently causes long delays.
- These solutions are based on multi-core processors sharing memory resources. Shared memory access is poorly suited to deterministic latency, which is critical when handling feeds from a stock exchange.
Recent advances in algorithmic trading have introduced some lower-latency solutions, the most promising of which is custom hardware built using Field Programmable Gate Arrays (FPGAs). These devices bridge the gap between the extreme performance of hard-coded ASICs and the flexibility of CPUs. FPGAs provide a vast array of concurrent resources that can be configured to drastically reduce round-trip trade latency compared to software-based solutions (Figure 2).
Figure 2. Order processing in an FPGA based approach (Source: Cypress)
Besides being flexible, FPGAs can be programmed to be self-sufficient in processing critical tasks such as data acquisition, risk matching, and order processing. This self-sufficiency makes them faster and more reliable than software algorithms. The key factor that allows FPGA-based solutions to offer such large performance improvements in electronic trading is that they enable processes traditionally handled by software to run directly on the FPGA.
The advantages that FPGAs hold over software-based algorithms come from offloading the following functions to the FPGA itself:
- Handling of the TCP/IP message
- Decoding FAST or similar exchange-specific protocols and stripping relevant data
- Making trading decisions without incurring any kernel-based interrupt delay
- Mitigating risk by managing order books and trade logging within the FPGA
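To give a feel for the kind of protocol decoding that gets offloaded, the following minimal Python sketch decodes one unsigned integer field in the stop-bit encoding used by FAST: each byte carries seven data bits, most significant group first, and a set high bit marks the last byte of the field. On an FPGA this logic would be written in a hardware description language and would run alongside the other stages; the Python version is only an illustration, and the function name `decode_stop_bit_uint` is hypothetical rather than taken from any exchange or vendor library.

```python
from typing import Tuple


def decode_stop_bit_uint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one stop-bit encoded unsigned integer starting at `offset`.

    Returns (value, next_offset). Hypothetical helper for illustration only.
    """
    value = 0
    pos = offset
    while pos < len(buf):
        byte = buf[pos]
        pos += 1
        # The low seven bits of each byte are payload, most significant group first.
        value = (value << 7) | (byte & 0x7F)
        # A set high bit (the stop bit) marks the final byte of the field.
        if byte & 0x80:
            return value, pos
    raise ValueError("truncated field: no stop bit found")


if __name__ == "__main__":
    # Three-byte field: 0x39, 0x45, then 0xA3 (stop bit set on the last byte).
    value, next_offset = decode_stop_bit_uint(bytes([0x39, 0x45, 0xA3]))
    print(value, next_offset)
```

Because every byte is handled with the same shift-and-mask step, this decoder maps naturally onto a fixed hardware pipeline, which is one reason feed decoding is a common candidate for offloading.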
Because of these differences, FPGA-based solutions provide ultra-low-latency feed handling as well as faster order execution and risk assessment. They also deliver high performance per watt, which reduces energy and thermal requirements. Another advantage is that FPGA solutions can scale out into “FPGA Farm” deployments.