Designing an ARM-based Cloud RAN cellular/wireless base station
Cellular service providers are looking for cost-effective, scalable ways to manage their networks profitably. Cloud radio access network (Cloud RAN) technology is gaining traction with service providers as an efficient means of processing wireless network signals by virtualizing baseband processing onto large server farms and ultimately reducing costs.
This article describes a novel baseband processing architecture for mobile wireless base stations. It pairs ARM’s Cortex-A57 processors with our modem processing unit (MPU), a real-time, reconfigurable platform that acts as a tightly integrated co-processor to general-purpose computers and allows a wide variety of communication standards to be implemented in Cloud RAN. This approach reduces power consumption, increases overall network throughput, and decreases CAPEX and OPEX by offloading tasks from older base stations that are expensive to operate.
Radio access networks
Conventional base stations are the core of wireless network RANs. They include unified RF units and baseband processing units positioned at the base station site. From an operator’s point of view, this approach has significant limitations. Each base station connects to a fixed number of sector antennas provisioned for the peak voice and data demand in its coverage region. With this approach, it is nearly impossible to improve system capacity, since interference mitigation techniques are difficult to employ. Furthermore, the base stations are built on proprietary platforms, and as such they are expensive to construct, maintain, and operate.
Cloud RAN is a compelling alternative approach. It is composed of three main parts: distributed remote radio units (RRUs) and antennas located at the remote site; a baseband unit (BBU) pool composed of high-performance general-purpose processors (GPPs) located in a data center; and a high-bandwidth, low-latency optical transport network that connects the RRUs and the BBU pool. The Cloud RAN approach not only reduces the construction costs (assuming a fiber-optic backhaul already exists) and operating costs of each base station facility, it also allows virtualized processing resources to be dynamically reallocated from one base station to another as utilization shifts throughout the day and week.
A main technical challenge of Cloud RAN is the BBU pool implementation. According to a recent study, a centralized BBU pool for a medium-sized dense urban network (25 km²) should support about 100 base stations (300 sectors), while each BBU should meet the high-throughput and low-latency requirements of a modern wireless communication standard, with the goal of executing all the software layers on GPPs. In addition, the BBU pool should be highly power-efficient in order to show a real decrease in power consumption compared to conventional systems that use efficient base stations.
Successful attempts to develop full base stations on a pure GPP platform are reported in recent studies [1, 2, 4]. However, these studies show that in spite of using state-of-the-art platforms and innovative techniques, such platforms are not as optimized as dedicated SoC platforms when it comes to executing intensive physical layer tasks, such as Turbo decoding, FFT, and large-scale MIMO decoding.
This gap can be mitigated by offloading the intensive processing tasks of the physical layer from the GPP to an optimized co-processor, provided the co-processor is an open platform and provides multi-standard support, a programmable radio, and other characteristics required for Cloud RAN.
Physical layer background and requirements
The LTE physical layer is discussed here to illustrate the characteristics of wireless PHY components and the challenges of implementing them on general-purpose (GP) CPUs.
A typical processing chain of the LTE physical uplink shared channel (PUSCH) and physical downlink shared channel (PDSCH) is shown in Figure 2. In the uplink (UL), complex samples arrive from the RRU at a rate of up to 30.72 Msamples/s per Rx antenna and are fed to 2048-point FFT blocks (one FFT block per Rx antenna). They then proceed through the UL processing chain, yielding a throughput of up to 100 Mbps per sector, assuming that two MU-MIMO layers are used. In the downlink (DL), bits at a rate of up to 150 Mbps (per carrier) are encoded through the DL chain, modulated, pre-coded, and fed to the IFFT module of each Tx antenna, producing up to 30.72 Msamples/s per antenna.
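The sample-rate figures above can be turned into a rough per-antenna workload estimate. The following is a minimal sketch, not part of the original design: the sample rate and FFT size come from the text, while the 1 ms subframe with 14 OFDM symbols (normal cyclic prefix) is an assumption taken from standard LTE numerology, and all function names are illustrative.

```python
# Figures taken from the text.
SAMPLE_RATE_HZ = 30_720_000   # complex samples per second, per antenna
FFT_SIZE = 2048               # points per FFT/IFFT

# Assumptions (standard LTE numerology, not stated in the article).
SUBFRAMES_PER_SECOND = 1000   # 1 ms subframe
SYMBOLS_PER_SUBFRAME = 14     # normal cyclic prefix


def subcarrier_spacing_hz():
    """Frequency spacing between adjacent FFT bins."""
    return SAMPLE_RATE_HZ // FFT_SIZE


def samples_per_subframe():
    """Complex samples delivered per antenna in one subframe."""
    return SAMPLE_RATE_HZ // SUBFRAMES_PER_SECOND


def cyclic_prefix_samples_per_subframe():
    """Samples per subframe that are cyclic prefix rather than FFT input."""
    return samples_per_subframe() - SYMBOLS_PER_SUBFRAME * FFT_SIZE


def ffts_per_second(num_antennas):
    """Number of 2048-point transforms needed per second for a sector."""
    return num_antennas * SYMBOLS_PER_SUBFRAME * SUBFRAMES_PER_SECOND


if __name__ == "__main__":
    print("subcarrier spacing (Hz):", subcarrier_spacing_hz())
    print("samples per subframe:", samples_per_subframe())
    print("CP samples per subframe:", cyclic_prefix_samples_per_subframe())
    print("FFTs/s for 2 Rx antennas:", ffts_per_second(2))
```

Under these assumptions, each additional antenna adds a fixed number of 2048-point transforms per second, which is one reason the FFT and IFFT blocks scale poorly on a general-purpose CPU as the antenna count grows.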
The baseband processor must also process several control channels that are mapped together with the DL and UL data channels. Studies show that the IFFT, the turbo encoding and the MIMO pre-coding blocks are the most demanding tasks in the DL processing chain, especially when the system includes 8 Tx antennas [2, 3]. In the UL processing chain, the turbo decoder is the most demanding block, followed by the FFT, channel estimation (CE), and the MIMO equalizer.
In addition to its high throughput capabilities, LTE has a stringent delay budget, as depicted in Figure 3. The physical layer HARQ protocol places the highest demand on processing delay. In the downlink, the baseband processor must decode the HARQ feedback coming from the UE (UL ACK/NACK information), decide from the decoding result whether to schedule new data or retransmit the previous data, and finally encode and transmit the data on the optical interface. All of this must finish in less than 3 ms in order to maintain successive transmission.
These 3 ms include the two-way propagation and transport delay of the optical interface between the BBU and the RRU, which can take up to 400 µs. In the uplink, the baseband processor must decode the PUSCH and encode the HARQ feedback accordingly in less than 3 ms in the worst-case scenario. Subtracting the transport delay from the 3 ms deadline, each processing chain (DL or UL) must complete in less than 2.6 ms.
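The budget arithmetic above can be written down as a small check. This is a minimal sketch under stated assumptions: the 3 ms deadline and 400 µs transport delay come from the text, but the function names and the per-stage latencies in the usage example are hypothetical placeholders, not measurements of any real system.

```python
# Figures taken from the text, in microseconds.
HARQ_DEADLINE_US = 3000          # end-to-end HARQ processing deadline
TRANSPORT_ROUND_TRIP_US = 400    # worst-case two-way optical transport delay


def processing_budget_us(deadline_us=HARQ_DEADLINE_US,
                         transport_us=TRANSPORT_ROUND_TRIP_US):
    """Time left for baseband processing after the transport delay."""
    if transport_us >= deadline_us:
        raise ValueError("transport delay consumes the whole deadline")
    return deadline_us - transport_us


def meets_budget(stage_latencies_us):
    """True if the summed stage latencies stay strictly under the budget."""
    return sum(stage_latencies_us.values()) < processing_budget_us()


if __name__ == "__main__":
    # Hypothetical UL stage latencies, for illustration only.
    uplink_stages = {
        "fft": 200,
        "channel_estimation": 400,
        "mimo_equalizer": 500,
        "turbo_decoder": 1200,
    }
    print("budget (us):", processing_budget_us())
    print("fits:", meets_budget(uplink_stages))
```

A check of this kind makes it easy to see how much headroom remains when one stage, such as turbo decoding, is moved from the CPU to a co-processor.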
Modem processing unit - co-processor to a GP CPU
The MPU is a heterogeneous, multi-core signal processing platform designed for use as a co-processor to CPUs in a Cloud RAN BBU. Just as a graphics processing unit (GPU) accelerates graphics operations in a PC or workstation, an MPU accelerates complex physical layer tasks common to most communication systems. It supports a wide range of system partitioning options, from a simple accelerator to complete receive and transmit chains. The MPU is controlled through a standard API implemented in C, called the modem programming language (MPL). The MPL interface decouples the internal operation of the MPU from the L1 control layer, giving the designer a powerful, flexible tool for implementing various algorithms and supporting various air interface technologies and standards.
Figure 4 shows the MPU architecture. The MPU connects to GP CPUs through a high-speed interface (PCIe, for example) that carries both data and control. It comprises processing elements (PEs) that handle key communication tasks such as FFT/DFT, turbo and Viterbi decoding, complex arithmetic operations (required for large-scale MIMO decoding), interleaving/de-interleaving, address and code generation, and more. Each PE contains a light RISC processor called a standard sequencer (SSQ) that controls the PE’s execution. The PE’s SSQ is in charge of buffer allocation, parameter configuration, handshakes with other PEs, and more.
Data is transferred between PEs through buffers located in the memory bank. Control messages are transferred between PEs through a dedicated control interface that links the PEs to one another.
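The split between a data path through shared buffers and a separate control path can be illustrated with a toy software model. This is only a sketch under stated assumptions: it does not use or represent the MPL API, and every class, method, and kernel name here is hypothetical. Each PE's `step` method plays the role the text assigns to the sequencer: it consumes a control message, reads the input buffer, allocates an output buffer, and notifies the next PE.

```python
from collections import deque


class MemoryBank:
    """Shared buffers that carry data between PEs (toy model)."""

    def __init__(self):
        self._buffers = {}
        self._next_id = 0

    def allocate(self, data):
        buf_id = self._next_id
        self._next_id += 1
        self._buffers[buf_id] = data
        return buf_id

    def read(self, buf_id):
        return self._buffers[buf_id]

    def release(self, buf_id):
        del self._buffers[buf_id]

    def live_buffers(self):
        return len(self._buffers)


class ProcessingElement:
    """A PE with a kernel and an inbox of control messages (buffer ids)."""

    def __init__(self, name, kernel, bank):
        self.name = name
        self.kernel = kernel
        self.bank = bank
        self.inbox = deque()   # control path: "buffer N is ready"
        self.next_pe = None

    def notify(self, buf_id):
        self.inbox.append(buf_id)

    def step(self):
        """Handle one control message; return the output buffer id or None."""
        if not self.inbox:
            return None
        in_id = self.inbox.popleft()
        result = self.kernel(self.bank.read(in_id))
        self.bank.release(in_id)
        out_id = self.bank.allocate(result)
        if self.next_pe is not None:
            self.next_pe.notify(out_id)   # handshake with the downstream PE
        return out_id


def run_chain(pes, bank, samples):
    """Link PEs in order, inject one input buffer, and run each PE once."""
    for upstream, downstream in zip(pes, pes[1:]):
        upstream.next_pe = downstream
    pes[0].notify(bank.allocate(samples))
    out_id = None
    for pe in pes:
        out_id = pe.step()
    return bank.read(out_id)


if __name__ == "__main__":
    bank = MemoryBank()
    # Placeholder kernels standing in for real signal processing blocks.
    chain = [
        ProcessingElement("scale", lambda xs: [2 * x for x in xs], bank),
        ProcessingElement("accumulate", sum, bank),
    ]
    print(run_chain(chain, bank, [1, 2, 3]))
```

In this model only buffer identifiers travel on the control path, so the payload is never copied between PEs. That is the property the buffer-plus-control-interface arrangement described above appears designed to provide.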