Building a Linux-based femtocell base-station using software performance engineering: Part 1
Femtocells and small cells are considered key to next-generation wireless operator networks: besides providing coverage, they offer capacity and cost advantages over large-cell deployments. Because they use existing IP backhaul infrastructure (e.g. DSL/FTTH) and Self-Organizing Networks (SON), deployment is simple and low-cost.
However, in order to achieve the system price point associated with wide consumer deployment, system cost needs to be orders of magnitude below that of a traditional macro cell solution. Also, power supply restrictions leave little room for ‘over-design’ on either the hardware or software side.
This two-part article describes one way to achieve this goal. In this first part we describe an efficient low-cost femtocell design in which a Linux-based fast-path software architecture is implemented on base-station-on-a-chip hardware containing all the necessary digital processing, from Ethernet interfacing all the way to A/D converter, including the control plane, packet processing, and Layer-1 signal processing (Figure 1).
In Part 2, we will describe how to use the principles of software performance engineering (http://embedded.com/design/real-time-and-performance/4395691/Software-performance-engineering-for-embedded-systems--Part-1---What-is-SPE-) to integrate hardware and software elements and to evaluate whether or not the resulting implementation meets the design goals.
The particular base-station-on-a-chip hardware chosen is based on Freescale’s BSC913x family (Figure 2), which targets a variety of use cases, for example 100 Mbps downlink (DL) and 50 Mbps uplink (UL) operation with 16 active UEs. On the Layer-1 side, such performance is achieved by combining a StarCore SC3850 high-performance DSP core with MAPLE hardware acceleration for, among others, the physical downlink shared channel (PDSCH) and the physical uplink processing element, which decodes the physical uplink shared channel (PUSCH) into information bits.
The remainder of the software stack (L2, L3, OAM, transport components) runs on the Power Architecture/e500 core with associated hardware acceleration for IPsec, 3GPP ciphering, timing, etc.
Achieving the challenging system throughputs on a single-chip solution leaves little room for inefficiencies in the software architecture and implementation. As such, close cooperation between software and hardware development teams is crucial during the architecture and implementation phases. The work presented here focuses on the challenges imposed on the software architecture of the general-purpose (GPP) Power Architecture e500 processor, and on the solution reached through close cooperation between third-party software developers and Freescale.
Using Linux as a fast-path OS architecture
In order to achieve portability, debuggability, and code re-use targets, Linux is an obvious choice for the OS for the small-cell platform. However, as is well known in the industry, standard Linux is not capable of meeting the 1 ms hard real-time deadlines required for LTE (long term evolution) applications. Two industry approaches exist to enhance Linux to achieve real-time deadlines:
- Thin-kernel real-time Linux approaches (Figure 3), such as Real-Time Linux and the Xenomai development framework. Such approaches create an isolated real-time environment in parallel to Linux by trapping interrupts. This means that applications need to be ported to the thin kernel that is provided by the real-time portion of Linux. Besides this drawback, debuggability can be an issue (the standard user-space Linux toolset is not available).
- Approaches that enhance the Linux kernel to make it fully pre-emptible and real-time capable (Figure 4). The PREEMPT_RT patches by Ingo Molnar convert Linux into a fully pre-emptible kernel with predictable response time, without loss of debuggability and with limited performance impact.
We chose the PREEMPT_RT approach for the base station application. Worst-case latency was benchmarked (Figure 5) to be reliably below 50 µs using the cyclictest benchmark.
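The measurement principle behind a cyclictest-style benchmark can be sketched as follows: a thread repeatedly sleeps until an absolute deadline and records how late it actually wakes up. This is a minimal illustration, not the cyclictest source; the function names and parameters are hypothetical, and a meaningful run would also need a real-time priority and a loaded system.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Convert a timespec to nanoseconds. */
int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/* Advance a timespec by ns nanoseconds, keeping tv_nsec normalized. */
void ts_add_ns(struct timespec *ts, int64_t ns)
{
    int64_t total = ts_to_ns(ts) + ns;

    ts->tv_sec = (time_t)(total / NSEC_PER_SEC);
    ts->tv_nsec = (long)(total % NSEC_PER_SEC);
}

/*
 * Sleep until an absolute deadline `loops` times, `interval_ns` apart.
 * Returns the largest observed wake-up latency in nanoseconds, or -1 on error.
 * (Hypothetical helper; name and signature are assumptions for illustration.)
 */
int64_t max_wakeup_latency_ns(int loops, int64_t interval_ns)
{
    struct timespec next, now;
    int64_t worst = 0;

    assert(loops > 0 && interval_ns > 0);
    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    for (int i = 0; i < loops; i++) {
        int rc;

        ts_add_ns(&next, interval_ns);
        /* An absolute deadline avoids accumulating drift across iterations. */
        do {
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR);
        if (rc != 0 || clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;

        /* Latency = actual wake-up time minus the requested wake-up time. */
        int64_t latency = ts_to_ns(&now) - ts_to_ns(&next);
        if (latency > worst)
            worst = latency;
    }
    return worst;
}
```

Calling `max_wakeup_latency_ns(1000, 1000000)` would sample one thousand 1 ms periods; the value it returns depends entirely on the kernel, the scheduling policy of the caller, and the system load.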
Note that even though the PREEMPT_RT patches reduce the latency to well within the 1 ms boundary imposed by the LTE standard, the performance is still an order of magnitude worse than that of a true real-time OS.
Given the worst-case latency that must be assumed for each task switch, the user-space data-path portion of the L2 application is separated into the minimum number of threads that still allows partitioning between hard and soft real-time processing. Scheduling of tasks within a thread is done by the application itself. As shown in Figure 6, this results in two main threads:
- Hard real-time: scheduler, MAC, and RLC components, with deadline-driven execution times.
  - Optionally, the uplink MAC/RLC components can be executed in a separate thread that doesn’t have a strict deadline.
- Soft real-time: PDCP, GTP, and UDP components, with throughput/performance requirements but no execution deadline.
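The two-thread partitioning described above can be sketched with POSIX threads. This is a minimal sketch under stated assumptions, not the actual L2 implementation: the component functions are empty stand-ins, the SCHED_FIFO priorities (80 and 40) are illustrative values, and the loop count stands in for TTIs or packet batches. In a real system the hard real-time thread would block on a 1 ms tick from Layer 1 at the top of each iteration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* One component step; it runs to completion inside its owning thread. */
typedef void (*step_fn)(long *counter);

/* Hypothetical stand-ins for the real L2 components. */
static void dl_scheduler_step(long *c) { (*c)++; }
static void mac_step(long *c)          { (*c)++; }
static void rlc_step(long *c)          { (*c)++; }
static void pdcp_step(long *c)         { (*c)++; }
static void gtp_step(long *c)          { (*c)++; }
static void udp_step(long *c)          { (*c)++; }

struct l2_thread {
    int priority;          /* SCHED_FIFO priority (illustrative value) */
    const step_fn *steps;  /* components, run in this fixed order */
    size_t n_steps;
    int iterations;        /* stands in for TTIs or packet batches */
    long steps_run;        /* written only by the owning thread */
};

/* Thread body: the application, not the kernel, sequences the components. */
static void *l2_thread_main(void *arg)
{
    struct l2_thread *t = arg;

    for (int i = 0; i < t->iterations; i++)
        for (size_t s = 0; s < t->n_steps; s++)
            t->steps[s](&t->steps_run);
    return NULL;
}

/* Start a thread under SCHED_FIFO; fall back to the default policy if refused. */
int spawn_l2_thread(pthread_t *tid, struct l2_thread *t)
{
    pthread_attr_t attr;
    struct sched_param sp = { .sched_priority = t->priority };
    int rc;

    assert(tid != NULL && t->steps != NULL);
    pthread_attr_init(&attr);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    pthread_attr_setschedparam(&attr, &sp);
    rc = pthread_create(tid, &attr, l2_thread_main, t);
    pthread_attr_destroy(&attr);
    if (rc != 0) /* typically EPERM when real-time privileges are missing */
        rc = pthread_create(tid, NULL, l2_thread_main, t);
    return rc;
}

/* Run both threads for `iterations` rounds; report the steps each one ran. */
int run_l2_threads(int iterations, long *hard_steps, long *soft_steps)
{
    static const step_fn hard_tbl[] = { dl_scheduler_step, mac_step, rlc_step };
    static const step_fn soft_tbl[] = { pdcp_step, gtp_step, udp_step };
    struct l2_thread hard = { .priority = 80, .steps = hard_tbl,
                              .n_steps = 3, .iterations = iterations };
    struct l2_thread soft = { .priority = 40, .steps = soft_tbl,
                              .n_steps = 3, .iterations = iterations };
    pthread_t hard_tid, soft_tid;

    if (spawn_l2_thread(&hard_tid, &hard) != 0)
        return -1;
    if (spawn_l2_thread(&soft_tid, &soft) != 0) {
        pthread_join(hard_tid, NULL);
        return -1;
    }
    pthread_join(hard_tid, NULL);
    pthread_join(soft_tid, NULL);
    *hard_steps = hard.steps_run;
    *soft_steps = soft.steps_run;
    return 0;
}
```

Because each thread calls its components in a fixed order from a plain loop, moving from the scheduler to MAC to RLC costs a function call rather than a kernel task switch; only the boundary between the hard and soft real-time threads is left to the Linux scheduler.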
Note that system calls executed from the fast-path code effectively translate into ‘Linux tasks’ that are scheduled by the kernel. Performance requirements dictate that such task-switch overhead cannot be accepted. As a result, all drivers called from the user-space application (e.g. L2/L1 interface, PDCP ciphering) are implemented as user-space-only drivers using the UIO framework (http://lwn.net/Articles/232575/). Also, as part of the performance optimization efforts, care is taken to remove all but the necessary system calls from the remaining application components.
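The general shape of a UIO-based user-space driver can be sketched as follows. This is a generic illustration of the Linux UIO interface, not the BSC913x drivers themselves: the structure and function names are hypothetical, and the device path, mapping length, and register layout are assumptions that depend on the actual device. The key point is that after the one-time open() and mmap(), register accesses are ordinary loads and stores that involve no system call.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Handle for one UIO device (hypothetical type for this sketch). */
struct uio_dev {
    int fd;
    volatile uint32_t *regs; /* first memory region of the device */
    size_t map_len;
};

/* Open a UIO node (e.g. "/dev/uio0") and map its region 0. Returns 0 or -1. */
int uio_open(struct uio_dev *dev, const char *path, size_t map_len)
{
    void *p;

    assert(dev != NULL && path != NULL);
    dev->regs = NULL;
    dev->map_len = 0;
    dev->fd = open(path, O_RDWR);
    if (dev->fd < 0)
        return -1;

    /* UIO selects region N with offset N * page size; region 0 is offset 0. */
    p = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, dev->fd, 0);
    if (p == MAP_FAILED) {
        close(dev->fd);
        dev->fd = -1;
        return -1;
    }
    dev->regs = p;
    dev->map_len = map_len;
    return 0;
}

/* Fast-path register access: plain memory operations, no system call. */
uint32_t reg_read32(const volatile uint32_t *base, size_t word)
{
    return base[word];
}

void reg_write32(volatile uint32_t *base, size_t word, uint32_t value)
{
    base[word] = value;
}

/*
 * Block until the next device interrupt. A UIO read returns the total
 * interrupt count as a 32-bit value. This is a system call, so a fast path
 * that polls a status register through reg_read32() can avoid it.
 */
int64_t uio_wait_irq(const struct uio_dev *dev)
{
    uint32_t count;

    if (read(dev->fd, &count, sizeof count) != (ssize_t)sizeof count)
        return -1;
    return (int64_t)count;
}

/* Release the mapping and the file descriptor. */
void uio_close(struct uio_dev *dev)
{
    if (dev->regs != NULL)
        munmap((void *)dev->regs, dev->map_len);
    if (dev->fd >= 0)
        close(dev->fd);
    dev->regs = NULL;
    dev->fd = -1;
}
```

In this arrangement the system calls are confined to setup and teardown, which is consistent with the goal of keeping them out of the per-TTI data path.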
Note that the overall goal of minimizing system calls is the main driver for performance-related design decisions throughout the software architecture process.