# Designing optimal wireless base station MIMO antennae: Part 2 – A maximum likelihood receiver

In MIMO antenna design, the maximum likelihood (ML) receiver has significant advantages, but these come at the price of implementation complexity.

The maximum likelihood (ML) receiver estimator solves the following equation:

$$\hat{\mathbf{s}} = \arg\min_{\mathbf{s}} \lVert y - \mathbf{H}\mathbf{s} \rVert^2$$

For the sake of simplicity, let’s use a SISO (single transmit and single receive antenna) configuration as an example. In this case, *y* is the signal sampled at the receiver, **s** is the transmitted symbol, and **H** is the channel impulse response describing the channel between the transmit antenna and receive antenna.

The receiver looks for the transmitted symbol **s** that minimizes this absolute value:

$$\lvert y - \mathbf{H}\mathbf{s} \rvert$$

in which **s** belongs to a finite set of values defined by the symbol modulation. For 64QAM modulation, for example, **s** can take 64 different values.

Basically, this boils down to an exhaustive search. The receiver must scan all possible values of s to find the one that when multiplied by the estimated channel H will be closest to the received signal.
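The exhaustive search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`qam_constellation`, `ml_detect_siso`), the odd-integer 64QAM grid, and the channel, symbol and noise values are all hypothetical and not taken from any specific receiver.

```python
import itertools

def qam_constellation(levels_per_axis):
    """Square QAM points on odd-integer coordinates (8 levels per axis gives 64QAM)."""
    axis = [2 * k - (levels_per_axis - 1) for k in range(levels_per_axis)]
    return [complex(i, q) for i, q in itertools.product(axis, axis)]

def ml_detect_siso(y, h, constellation):
    """Test every candidate symbol and keep the one that minimizes |y - h*s|."""
    return min(constellation, key=lambda s: abs(y - h * s))

constellation_64qam = qam_constellation(8)      # 64 candidate symbols
h_est = 0.8 - 0.3j                              # hypothetical channel estimate
s_tx = complex(3, -5)                           # hypothetical transmitted symbol
y_rx = h_est * s_tx + (0.05 - 0.02j)            # received sample with a small noise term
s_hat = ml_detect_siso(y_rx, h_est, constellation_64qam)
print(s_hat)                                    # prints (3-5j)
```

For SISO the loop touches only 64 candidates, which is why this case is considered simple.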

For a SISO system this is quite simple, but when moving to a MIMO system the complexity grows exponentially. For example, in a 2X2 MIMO configuration with 64QAM modulation, s is a vector of two values. The first antenna can transmit 64 different symbols and the second antenna can also independently transmit one of 64 possible symbols. There are a total of 64² or 4,096 values of s that must be evaluated.

For 2X2 MIMO, a number of algorithms are used to reduce complexity of the ML receiver. Worth noting is the LORD algorithm, which is capable of reducing the search complexity from 64² options to 64 × 2, or 128, evaluations while reaching ML precision.
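The sketch below illustrates the general principle behind this kind of reduction; it is not the LORD algorithm itself. The assumption is that, for each hypothesis of the first stream, the best matching symbol of the second stream can be found by a single slicing (rounding) step instead of a search. As far as we understand, LORD applies a similar idea once per layer ordering to obtain soft outputs, which is where the factor of two would come from. 16QAM is used here only to keep the reference search small, and all names and values are hypothetical.

```python
import itertools
import random

LEVELS = [-3, -1, 1, 3]                      # 16QAM axis levels (assumed for brevity)
CONSTELLATION = [complex(i, q) for i, q in itertools.product(LEVELS, LEVELS)]

def slice_axis(x):
    """Nearest 16QAM axis level to the real number x."""
    k = round((x + 3) / 2)                   # index of the nearest level
    return 2 * min(max(k, 0), 3) - 3         # clip to the valid range

def slice_symbol(z):
    """Nearest constellation point to z, found without any search."""
    return complex(slice_axis(z.real), slice_axis(z.imag))

def metric(y, h1, h2, s1, s2):
    """Squared distance between y and h1*s1 + h2*s2 over both receive antennas."""
    return sum(abs(y[r] - h1[r] * s1 - h2[r] * s2) ** 2 for r in range(2))

def exhaustive_ml(y, h1, h2):
    """Reference search over all M^2 symbol pairs."""
    return min(itertools.product(CONSTELLATION, repeat=2),
               key=lambda pair: metric(y, h1, h2, *pair))

def reduced_ml(y, h1, h2):
    """Search M candidates for s1; the matching s2 comes from one slicing step."""
    h2_energy = sum(abs(c) ** 2 for c in h2)
    best = None
    for s1 in CONSTELLATION:
        residual = [y[r] - h1[r] * s1 for r in range(2)]   # cancel the s1 contribution
        z = sum(h2[r].conjugate() * residual[r] for r in range(2)) / h2_energy
        s2 = slice_symbol(z)                               # best s2 for this s1
        m = metric(y, h1, h2, s1, s2)
        if best is None or m < best[0]:
            best = (m, s1, s2)
    return best[1], best[2]

rng = random.Random(0)                       # fixed seed, illustrative data only
h1 = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)]
h2 = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2)]
s_tx = (complex(1, -3), complex(-1, 1))
y = [h1[r] * s_tx[0] + h2[r] * s_tx[1]
     + complex(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for r in range(2)]
pair_full = exhaustive_ml(y, h1, h2)         # 16 * 16 = 256 metric evaluations
pair_fast = reduced_ml(y, h1, h2)            # 16 metric evaluations plus 16 slicing steps
```

Both searches reach the same minimum distance, because for a fixed first symbol the distance depends on the second symbol only through its distance to the sliced point.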

For 4X4 MIMO 64QAM this number now grows to 64⁴, or 16,777,216 different values of s that must be evaluated. Solving this magnitude of complexity requires a new approach; this is where the suboptimal ML receivers come into play.
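The candidate counts quoted in this section follow directly from raising the modulation order to the number of transmitted streams. A quick check, using a hypothetical helper name:

```python
def search_space(modulation_order, num_streams):
    """Number of candidate vectors s an exhaustive ML search must evaluate."""
    return modulation_order ** num_streams

for streams in (1, 2, 4):
    print(f"{streams}X{streams} 64QAM: {search_space(64, streams):,} candidates")
# prints:
# 1X1 64QAM: 64 candidates
# 2X2 64QAM: 4,096 candidates
# 4X4 64QAM: 16,777,216 candidates
```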

**Playing with suboptimal ML receivers**

Suboptimal ML receivers try to scan the possible transmitted signals in a more efficient way, thereby reducing the overall complexity and reaching near-ML precision results. The reduced complexity contributes to a more practical hardware implementation in terms of area and power. This also enables the hardware to keep up with the high throughputs defined by advanced communications standards.

Solving a suboptimal ML equation may be defined as a tree search (**Figure 7**) in which each level of the tree corresponds to a transmitted symbol. The number of branches protruding from each node matches the modulation order of the transmitted symbol. A 4X4 MIMO configuration is represented by a four-level tree. If the modulation is BPSK, each node will have two branches.

Once the symbol tree is defined, tree traversal algorithms may be deployed, borrowing from other fields such as computer science.

**Figure 7: MIMO symbol tree**

In this context, suboptimal ML receivers can be partitioned into two main types: breadth first search and depth first search.

**Breadth first search.** An example of breadth first is the K-best algorithm. This decoder is a fixed-complexity solution that starts from the tree root and ascends until it reaches the last level of the tree.

At each level of the tree, all selected branches are evaluated and the K surviving nodes that best match the received signal (that is, the symbols closest to it) are preserved, hence the name ‘K-best’. The K remaining leaves are then used to generate the log-likelihood ratio (LLR) results.

Advantages of this decoder are:

- Unidirectional flow makes a pipelined hardware implementation straightforward.
- Processing power required to calculate each level is constant, and directly related to the number of surviving nodes (K) selected in the implementation.
- Throughput is constant, which in turn simplifies data flow scheduling in the system.

Disadvantages of this decoder include:

- Large area implementation is required in order to evaluate and sort all the selected nodes of the level.
- The larger the precision requirements, the higher the K value required.
- Throughput does not increase in optimal SNR conditions.
- Reaching the ML solution is not guaranteed, because the best solution might reside in the nodes that are not selected.

**Figure 8** shows a MIMO 4X4 (4-level) tree with QPSK modulation. K in this case is four. Sixteen nodes will be sorted at each level of the tree. The best four will be the surviving nodes for the next level.
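A minimal sketch of the K-best traversal for this 4-level QPSK case is shown below. It assumes the channel has already been reduced to an upper-triangular matrix `R` (for example by a QR decomposition, so that `z` plays the role of the rotated received vector), which is a common preprocessing step but is not described in this article. The function name, the matrix entries, and the noise value are hypothetical, and the sketch stops at the surviving leaves without generating LLRs.

```python
import heapq

QPSK = [complex(1, 1), complex(1, -1), complex(-1, 1), complex(-1, -1)]

def k_best_detect(R, z, constellation, k):
    """Breadth-first K-best search on z = R s, with R upper triangular.

    Returns the surviving (metric, symbols) leaves, best first.
    """
    n = len(R)
    survivors = [(0.0, ())]                  # (partial metric, symbols decided so far)
    for row in range(n - 1, -1, -1):         # one tree level per transmitted symbol
        children = []
        for ped, tail in survivors:
            # Contribution of the symbols already fixed on this path
            interference = sum(R[row][row + 1 + j] * tail[j] for j in range(len(tail)))
            for s in constellation:
                increment = abs(z[row] - interference - R[row][row] * s) ** 2
                children.append((ped + increment, (s,) + tail))
        # Sort the expanded nodes and keep only the K with the smallest metric
        survivors = heapq.nsmallest(k, children, key=lambda c: c[0])
    return survivors

# Hypothetical 4x4 upper-triangular channel factor and transmitted vector
R = [[1.2, 0.3 + 0.1j, -0.2j, 0.4],
     [0.0, 0.9, 0.25, -0.1 + 0.3j],
     [0.0, 0.0, 1.1, 0.2 - 0.2j],
     [0.0, 0.0, 0.0, 0.8]]
s_tx = (complex(1, -1), complex(-1, -1), complex(1, 1), complex(-1, 1))
noise = 0.05 - 0.03j                         # same small offset on every row, for illustration
z = [sum(R[r][c] * s_tx[c] for c in range(4)) + noise for r in range(4)]

leaves = k_best_detect(R, z, QPSK, k=4)
best_metric, best_symbols = leaves[0]
```

With K = 4 and QPSK, every level after the first expands 4 survivors into 16 children, matching the sixteen nodes sorted per level in the example above. The work per level is fixed regardless of the noise, which is the fixed-complexity property.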

**Figure 8: K-best tree traversal**

**Depth first search.** An example of depth first is the Soft-Output Sphere Decoder algorithm. This decoder is an adaptive-complexity solution that starts from the tree root and first ascends directly to a tree leaf, hence the name ‘Depth First.’

This first solution of the tree determines an initial search radius or sphere. From then on, the decoder backtracks and ascends throughout the levels of the tree. Each node of the tree that exceeds the search radius is pruned together with all the nodes underneath it. Each time a better solution is found, the radius is reduced accordingly. In this way, the symbol tree is scanned and pruned until the number of valid options is reduced. The remaining symbols represent the ML solution.

Advantages of this decoder are:

- Obtaining the ML solution is guaranteed, contributing to the precision of the result.
- Under high SNR conditions the decoder performs faster, increasing throughput and reducing power consumption.
- Smaller area implementation compared to equivalent breadth first solution.

**Figure 9** shows a cycle count comparison between a Soft-Output Sphere decoder with adaptive complexity and a fixed-complexity K-best decoder. As the SNR increases, the sphere decoder reduces its cycle count, while the cycle count of the fixed-complexity decoder stays constant regardless of the channel conditions.

**Figure 9: Fixed vs. adaptive complexity**

Disadvantages of this decoder include:

- Non-deterministic behavior of the decoder complicates system scheduling.
- Next branch selection is known only after the current branch is complete. This makes the hardware pipeline implementation challenging.

**Figure 10** shows an example MIMO 4X4 (4-level) tree with QPSK modulation in which the following occurs:

- Depth first chooses the symbol path to the first leaf in the following manner: a. -3 (level 1), b. -3 (level 2), c. 1 (level 3), and d. 3 (level 4)
- Initial Radius is updated.
- Backtrack is performed to a symbol at level 2.
- Branches that exceed the search radius (shown in red) are pruned during the search, thereby minimizing the search tree.
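The steps above (first descent to a leaf, radius update, backtracking, pruning) can be sketched as a recursive depth-first search. This is a simplified hard-output illustration, not the soft-output decoder discussed in the article. It assumes an upper-triangular channel factor `R` (for example from a QR decomposition) and uses hypothetical names and values throughout. A brute-force reference is included only to check the result.

```python
import itertools

QPSK = [complex(1, 1), complex(1, -1), complex(-1, 1), complex(-1, -1)]

def sphere_decode(R, z, constellation):
    """Depth-first search with radius pruning on z = R s, with R upper triangular.

    Returns (best_metric, best_symbols, visited_nodes).
    """
    n = len(R)
    best = [float("inf"), None]              # squared search radius and best leaf so far
    visited = [0]

    def descend(row, ped, tail):
        interference = sum(R[row][row + 1 + j] * tail[j] for j in range(len(tail)))
        children = []
        for s in constellation:
            inc = abs(z[row] - interference - R[row][row] * s) ** 2
            children.append((ped + inc, s))
        children.sort(key=lambda c: c[0])    # most promising branch first
        for metric, s in children:
            if metric >= best[0]:
                break                        # this branch and its worse siblings are pruned
            visited[0] += 1
            if row == 0:
                best[0], best[1] = metric, (s,) + tail   # better leaf: shrink the radius
            else:
                descend(row - 1, metric, (s,) + tail)

    descend(n - 1, 0.0, ())
    return best[0], best[1], visited[0]

def brute_force(R, z, constellation):
    """Reference: evaluate every leaf of the tree."""
    n = len(R)
    def metric(s):
        return sum(abs(z[r] - sum(R[r][c] * s[c] for c in range(r, n))) ** 2
                   for r in range(n))
    best = min(itertools.product(constellation, repeat=n), key=metric)
    return metric(best), best

# Hypothetical 4x4 upper-triangular channel factor and transmitted vector
R = [[1.2, 0.3 + 0.1j, -0.2j, 0.4],
     [0.0, 0.9, 0.25, -0.1 + 0.3j],
     [0.0, 0.0, 1.1, 0.2 - 0.2j],
     [0.0, 0.0, 0.0, 0.8]]
s_tx = (complex(1, -1), complex(-1, -1), complex(1, 1), complex(-1, 1))
z = [sum(R[r][c] * s_tx[c] for c in range(4)) + (0.05 - 0.03j) for r in range(4)]

best_metric, best_symbols, visited_nodes = sphere_decode(R, z, QPSK)
```

Because the partial metric can only grow along a path, pruning a node whose metric already exceeds the radius never discards the best leaf. The number of visited nodes depends on the noise, which reflects the adaptive, non-deterministic complexity described in the text.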

**Figure 10: Sphere decoder tree traversal**

**A maximum likelihood (MLD) solution**

One approach to the challenges of the MIMO receiver can be found with the Maximum Likelihood MIMO Detector (MLD) from CEVA. The MLD is a tightly coupled extension (TCE) accelerator hardware unit. The MLD is capable of processing LTE-Advanced Cat. 7 data streams and produces soft-output max-log ML solutions.
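For readers unfamiliar with the term, a max-log soft output is commonly written as the difference between two minimum distances, one over the candidates whose bit $b_k$ is 0 and one over those whose bit is 1. The form below is a generic textbook expression, not taken from CEVA documentation. The sign convention (positive values favor $b_k = 1$), the equiprobable-bit assumption, and the noise variance $\sigma^2$ are assumptions made here for illustration.

$$
L(b_k) \approx \frac{1}{\sigma^2}\left( \min_{\mathbf{s}\,:\,b_k = 0} \lVert \mathbf{y} - \mathbf{H}\mathbf{s} \rVert^2 \;-\; \min_{\mathbf{s}\,:\,b_k = 1} \lVert \mathbf{y} - \mathbf{H}\mathbf{s} \rVert^2 \right)
$$

Each minimum is itself an ML-style search, which is why the quality of the soft outputs depends on how thoroughly the detector explores the symbol tree.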

The MLD accelerator reaches a suboptimal maximum likelihood (ML) solution for 4X4 or 3X3 MIMO at 12.6 mega-tones/sec using a soft-output sphere decoder approach, and a 2X2 LORD-based ML solution at 28.8 mega-tones/sec using carrier aggregation. The MLD is designed for mobile applications, emphasizing a low-power design concept. The MLD feature set includes support for:

- Variable transmission schemes from 2X2 up to 4X4 MIMO, with configurable modulation per layer of up to 64QAM.
- Tree search optimization: user-defined layer ordering, initial radius and search radius for each tree level.
- CEVA MLD addresses the non-deterministic nature of the soft-output sphere decoder by presenting throughput control capabilities, including lower and upper cycle count boundaries for tone processing. In addition, the system throughput is maintained using user-defined timestamp-based termination.
- Soft-bits can be scaled to account for SNR and modulation factors.
- Support is provided for LLR permutations at intra-symbol and inter-layer resolutions.
- Internal layer de-mapping: two code layers are supported, enabling the MLD to split the written data to two different destinations.
- Scalable hardware solution enables performance/power/area trade-offs, including choosing the number of MLD engines, buffer sizes and interface clock ratio.

In addition, the accelerator provides extensive debug and profiling capabilities.

**Figure 11** shows a block diagram of the MLD accelerator, which consists of an AXI interface, input buffer, dispatcher, MLE (maximum likelihood engine), LLR generator, reorder buffer and output buffer.

**Figure 11: MLD accelerator block diagram**

The input buffer stores a multitude of tone data that is transferred one tone at a time via the dispatcher to the MLEs. Each MLE outputs data regarding the detected bits; this in turn is transformed into LLR format by the LLR generator. The reorder buffer accumulates the LLR data in order of transmission and sends the organized output toward the output buffer. The output buffer writes the LLRs to the next block in the receive chain via the AXI interface.

**How MLD performs.** **Figure 12** illustrates CEVA MLD TCE performance compared to an MMSE receiver using 4X4 spatial multiplexing MIMO. Throughput in PER (packet error rate) is evaluated at different SNR conditions.

**Figure 12: 4X4 MIMO spatial multiplexing performance**

The LTE channel is set to EPA 5Hz with low-correlation propagation conditions. Under these conditions the CEVA MLD obtains near-ML results, while MMSE suffers from severe performance degradation even though the correlation is low. For higher correlation conditions, the MMSE performance worsens even further.

For comparison, a K-best solution with similar performance will require more than twice the area of the CEVA MLD TCE. To do this the CEVA implementation incorporates the following capabilities:

- Precision of less than 1.5dB loss for MIMO 4×4 compared to pure ML decoding.
- Decoding MIMO 2×2 with no precision loss (LORD equivalent performance and complexity)
- Ultra-low-power design
- Competitive die size

**Figure 13** shows the performance of 4X4 MIMO with SM at peak code rate with 64-QAM modulation. Even in these conditions, the CEVA MLD TCE provides less than 1.5dB loss compared to ideal ML results.

**Figure 13: MLD 4X4 MIMO performance**

**Figure 14** illustrates the performance of 2X2 MIMO with SM at peak code rate with 64-QAM modulation. The CEVA MLD TCE provides perfect ML performance.

**Figure 14: MLD 2X2 MIMO performance**

**Conclusions**

MIMO is a key component of next-generation wireless technology; in order to fully exploit the potential data rate, it is essential to deploy spatial multiplexing techniques. Choosing an optimized MLD receiver can be the main differentiator in a cellular product.

While this article has shown that the MLD receiver achieves superior results to the linear receiver, there are many factors that need to be considered when choosing an MLD implementation, including:

- Precision targets and throughput requirements: demands a user-configurable solution in order to obtain high-quality LLRs quickly.
- Latency definitions: calls for definable system scheduling in order to complete the task in the allotted time – for example, by using timestamps.
- Channel type fast/slow time variant: a fast time variant channel will require the ability to frequently update channel information
- Hardware considerations (area, speed in MHz, and power dissipation): require a scalable hardware solution to meet small-area and low-power requirements.

Read **Part 1: Sorting out the confusion**

**Noam Dvoretzki** is a senior DSP processor architect at CEVA. His 12 years of experience include architectures of the CEVA-XC DSP and hardware accelerators for 4G wireless communication standards; in addition, he has designed DSPs in the fields of computer vision and HD Audio. He has extensive experience in RTL and backend design and holds a BSc. in Computer and Electrical Engineering from Ben-Gurion University.

**Zeev Kaplan** is a senior Communication Algorithms Engineer in the Architecture Department at CEVA. He has over 12 years of experience in communications engineering with expertise in algorithms and systems design. He has extensive experience in wireless (LTE, Wi-Fi) and wired home-networking (HomePNA, HomePlug, G.hn) standards. Zeev Kaplan has a BSc. and MSc. in Electrical Engineering from the Technion – Israel Institute of Technology.

**References:**

P.W. Wolniansky, G.J. Foschini, G.D. Golden and R.A. Valenzuela, “V-blast: an architecture for realizing very high data rates over the rich-scattering wireless channel”, Signals, Systems, and Electronics, URSI International Symposium on, vol. 1998, pp. 295-300, 1998.

B. M. Hochwald and S. ten Brink, “Achieving near-capacity on a multiple-antenna channel”.

Y. Lomnitz and D. Andelman, “Efficient maximum likelihood detector for MIMO systems with small number of streams”, Electronics Letters, vol. 43, no. 22, pp. 1212–1214, 2007.

M. Siti and M.P. Fitz, “A Novel Soft-Output Layered Orthogonal Lattice Detector for Multiple Antenna Communications”, IEEE International Conference Communications, 2006. ICC ’06.