Advanced voice enhancement devices (VEDs) are more critical than ever. The key challenge in VED development is making the most of extremely limited resources in terms of DSP processing load and memory usage. Here is an in-depth look at the technical challenges, including network echo cancellation, acoustic echo control, noise reduction, and automatic level control.
By Perry Peiyuan He and Roman Anthony Dyba, Freescale Semiconductor Inc.
Market trends in telecommunications have created a challenging environment for providing Quality of Service (QoS) in voice and data transmission. As Public Switched Telephone Networks (PSTNs) converge with packet, cellular and enterprise networks, adequately handling echo, noise, transmission delay, algorithm delay, processing delay and packet loss continues to demand increased technical attention.
Voice Enhancement Devices (VEDs) are key elements in a competitive carrier-class voice solution. These devices address a variety of voice quality requirements, including echo cancellation, signal-to-noise ratio (SNR) enhancement and signal level control. The ability of VEDs to meet these requirements is limited not only by imperfections in the digital signal processing algorithms, but also by the computational resources of the digital signal processor, such as processing load (measured in MCPS, millions of cycles per second) and memory. This article addresses these market and technical challenges for typical VEDs, including network echo cancellation (NEC), acoustic echo control (AEC), noise reduction (NR) and automatic level control (ALC).
Challenges: Old and new
Trends in today's telecommunications market create an interesting climate for service providers, who must deliver high-quality voice and data services across different telecommunication networks. As PSTNs converge with packet, cellular and enterprise networks, voice quality issues become more complex. VEDs are the functional components that use advanced DSP algorithms to minimize the performance degradation caused by unwanted factors in the network, such as echo, signal/packet delay and noise; in doing so, they improve voice quality. Typical VEDs include the functional blocks of NEC (also called line echo cancellation, or LEC), AEC, NR and ALC [1].
Let us consider an NEC as an example of a VED operating under challenging conditions. Figure 1 shows a high-level network diagram illustrating the application of echo cancellation. The far-end speaker will hear the echo of his or her own voice if the echo generated at the near-end hybrid has sufficient energy and the round-trip delay of the PSTN is longer than 20 to 30 ms. An echo canceller is installed in the Central Office, close to the source of the echo, i.e., the hybrid circuit. Two echo cancellers are needed for one bidirectional voice channel.
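The article does not say which adaptation algorithm its NEC uses. As a minimal sketch of what an echo canceller does, the Python below swaps in the textbook normalized LMS (NLMS) adaptive FIR filter: it learns a model of the hybrid echo path from the far-end reference and subtracts the estimated echo from the signal heading back toward the far end. The function names, the port names `r_in`/`s_in` (borrowed from the usual Rin/Sin convention for echo canceller ports), the 16-tap length, the step size and the synthetic echo path are all assumptions for illustration; the sketch also omits the double-talk detection and nonlinear processing a real NEC needs.

```python
import math
import random


def nlms_echo_canceller(r_in, s_in, taps=16, mu=0.5, eps=1e-6):
    """Textbook NLMS echo canceller (illustrative sketch, not a product design).

    r_in: far-end reference samples (what is sent toward the hybrid).
    s_in: samples returning from the hybrid (echo plus any near-end speech).
    Returns the residual signal that would be sent back to the far end.
    """
    w = [0.0] * taps       # adaptive estimate of the echo path impulse response
    x_hist = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(r_in, s_in):
        x_hist = [x] + x_hist[:-1]
        y = sum(wi * xi for wi, xi in zip(w, x_hist))  # estimated echo
        e = d - y                                       # echo-cancelled output
        norm = sum(xi * xi for xi in x_hist) + eps      # input power normalization
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x_hist)]
        residual.append(e)
    return residual


def erle_db(echo, residual):
    """Echo Return Loss Enhancement: echo energy over residual energy, in dB."""
    e_echo = sum(s * s for s in echo)
    e_res = max(sum(r * r for r in residual), 1e-300)  # avoid log of zero
    return 10 * math.log10(e_echo / e_res)


# Hypothetical hybrid echo path: 2-sample bulk delay, reflecting about one
# quarter of the far-end energy (0.45^2 + 0.2^2 + 0.1^2 = 0.2525).
h = [0.0, 0.0, 0.45, 0.2, -0.1]
rng = random.Random(1)
far = [rng.gauss(0.0, 1.0) for _ in range(4000)]  # white noise stands in for speech
echo = [sum(h[k] * far[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(far))]
res = nlms_echo_canceller(far, echo)  # no near-end talker in this test
print(f"ERLE over last 1000 samples: {erle_db(echo[-1000:], res[-1000:]):.1f} dB")
```

Setting `mu=0.0` disables adaptation, so the residual equals the echo and the measured ERLE is 0 dB, which is a quick sanity check on the measurement itself.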
The first condition for echo being heard is the echo energy. The source of echo is the impedance mismatch at the hybrid circuit, a 4-wire to 2-wire converter that is terminated by the 2-wire subscriber line and the telephone set. This impedance mismatch is typically expressed as the attenuation of the reflected energy, known as the Echo Return Loss (ERL). The minimum ERL requirement for a hybrid is 6 dB (cf. [2]), which means that as much as one quarter of the energy of a voice signal, or one half of its magnitude, can be reflected back as echo. Although typical ERL values are somewhat greater (i.e., the reflected energy is smaller), very often in the range of 12 to 20 dB, the reflected energy is still sufficient to create an echo that interferes with telephone communication.
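The dB arithmetic behind "one quarter of the energy, or one half of the magnitude" can be checked directly: energy ratios use 10*log10 and amplitude ratios use 20*log10. The helper name `erl_to_ratios` is hypothetical, chosen for illustration.

```python
def erl_to_ratios(erl_db: float) -> tuple[float, float]:
    """Convert an Echo Return Loss in dB into the fraction of the incident
    signal reflected back, returned as (energy_fraction, amplitude_fraction)."""
    energy_fraction = 10 ** (-erl_db / 10)     # energy/power: 10*log10 scale
    amplitude_fraction = 10 ** (-erl_db / 20)  # magnitude: 20*log10 scale
    return energy_fraction, amplitude_fraction


# Minimum hybrid requirement (6 dB) and the typical 12 to 20 dB range.
for erl in (6, 12, 20):
    e, a = erl_to_ratios(erl)
    print(f"ERL {erl:2d} dB: energy {e:.3f}, amplitude {a:.3f}")
```

At 6 dB this gives about 0.25 of the energy and 0.50 of the amplitude, matching the text; at 20 dB only 1% of the energy (10% of the amplitude) returns, which is still audible once the delay is long enough.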
The second condition for echo being heard is the round-trip delay of transmitted signals.
For moderate ERL values, no echo is noticeable if the round-trip delay is less than 20 to 30 ms, because under such conditions an average telephone user can hardly distinguish the reflected signal from the sidetone (for details refer to [17]). In the past, when the PSTN was the predominant telephone network, echo was typically heard only on long-distance calls in which the parties were separated by at least 2,000 miles (i.e., when the connection spanned a continent or an ocean) or on connections routed via satellite. In other words, echo could be heard because of the long transmission delay in the network.
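A back-of-the-envelope calculation shows why roughly 2,000 miles marked the point where echo became audible. The sketch below assumes a propagation speed of about 200,000 km/s (roughly two thirds of the speed of light, typical for fiber and cable) and ignores switching delay and indirect routing, so it gives a lower bound; the 25 ms cut-off is a hypothetical midpoint of the 20 to 30 ms range above.

```python
MILES_TO_KM = 1.609344
# Assumption: about 2/3 of the speed of light in fiber or copper cable.
PROPAGATION_KM_PER_S = 200_000.0
# Hypothetical audibility threshold: midpoint of the 20 to 30 ms range.
ECHO_AUDIBLE_MS = 25.0


def round_trip_delay_ms(distance_miles: float) -> float:
    """Round-trip propagation delay in ms for a one-way distance in miles
    (pure propagation only; switching and routing detours are ignored)."""
    one_way_km = distance_miles * MILES_TO_KM
    return 2 * one_way_km / PROPAGATION_KM_PER_S * 1000.0


for miles in (100, 2000):
    rtd = round_trip_delay_ms(miles)
    print(f"{miles:5d} miles -> {rtd:5.1f} ms round trip, "
          f"echo likely audible: {rtd > ECHO_AUDIBLE_MS}")
```

For 2,000 miles this gives about 32 ms, just past the range at which echo becomes noticeable, while a 100-mile call stays near 1.6 ms.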
Figure 1. A simplified system diagram for echo cancellation in a PSTN network.
However, in a packet network (see Figure 2), the total network delay is no longer just the traditional long-distance PSTN transmission delay but a combination of various delays, including computation delay caused by different vocoders, processing delay due to different frame sizes, and jitter buffer delay. Today, a typical round-trip delay in a packet network might be as long as 100 to 200 ms. As a result, echo can be heard even when you call your next-door neighbor.
Figure 2. A simplified system diagram for echo cancellation in a packet network. The round-trip network delay is dramatically increased compared with the traditional PSTN network because of additional delays, such as the computation/processing delays required by the voice codecs and jitter buffers. Legend: NEC: Network Echo Canceller; PAC: Packetizer; DP: Depacketizer; JB: Jitter Buffer; voice codecs are not shown.
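To see how the packet-network delay components add up, the sketch below totals a hypothetical one-way delay budget and doubles it for the round trip. The component names and every number are illustrative assumptions, not values from the article or from a measured system, and the simple additive model ignores effects such as variable queuing.

```python
from dataclasses import dataclass


@dataclass
class OneWayDelays:
    """Hypothetical one-way delay components of a packet voice path, in ms."""
    packetization: float    # audio buffered per packet (whole codec frames)
    codec_lookahead: float  # extra samples the vocoder needs beyond a frame
    processing: float       # DSP/host computation for encoding and decoding
    jitter_buffer: float    # playout buffer depth at the receiver
    network_transit: float  # propagation plus queuing in the IP network

    def total(self) -> float:
        return (self.packetization + self.codec_lookahead + self.processing
                + self.jitter_buffer + self.network_transit)


def round_trip_ms(path: OneWayDelays) -> float:
    """Round-trip delay, assuming the same delays apply in both directions."""
    return 2 * path.total()


# Illustrative values only: two 10 ms codec frames per packet, 5 ms look-ahead,
# a 40 ms jitter buffer and a short local IP hop (a "next-door" call).
local_call = OneWayDelays(packetization=20, codec_lookahead=5, processing=10,
                          jitter_buffer=40, network_transit=10)
print(round_trip_ms(local_call))  # 170
```

Even with a small network transit term, the codec, packetization and jitter-buffer contributions alone push the round trip far beyond the 20 to 30 ms audibility threshold, which is why echo control is needed on packet calls regardless of distance.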
Intense competition creates complex challenges for designers dealing with voice quality issues. Customers want complete solutions with attractive features, fast time-to-market delivery and good voice quality. Moreover, they want high channel density with low per-channel cost. These sometimes conflicting requirements limit the computational resources available for implementing VEDs. Complex and computationally expensive algorithms are often the first casualties of competitive market requirements.
Meanwhile, phone conversation environments are becoming less favorable to Quality of Service. On wireless calls made on the move, speech signals are easily polluted by unwanted background noise from sources such as cars, restaurants or airports. Another contributor to poor QoS is background noise whose level changes dynamically. Even on wireline calls, speech signals can be polluted by background noise as people use cordless phones, switch between handset and hands-free modes, or use hands-free mode for conference calls. All these factors pose great challenges for equipment vendors and service providers who are trying to maintain "toll quality" voice services in an increasingly demanding marketplace.