Hands-free kits and in-vehicle hands-free systems have been around for many years. The technology behind these systems isn't standing still, though: changes introduced by automotive OEMs and network operators, along with rising customer expectations, are driving many new innovations.
For example, wideband technology being introduced in Europe allows for greater call fidelity, but also, unfortunately, transfers more road noise. Sophisticated algorithms can remove noise from wind or defroster vents, allowing freedom in microphone placement without requiring complete system tuning for each vehicle.
An affliction prevalent in the automotive industry is the “Audio Is Easy” syndrome—also known as AIE syndrome. Symptoms include the illusion that all hands-free speech communications in an automobile requires is a microphone and loudspeaker connected to a network access device.
Delivering high quality hands-free speech communications in an automobile is unfortunately not so simple. It requires a thorough understanding of how to measure performance and of common causes of performance issues, and the application of the best available solutions to solve these performance problems.
The role and importance of performance measurements is often overlooked. Reliable and valid measurements are needed to specify requirements for suppliers, detect and diagnose problems early, optimize performance and predict end-users' satisfaction.
Understanding speech communications. An understanding of what we want to measure is necessary for recognizing valid performance measurements. Speech communications is fundamentally different from other types of communications systems because the measure of its quality is, ultimately, subjective.
Data communications over a radio link, for example, can be measured by simply counting the number of bits that were successfully transmitted. With speech communications, however, the transmitter and the receiver are people, and transmission performance must be assessed in terms of human perception.
Whether or not we even care about 1% or 3% bit errors in the telecommunications system depends on whether or not there is a perceivable difference to the people communicating. Figure 1 below shows the links involved with speech communications, or “the speech chain” (Denes and Pinson, 1993).
|Figure 1. The speech chain.|
When people are geographically separated such that acoustic signals from the mouth of the speaker cannot directly reach the ear of the listener, a telecommunications system is needed to carry the speech signals between speakers and listeners.
This system can be thought of as just an extension of the speech chain (more links in the chain). Therefore, the validity of measures of telecommunication system performance depends on how closely they agree with human perception.
Test methods and requirements. It should be clear from the previous section that objective measurements, such as percent of speech frame loss, frequency response, etc., are indirect measures of what we really care about—performance as perceived by the user.
However, it is not always possible to use perceptual (or subjective) measures of performance, because these measures are expensive, time consuming, and cannot always be used to assess performance at intermediate points along the telephone connection. Therefore, objective measurements must play an important role in delivering high quality hands-free speech communications.
The state-of-the-art in terms of objective test methods and requirements can be found in ITU-T Recommendation P.1100 (Narrowband hands-free communications in motor vehicles). This is a new international standard that was recently approved and can be downloaded for free from the ITU-T website.
Figure 2 below shows the test arrangement used in ITU-T P.1100, with the optional digital injection of background noise shown. The term “speech” in the figure is used generically to refer to any type of test signal used to simulate speech signals.
Test equipment manufacturers already have commercially available turn-key measurement solutions for testing according to ITU-T P.1100. A wideband version of this standard is currently being developed by the ITU-T CarCom Focus Group and is expected later this year.
|Figure 2. Objective testing arrangement used in ITU-T P.1100.|
It is important to note that there are limitations with the objective test methods and requirements found in ITU-T P.1100 and other requirements documents, such as VDA 1.6. More specifically, objective test results do not always agree with subjective performance. This is particularly true with measures of echo, attenuation during double-talk, and one-way speech quality predictions (i.e., MOS). Therefore, important decisions should be based on subjective evaluations.
General guidance on subjective testing can be found in Section 13 of ITU-T P.1100. This information should prove useful to those unfamiliar with subjective testing. However, this guidance does not follow the form of a test plan and does not provide requirements. Work has begun within ITU-T Study Group 12 to develop detailed subjective test methods and requirements for automotive, but a new standard (“Subjective requirements for hands-free speech communications in motor vehicles”) is not expected until mid 2010.
Hands-free performance problems
The common causes of hands-free performance problems can be grouped into three categories: fundamental, design, and new technology problems.
Fundamental problems. There are two fundamental problems that affect hands-free performance. The first is increased distance between the user and transducers (i.e., microphone and loudspeaker). The second is increased noise.
Increasing the distance between the user and transducers causes more echo, noise, and speech distortion. Echo increases for three reasons. First, when the transducers are moved away from the head there is no longer an acoustic obstruction (i.e., the head itself) between the loudspeaker and microphone.
Second, the echo path gain increases because both the microphone and loudspeaker signals need to be increased by about 6 dB for every doubling of the distance. Third, the echo signal level can be even higher than the driver's speech at the output of the microphone, making it difficult for many Acoustic Echo Cancellers (AEC) to eliminate echo.
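The roughly 6 dB per doubling of distance follows from the inverse-square law for free-field sound. As a quick illustrative calculation (the microphone distances here are hypothetical examples, not measurements from any particular vehicle):

```python
import math

def gain_needed_db(d_new: float, d_ref: float) -> float:
    """Free-field level drop between two mic distances: 20*log10(d_new/d_ref)."""
    return 20 * math.log10(d_new / d_ref)

# Doubling the distance costs about 6 dB of level...
print(round(gain_needed_db(0.10, 0.05), 1))  # 6.0
# ...so moving a mic from 5 cm (handset) to 40 cm (e.g., a visor mount) is
# three doublings, or about 18 dB of extra gain that also amplifies any
# noise and echo picked up by the microphone.
print(round(gain_needed_db(0.40, 0.05), 1))  # 18.1
```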
Noise is also increased because of distance. The additional microphone gain needed to compensate for distance will also increase the level of any acoustic or electrical noise.
Speech is more distorted for two reasons. First, the ratio of direct-to-reflected speech energy decreases with increased distance, which causes the speech to sound reverberant, or as if the driver is talking inside a tunnel. Second, the Signal-to-Noise Ratio (SNR) is lower, which causes the speech coder to introduce more coding artefacts.
Increased noise in the vehicle environment is the second fundamental problem that affects hands-free performance; it comes from acoustic noise sources, airflow, mechanical vibration, and Electro-Magnetic Interference (EMI).
The noise is heard by the person on the far end of the telephone connection. It also reduces the SNR and causes the speech coder to introduce more coding artefacts. Acoustic noise in the cabin makes it more difficult for the driver to comprehend speech and causes the driver to increase the receive volume (i.e., gain), which in turn causes echo problems as described above.
Design problems arise because of decisions related to system design and component selection. The following subsections discuss several design problems that affect hands-free performance.
Vehicle cabin acoustics. Vehicle platforms can have high levels of acoustic noise in the cabin due to poor mechanical damping, lack of sound insulation, and air gaps that allow the transmission of sound—just to name a few examples. Acoustic noise negatively affects both ends of the telephone connection as previously discussed.
Vehicle HVAC system. A vehicle's HVAC system can cause problems in a couple of ways. First, airflow from vents can pass over microphones and cause them to produce an undesirable noise referred to as “wind buffeting.”
Second, acoustic noise generated by the HVAC system can be picked-up by the microphone. These problems cause noise and speech distortion that can be heard by the far end.
Microphone subsystem. Common problems found with microphones are placement and orientation. Microphones that are placed far from the driver need more gain, which, as previously discussed, can cause noise, speech distortion, and echo problems. Placing microphones too close to noise sources can also cause noise and speech distortion problems.
Orientation is another problem. Sometimes the “cone of sensitivity” for directional microphones and beamformers is not pointed towards the mouth of the driver. Worse yet, the microphone could be pointed towards a noise source, a reflecting surface such as a side window, or even a loudspeaker. All of these problems have been found in production vehicles. These orientation problems can cause noise, speech distortion and echo.
Loudspeaker subsystem. Problems with the loudspeaker subsystem include loudspeaker placement, speech level changes, and loudspeaker distortion. Loudspeaker positions that place the driver in the echo path will degrade AEC performance.
Speech level changes are caused by Speed Compensated Volume (SCV). SCV attempts to maintain consistent loudness in the presence of noise, but often fails to work properly because wheel speed does not accurately predict cabin noise. SCV also degrades AEC performance because it constantly changes the echo path.
Loudspeaker distortion sounds bad to the driver. It also causes nonlinearities in the echo path, degrading AEC performance and resulting in the far end hearing echo.
Signal transport. The main problem with signals traveling within the vehicle is EMI. EMI affects signals in both the microphone and loudspeaker subsystems. These problems are heard as noises such as “alternator whine” and “hiss”.
HF unit. Most of the problems with the HF unit have to do with the choice of speech enhancement software (AEC, NR, etc.). Not all software is created equal.
The automotive environment is very challenging, and speech enhancement software developed for non-automotive applications often fails. Some speech enhancement solutions suffer from basic limitations in their ability to perform. Others lack the flexibility to perform well on different vehicle platforms without extensive tuning of parameters. Some solutions lack the necessary features—they only provide AEC and NR.
Other problems may come from application code and hardware. Application code can cause excessive delay and dropped speech samples. Typical hardware problems are electrical noise and anti-aliasing filter problems.
Problems with the HF unit can be heard by both ends as echo, speech distortion, noise, and delay.
Wireless subsystem. Problems with the wireless subsystem (e.g., radio antenna design) can cause speech frame loss, which is heard as speech distortion by users. Another problem that can occur is large variability in speech levels.
Problems introduced by new technologies
The following subsections review challenges presented by two technologies that are relatively new to the automotive industry: wider bandwidth speech, and microphone arrays.
Wider bandwidth speech. Currently, speech signals are band-limited by the telephone network from their normal frequency range of around 50-10,000Hz down to 300-3400Hz. However, Wider Bandwidth Speech (WBS) is starting to be used over networks. “Wideband” (50-7,000Hz) speech is expected to be deployed in mobile networks in Germany this year. “Super-wideband” (50-14,000Hz) and “Full-band” (20-20,000Hz) speech have already been deployed on enterprise and VoIP networks.
WBS has a lot to offer. It improves task performance (better speech comprehension, improved talker identification, etc.), and users prefer it because of better speech quality and reduced listening effort. However, with WBS users are more sensitive to noise and echo, and many AECs have a harder time removing high-frequency echo.
Microphone arrays. Microphone arrays use multiple microphones to improve SNR and reduce acoustic coupling. Two types of arrays are “beamformers” and “mixers”. Beamformers create beams of sensitivity that can be pointed at the driver.
Less sensitive areas, or nulls, can also be oriented towards noise sources and loudspeakers (to reduce echo). Mixers simply add the signals from the multiple microphones. The speech signals from these microphones add together to increase the speech level, while the uncorrelated noise tends to average out.
There are a few problems with beamformers. First, they come with strict packaging constraints. Microphone element spacing and orientation is critical. These constraints can often be difficult for OEMs to work with since the needed real estate may be required for other purposes. Second, they are very sensitive to wind buffeting. Third, they have a frequency dependent loss curve that requires equalization of the speech signal. Fourth, they need to be tuned for different configurations in vehicles.
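The mixer behavior described above can be illustrated numerically. In this hypothetical simulation the same speech signal (a tone standing in for speech) arrives at every microphone while each microphone contributes its own independent noise; averaging the channels then improves SNR by roughly 10·log10(N) dB. All signal parameters are invented for the sketch:

```python
import math
import random

def mixer_snr_gain_db(n_mics: int, n_samples: int = 50_000) -> float:
    """Estimate the SNR improvement of an n-microphone mixer (simple average)."""
    random.seed(0)
    # Identical speech at every mic (a 200 Hz tone stands in for speech).
    speech = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(n_samples)]
    # Independent, uncorrelated noise at each mic.
    noises = [[random.gauss(0.0, 1.0) for _ in range(n_samples)]
              for _ in range(n_mics)]

    def power(x):
        return sum(v * v for v in x) / len(x)

    snr_single = power(speech) / power(noises[0])
    # Averaging leaves the (identical) speech untouched but reduces noise power.
    mixed_noise = [sum(mic[i] for mic in noises) / n_mics
                   for i in range(n_samples)]
    snr_mixed = power(speech) / power(mixed_noise)
    return 10 * math.log10(snr_mixed / snr_single)

print(round(mixer_snr_gain_db(4), 1))  # close to 10*log10(4) ≈ 6.0 dB
```

In a real cabin the noise at neighboring microphones is partially correlated, so the gain is smaller than this idealized figure.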
Hands-free performance problem solutions
Solutions to the problems identified in the previous sections can be grouped into two areas: good vehicle platform design, and high performance speech enhancement software.
The following solutions target the vehicle platform:
Acoustical design. Reduce acoustic noise levels through better acoustical design of the vehicle cabin (e.g., sound damping, sound insulation, etc.). This will improve quality for both the driver and far end.
Microphone type. Use a directional microphone or microphone array. They maximize speech pick-up and reject acoustic noise, cabin reflections (i.e., reverberation), and echo coming from loudspeakers.
Microphone placement/orientation/mounting. Package microphones so that distance to the user is minimized. Also try to maximize distance from noise sources, keep them out of airflow, ensure they are oriented properly, and mechanically isolate them from vehicle vibration.
Loudspeaker selection/mounting/placement. Use good quality loudspeakers and make sure mounting will minimize distortion. The AEC will not perform well if there is distortion in the echo path. Also, try to maximize distance from the closest microphone.
HVAC design. Design cabin airflow to avoid prime microphone locations. Also design HVAC system to be quieter (e.g., sound damping, component selection, etc.).
Signal transport. Use optical cables or differential audio to transport speech signals. This will eliminate or reduce the effects of EMI.
RF system design. Ensure good RF antenna design and placement to minimize speech frame loss.
The solutions to be implemented in software include:
AEC. Use high performance AEC designed for an automotive environment. It should be capable of meeting “Type 1” performance requirements of ITU-T P.1100.
Microphone array signal processing. Use microphone array signal processing that combines the best aspects of beamformers and mixers, and is not susceptible to the problems with traditional beamformers. This type of processing has been referred to as a “complex mixer”.
Noise Reduction (NR). Use high performance NR designed for an automotive environment. It should handle transient noises and be able to track noise even during speech.
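One common way to track noise even during speech is an asymmetric noise-floor tracker: the floor estimate rises slowly (so short speech bursts barely move it) but falls quickly when speech stops. A minimal sketch, with made-up frame levels in dB and illustrative rate constants:

```python
def track_noise_floor(levels_db, rise=0.5, fall=3.0):
    """Asymmetric noise-floor tracker: climbs at most `rise` dB per frame,
    falls at most `fall` dB per frame (both rates are illustrative)."""
    floor = levels_db[0]
    estimates = []
    for lvl in levels_db:
        if lvl > floor:
            floor += min(rise, lvl - floor)   # rise slowly during speech
        else:
            floor -= min(fall, floor - lvl)   # fall quickly when speech stops
        estimates.append(floor)
    return estimates

# Steady 40 dB cabin noise with a 5-frame speech burst at 70 dB:
frames = [40.0] * 10 + [70.0] * 5 + [40.0] * 10
est = track_noise_floor(frames)
print(max(est))   # 42.5 -> the burst barely drags the floor up
print(est[-1])    # 40.0 -> back at the true floor shortly after the burst
```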
Wind buffeting removal. Use software that can detect and remove wind buffeting noise.
Dynamic equalization. Use dynamic equalization which adjusts the frequency response based on the cabin noise spectrum. This will improve quality in both quiet and noisy environments.
AGC. Use AGC to prevent the need to tune transmit gain due to microphone position, deliver consistent levels in send, compensate for different incoming levels from telephone network in receive, and reduce echo in noisy environments by automatically lowering microphone gain when drivers talk louder.
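The send-side leveling idea can be sketched as a feedback loop that nudges a gain toward a target output level in small steps, within a clamped range. The target, step size, and gain limits below are illustrative values, not figures from any standard:

```python
def agc_step(gain_db: float, measured_db: float,
             target_db: float = -20.0, step_db: float = 0.5,
             min_gain: float = -12.0, max_gain: float = 12.0) -> float:
    """One AGC update: move the gain toward the target level, limited to
    `step_db` per update and clamped to [min_gain, max_gain]."""
    error = target_db - (measured_db + gain_db)
    gain_db += max(-step_db, min(step_db, error))
    return max(min_gain, min(max_gain, gain_db))

# A talker whose speech arrives 8 dB too quiet: the gain ramps up to +8 dB.
gain = 0.0
for _ in range(30):
    gain = agc_step(gain, measured_db=-28.0)
print(gain)  # 8.0
```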
Dynamic Level Control (DLC). Use DLC to automatically adjust volume in noisy environments. DLC is more accurate than SCV because it directly measures cabin noise. Gain adjustments are also outside of the echo path, so there is no negative impact on AEC performance.
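At its simplest, DLC is a mapping from measured cabin noise to a receive-volume boost. The slope, quiet floor, and cap below are invented for illustration; a production tuning would differ per vehicle:

```python
def dlc_gain_db(noise_db_spl: float,
                quiet_floor_db: float = 50.0,
                slope_db_per_db: float = 0.5,
                max_boost_db: float = 12.0) -> float:
    """Boost the receive volume by `slope_db_per_db` dB for every dB of cabin
    noise above the quiet floor, capped so loud cabins don't trigger the
    echo problems described earlier (all values illustrative)."""
    boost = slope_db_per_db * max(0.0, noise_db_spl - quiet_floor_db)
    return min(boost, max_boost_db)

print(dlc_gain_db(45.0))  # 0.0  -> no boost in a quiet cabin
print(dlc_gain_db(70.0))  # 10.0 -> moderate boost at highway noise levels
print(dlc_gain_db(95.0))  # 12.0 -> capped in a very loud cabin
```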
Limiter. Use a limiter to reduce distorted speech and improve AEC performance.
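A limiter at this point in the chain can be as simple as a soft-knee clipper: linear below a threshold, smoothly compressed above it so samples never hard-clip and feed distortion into the echo path. The tanh knee below is one illustrative shape, not any specific product's algorithm:

```python
import math

def soft_limit(sample: float, threshold: float = 0.8) -> float:
    """Soft-knee limiter: pass-through below `threshold`, tanh-compress the
    overshoot into the remaining headroom so |output| stays below 1.0."""
    if abs(sample) <= threshold:
        return sample
    sign = 1.0 if sample > 0 else -1.0
    headroom = 1.0 - threshold
    over = (abs(sample) - threshold) / headroom
    return sign * (threshold + headroom * math.tanh(over))

print(soft_limit(0.5))        # 0.5 (untouched below the knee)
print(soft_limit(2.0) < 1.0)  # True (overshoot compressed, never clipped)
```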
Bandwidth Extension (BWE) and High Frequency Encoding (HFE). Use BWE and HFE to improve quality and intelligibility. BWE converts narrowband speech to wider bandwidth speech by reconstructing low and high frequency speech energy. HFE transmits more high frequency speech energy by compressing it into the telephone passband.
Wideband capable. Make sure speech enhancement functions have been designed to support high demands of WBS. It is not clear exactly when WBS will be widely deployed in automotive, but it could happen very soon.
Scott Pennock is Senior Hands-Free Standards Specialist at QNX Software Systems.
Denes, P. B., & Pinson, E. N. (1993). The speech chain (2nd ed., pp. 1-9). New York, NY: W. H. Freeman and Company.