Improving real-time voice quality in a VoIP-based telephony design - Embedded.com

This “Product How-To” article focuses on how to use a certain product in an embedded system and is written by a company representative.

The general-purpose SoCs used by today's cordless or IP phones, integrated access devices and wireless unified communications devices fully support the software DSP (soft-DSP) required for VoIP by integrating a software voice engine within the system software.

Voice engines fit within an embedded processor's system performance capabilities using soft-DSP implementation techniques. To guarantee telephony-quality voice performance for VoIP, the system software must meet the real-time requirements of the voice engine.

Next-generation soft-DSP products that incorporate both real-time processing and wideband (high definition) voice communication achieve greater end-user satisfaction and market potential than current technology. These products set a new high definition standard for voice communication.

Figure 1. The use of a DMA peripheral to collect audio samples into a buffer for servicing by the voice engine is a more efficient approach than a CPLD implementation.

This article discusses how to integrate a voice engine for soft-DSP processing in order to exceed telephony-quality communication.

Conversely, failure to meet the real-time requirements may cause many symptoms of poor voice quality, including voice dropout, noticeable delay, pops or clicks, fax/modem call failure or corrupt fax pages, and incomprehensible speech due to packet loss or excessive delay.

Failure to meet real-time requirements results in a missed deadline; this may be a critical system failure requiring a full system reset, unless the system supports recovery in hardware and software.

Minimize delay
Voice communication in telephony calls is bi-directional: transmission and reception of audio occur simultaneously. Thus, it is critical to minimize delay within the voice system to ensure audio quality. However, delay-minimizing optimizations conflict with meeting the demands of voice processing.

In traditional playback audio systems, such as audio (MP3) playback or multimedia streaming, buffering can increase significantly to compensate for a lack of system processing capability, since in those systems delay is independent of quality.

The voice engine does not have this option, as an audio buffer must be fully processed within a fixed time. This is architected through interrupt prioritization and software scheduling, leveraging and, in some cases, enhancing the operating system's real-time capabilities to guarantee voice processing completion.

In a voice engine system, a software interrupt service routine exchanges voice samples with a voice hardware codec. The voice hardware codec converts analog signals to and from audio samples with a sampling rate of 8kHz.

For telephony applications, the hardware codec is connected to a subscriber line interface circuit (SLIC) as the telephony physical interface, or to a DECT radio for cordless handsets.

For IP phones or mobile handsets, the hardware codec is connected to an amplifier, which connects to a microphone and loudspeaker.

Figure 2. Listed are the voice engine timing requirements.

The SoC hardware interfaces play a large role in guaranteeing both real-time performance and accurate scheduling of the voice engine. If the SoC has a TDM or AC97 peripheral, a telephony voice codec is directly interfaced to the processor.

If the embedded processor is missing this peripheral, the lowest-cost solution is to interface a CPLD to the processor. The CPLD sends and receives samples to the hardware codec on a sample-by-sample basis, representing the most time-sensitive system solution and the worst-case timing requirements.

Servicing the interrupt
Whether through TDM, AC97 or CPLD, the servicing of the voice hardware must be prioritized to ensure that the interrupt is serviced; other system software must not block this interrupt's critical timing. At an 8kHz sampling rate, the interrupt will occur every 125µs.

For an SoC running at 200MHz, the speed-optimized CPLD interrupt service routine requires 25µs of processing time. This allows the maximum interrupt latency to be calculated as 90µs (125µs - (25µs + 10µs for interrupt servicing setup time)).

For the system to meet real-time deadlines, the OS must invoke the interrupt service routine within 90µs of receiving the codec interrupt, and the OS must allow the servicing to run to completion without interruption.
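The latency budget above can be reproduced with a few lines of arithmetic. The following is a minimal Python sketch for checking the numbers on a host machine, not target firmware; the function and parameter names are illustrative assumptions, and only the 8kHz, 25µs and 10µs figures come from the text.

```python
# Worked example of the interrupt latency budget described in the text.
# Function and parameter names are hypothetical, chosen for illustration.

def sample_period_us(sample_rate_hz: int) -> float:
    """Time between codec interrupts when servicing sample by sample."""
    return 1_000_000 / sample_rate_hz


def max_interrupt_latency_us(sample_rate_hz: int,
                             isr_duration_us: float,
                             isr_setup_us: float) -> float:
    """Latency the OS may add before the ISR starts without missing a sample."""
    return sample_period_us(sample_rate_hz) - (isr_duration_us + isr_setup_us)


# Figures from the article: 8kHz sampling, 25us ISR, 10us servicing setup.
period = sample_period_us(8_000)
budget = max_interrupt_latency_us(8_000, isr_duration_us=25, isr_setup_us=10)
print(f"sample period: {period:.0f} us, maximum interrupt latency: {budget:.0f} us")
```

Rerunning the calculation with a slower interrupt service routine shows how quickly the budget shrinks: at 60µs of servicing time, only 55µs of latency remains.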

The OS must also guarantee that the interrupt service routine can schedule the voice engine to perform immediate operation on the audio buffers; the interrupt service routine uses a buffer ready signal to activate this scheduling, as shown in the figure. A DMA peripheral is used to collect audio samples into a buffer for servicing by the voice engine, a more efficient approach than the CPLD implementation.
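One common way to hand buffers from the sample-collection side to the voice engine is a pair of alternating ("ping-pong") buffers, which may or may not be what a given voice engine uses. The sketch below is a host-side Python model of that idea, not driver code. The class name, method names and the 80-sample buffer (10ms of audio at 8kHz) are assumptions made for illustration; a queue stands in for the buffer ready signal.

```python
from collections import deque

SAMPLES_PER_BUFFER = 80  # assumed: 10 ms of audio at 8 kHz


class PingPongCapture:
    """Model of sample collection into two alternating buffers.

    One buffer fills while the voice engine processes the other.
    """

    def __init__(self, samples_per_buffer: int = SAMPLES_PER_BUFFER):
        self.samples_per_buffer = samples_per_buffer
        self.buffers = [[], []]
        self.fill_index = 0      # buffer currently being filled
        self.ready = deque()     # stands in for the "buffer ready" signal
        self.overruns = 0        # count of missed voice engine deadlines

    def on_sample(self, sample: int) -> None:
        """Called once per sample (the ISR or DMA in a real system)."""
        buf = self.buffers[self.fill_index]
        buf.append(sample)
        if len(buf) == self.samples_per_buffer:
            if self.ready:
                # The previous buffer was never consumed: deadline missed.
                self.overruns += 1
                self.ready.clear()
            self.ready.append(self.fill_index)
            self.fill_index ^= 1
            self.buffers[self.fill_index] = []

    def take_ready_buffer(self):
        """Called by the voice engine; returns a full buffer or None."""
        if not self.ready:
            return None
        return self.buffers[self.ready.popleft()]


# A voice engine that keeps up: every full buffer is taken promptly.
capture = PingPongCapture()
processed = 0
for n in range(160):  # 20 ms of samples
    capture.on_sample(n)
    if capture.take_ready_buffer() is not None:
        processed += 1
print(f"buffers processed: {processed}, overruns: {capture.overruns}")
```

If the voice engine never calls `take_ready_buffer()`, the model counts an overrun each time a new buffer completes while the previous one is still pending, which corresponds to the missed deadline described earlier.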

The requirement for the voice engine is to complete before the next voice buffer is ready. The time required to process voice in the voice engine depends on several factors: the processor, cache size, RAM speed, number of physical voice interfaces (audio channels), the soft-DSP processing required for the buffer and the type of speech coders employed.

Timing needs
For complete analysis of the voice engine timing requirements, refer to the table. The tidle measurement indicates the remaining time that all other system processes or system applications have available for processing; from the voice engine design perspective, this is referred to as idle time.

All lower-priority system processing occurs in the idle time after the voice engine completes real-time voice processing. In worst-case processing, the tidle may reach 0ms for several iterations of voice engine processing.

D2 Technologies' vPort software includes performance benchmarks for supported configurations. For example, a vPort release may specify the voice processing of a three-way G.729AB voice conference call, requiring a maximum of 100MHz of processing every 10ms in the voice engine, as worst-case and with cache continually flushed.

If running on a 400MHz RISC processor, tvoice will require 100MHz in worst-case processing (25 percent of CPU processing), which corresponds to 2.5ms of processing time in every 10ms processing interval.

The real-time deadline will be missed if tswitch is greater than 7.5ms (tswitch = tbuffer - (tvoice + tidle)), and this does not include the additional overhead introduced during voice engine processing due to other peripheral interrupts, bottom halves or tasklets.
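The figures in this example follow directly from the formula. The short Python sketch below reworks them under the worst-case assumption that tidle is 0ms; the function names are hypothetical, and the symbols tbuffer, tvoice, tidle and tswitch are used as they appear in the formula above.

```python
# Worked example of the voice engine timing budget from the text.
# Function names are illustrative; the inputs are the article's figures.

def voice_time_ms(voice_load_mhz: float, cpu_clock_mhz: float,
                  tbuffer_ms: float) -> float:
    """Worst-case voice processing time per buffer interval (tvoice)."""
    return tbuffer_ms * voice_load_mhz / cpu_clock_mhz


def max_switch_time_ms(tbuffer_ms: float, tvoice_ms: float,
                       tidle_ms: float = 0.0) -> float:
    """Largest tswitch that meets the deadline: tbuffer - (tvoice + tidle)."""
    return tbuffer_ms - (tvoice_ms + tidle_ms)


# 100MHz voice load on a 400MHz processor with a 10ms buffer interval.
tvoice = voice_time_ms(voice_load_mhz=100, cpu_clock_mhz=400, tbuffer_ms=10)
tswitch_limit = max_switch_time_ms(tbuffer_ms=10, tvoice_ms=tvoice)
print(f"tvoice = {tvoice} ms, tswitch must not exceed {tswitch_limit} ms")
```

Because the overhead from other interrupts, bottom halves and tasklets is not part of this calculation, the result is best read as an upper bound rather than a safe design target.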

These are the most important design criteria for the system designer to consider when integrating a voice engine for soft-DSP processing:

For maximum quality, voice communication requires minimizing system delays.

Voice communication is continuous; missing samples or missing a real-time deadline is a critical error.

The voice hardware has strict timing requirements and needs a method for error recovery in the case of missed timing.

The voice engine must complete real-time processing of a voice buffer within a 10ms software deadline.

The voice engine interrupt service routine has strict timing restrictions based on the CPU peripheral hardware.

Jonathan Cline is Senior Lead Engineer at D2 Technologies Inc.
