PRODUCT HOW-TO: Improving real-time voice quality in a VoIP-based telephony design
The general-purpose SoCs used in today's cordless and IP phones,
integrated access devices and wireless unified communications devices
fully support the software DSP (soft-DSP) required for VoIP by
integrating a software voice engine within the system software.
Soft-DSP implementation techniques allow a voice engine to fit within
an embedded processor's performance capabilities. To guarantee
telephony-quality voice performance for VoIP, the system software must
meet the real-time requirements of the voice engine.
Next-generation soft-DSP products that incorporate both real-time
processing and wideband (high definition) voice communication achieve
greater end user satisfaction and market potential than current
technology. These products set a new high definition standard for voice
communication.
Figure 1. The use of a DMA peripheral to collect audio samples into a
buffer for servicing by the voice engine is a more efficient approach
than a CPLD implementation.
This article discusses how to integrate a voice engine for soft-DSP
processing in order to exceed telephony quality communication.
Conversely, failure to meet the real-time requirements may cause
many symptoms of poor voice quality, including voice dropout,
noticeable delay, pops or clicks, fax/modem call failure or corrupt fax
pages, and incomprehensible speech due to packet loss or excessive
delay.
Failure to meet real-time requirements results in a missed deadline;
this may be a critical system failure requiring a full system reset,
unless the system supports recovery in hardware and software.
Minimize delay
Voice communication in telephony calls is bi-directional: Transmission
and reception of audio occur simultaneously. Thus, it is critical to
minimize delay within the voice system to ensure audio quality.
However, delay-minimizing optimizations conflict with meeting the
demands of voice processing.
In traditional playback audio systems, such as audio (MP3) playback
or multimedia streaming, buffering can be increased significantly to
compensate for a lack of system processing capability, because delay
is independent of quality.
The voice engine does not have this option, as an audio buffer must
be fully processed within a fixed time. This is architected through
interrupt prioritization and software scheduling, leveraging and, in
some cases, enhancing the operating system's real-time capabilities to
guarantee voice processing completion.
In a voice engine system, a software interrupt service routine
exchanges voice samples with a voice hardware codec. The voice hardware
codec converts analog signals to and from audio samples with a sampling
rate of 8kHz.
For telephony applications, the hardware codec is connected to a
subscriber line interface circuit (SLIC) as the telephony physical
interface, or to a DECT radio, for cordless handsets.
For IP phones or mobile handsets, the hardware codec is connected to
an amplifier, which connects to a microphone and loudspeaker.
Figure 2. Listed are the voice engine timing requirements.
The SoC hardware interfaces play a large role in guaranteeing both
real-time performance and accurate scheduling of the voice engine. If
the SoC has a TDM or AC97 peripheral, a telephony voice codec is
directly interfaced to the processor.
If the embedded processor is missing this peripheral, the
lowest-cost solution is to interface a CPLD to the processor. The CPLD
exchanges samples with the hardware codec on a sample-by-sample
basis, making this the most time-sensitive system solution, with the
worst-case timing requirements.
Servicing the interrupt
Whether through TDM, AC97 or CPLD, the servicing of the voice hardware
must be prioritized to ensure that the interrupt is serviced; other
system software must not block this interrupt's critical timing. At
an 8kHz sampling rate, the interrupt occurs every 125µs.
For an SoC running at 200MHz, the speed-optimized CPLD interrupt
service routine requires 25µs of processing time.
This allows the maximum interrupt latency to be calculated as
90µs: 125µs - (25µs + 10µs for interrupt servicing setup time).
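The latency budget above is simple arithmetic and can be reproduced in a few lines. The following is a minimal sketch only; the function name and its parameters are hypothetical, and the 25µs and 10µs values are the ones quoted for the 200MHz CPLD example.

```python
def max_interrupt_latency_us(sample_rate_hz, isr_us, setup_us):
    """Return the longest tolerable interrupt latency in microseconds.

    With one interrupt per sample, the whole budget is a single sample
    period; the ISR run time and its setup time are subtracted from it.
    """
    sample_period_us = 1_000_000 / sample_rate_hz
    return sample_period_us - (isr_us + setup_us)


# Values quoted in the text for the 200MHz CPLD example.
latency = max_interrupt_latency_us(sample_rate_hz=8000, isr_us=25, setup_us=10)
print(f"maximum interrupt latency: {latency:.0f} us")
```

Raising the sampling rate shrinks the sample period while the ISR cost stays fixed, so the tolerable latency drops faster than the period does.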
For the system to meet real-time deadlines, the OS must invoke the
interrupt service routine within 90µs of receiving the codec
interrupt, and the OS must allow the servicing to run immediately to
completion.
The OS must also guarantee that the interrupt service routine can
schedule the voice engine to perform immediate operation on the audio
buffers; the interrupt service routine uses a buffer ready signal to
activate this scheduling, as shown in Figure 1. A DMA peripheral can
be used to collect audio samples into a buffer for servicing by the
voice engine, a more efficient approach than the CPLD implementation.
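One way to see why buffered DMA collection is more efficient than per-sample CPLD servicing is to compare interrupt rates. The sketch below is a rough estimate under stated assumptions: it uses the 10ms processing interval and the 25µs + 10µs per-interrupt cost quoted in the text, and it assumes (hypothetically) that a DMA buffer-complete interrupt costs no more than a per-sample one. All names are illustrative.

```python
SAMPLE_RATE_HZ = 8000   # narrowband telephony sampling rate from the text
ISR_COST_US = 25 + 10   # ISR run time plus setup time, per interrupt
BUFFER_MS = 10          # processing interval quoted in the text


def interrupts_per_second(sample_rate_hz, samples_per_interrupt):
    """Interrupt rate when each interrupt delivers a fixed sample count."""
    return sample_rate_hz / samples_per_interrupt


def isr_cpu_load(sample_rate_hz, samples_per_interrupt, isr_cost_us):
    """Fraction of CPU time (0.0 to 1.0) spent inside the ISR."""
    rate = interrupts_per_second(sample_rate_hz, samples_per_interrupt)
    return rate * isr_cost_us / 1_000_000


samples_per_buffer = SAMPLE_RATE_HZ * BUFFER_MS // 1000

# CPLD: one interrupt per sample.
cpld_rate = interrupts_per_second(SAMPLE_RATE_HZ, 1)
cpld_load = isr_cpu_load(SAMPLE_RATE_HZ, 1, ISR_COST_US)

# DMA: one interrupt per filled buffer (per-interrupt cost assumed unchanged).
dma_rate = interrupts_per_second(SAMPLE_RATE_HZ, samples_per_buffer)
dma_load = isr_cpu_load(SAMPLE_RATE_HZ, samples_per_buffer, ISR_COST_US)

print(f"CPLD: {cpld_rate:.0f} interrupts/s, {cpld_load:.2%} of CPU in ISR")
print(f"DMA:  {dma_rate:.0f} interrupts/s, {dma_load:.2%} of CPU in ISR")
```

Under these assumptions the per-sample approach spends roughly 28 percent of the CPU in interrupt servicing, while the buffered approach spends well under 1 percent. A real DMA interrupt handler may cost more or less than assumed here.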
The requirement for the voice engine is to complete before the next
voice buffer is ready. The time required to process voice in the voice
engine depends on several factors: the processor, cache size, RAM
speed, number of physical voice interfaces (audio channels), the
soft-DSP processing required for the buffer and the type of speech
coders employed.
Timing needs
For a complete analysis of the voice engine timing requirements, refer
to the table in Figure 2. The tidle measurement indicates the time
remaining for all other system processes and applications; from the
voice engine design perspective, this is referred to as idle time.
All lower priority system processing occurs in the idle time after
the voice engine completes real-time voice processing. In worst-case
processing, the tidle may reach 0ms for several iterations of voice
engine processing.
D2 Technologies' vPort software includes performance benchmarks for
supported configurations. For example, a vPort release may specify
the voice processing of a three-way G.729AB voice conference call as
requiring a maximum of 100MHz of processing every 10ms in the voice
engine, in the worst case and with the cache continually flushed.
If running on a 400MHz RISC processor, tvoice will require 100MHz in
worst-case processing (25 percent of CPU processing), which corresponds
to 2.5ms of processing time in every 10ms processing interval.
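The conversion from a MHz requirement to processing time per interval can be sketched as follows. The function name is hypothetical; the inputs are the 100MHz, 400MHz and 10ms figures from the example above.

```python
def voice_processing_time_ms(required_mhz, cpu_mhz, interval_ms):
    """Convert a MHz processing requirement into time per interval.

    The requirement divided by the CPU clock gives the fraction of the
    CPU consumed; that fraction of each interval is the processing time.
    """
    cpu_fraction = required_mhz / cpu_mhz
    return cpu_fraction * interval_ms


tvoice_ms = voice_processing_time_ms(required_mhz=100, cpu_mhz=400, interval_ms=10)
print(f"tvoice = {tvoice_ms} ms per 10 ms interval")
```

The same benchmark on a slower processor consumes a proportionally larger share of each interval, which is why the benchmark is stated in MHz rather than in milliseconds.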
The real-time deadline will be missed if tswitch is greater than
7.5ms (tswitch = tbuffer - (tvoice + tidle)). This figure does not
include the additional overhead introduced during voice engine
processing by other peripheral interrupts, bottom halves or
tasklets.
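The deadline check can be expressed directly from the formula tswitch = tbuffer - (tvoice + tidle). This is a minimal sketch with hypothetical function names; it treats tswitch purely as the quantity in that formula and uses the worst-case tidle of 0ms described earlier.

```python
def max_tswitch_ms(tbuffer_ms, tvoice_ms, tidle_ms=0.0):
    """Largest tswitch that still meets the deadline, per
    tswitch = tbuffer - (tvoice + tidle)."""
    return tbuffer_ms - (tvoice_ms + tidle_ms)


def deadline_met(tswitch_ms, tbuffer_ms, tvoice_ms, tidle_ms=0.0):
    """True if the measured tswitch fits within the remaining budget."""
    return tswitch_ms <= max_tswitch_ms(tbuffer_ms, tvoice_ms, tidle_ms)


# 10ms buffer and 2.5ms worst-case voice processing, as in the text.
limit = max_tswitch_ms(tbuffer_ms=10.0, tvoice_ms=2.5)
print(f"tswitch limit: {limit} ms")
print("8.0 ms meets deadline:", deadline_met(8.0, 10.0, 2.5))
```

Because the overhead of other interrupts, bottom halves and tasklets is not included, a practical design would leave margin below this limit rather than run up to it.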
These are the most important design criteria for the system designer
to consider when integrating a voice engine for soft-DSP processing:
- For maximum quality, voice communication requires minimizing system delays.
- Voice communication is continuous; missing samples or missing a real-time deadline is a critical error.
- The voice hardware has strict timing requirements and needs a method for error recovery in the case of missed timing.
- The voice engine real-time processing must complete processing on a voice buffer within a 10ms software deadline.
- The voice engine interrupt service routine has strict timing restrictions based on the CPU peripheral hardware.
Jonathan Cline is Senior Lead
Engineer at D2 Technologies Inc.