Ambient intelligence is fast becoming a mainstream technology. Many homes now have some form of smart speaker that understands spoken commands. Car dashboards and navigation systems use voice control instead of touch interaction to avoid distracting the driver. IoT sensors and controllers, as well as wearables, are beginning to adopt voice-driven interfaces because they solve the problem of adding a user interface to a device that is too small to accommodate a touchscreen or keypad.
by Mark Patrick, Mouser Electronics
Clearly, a vital component of such designs is a microphone, preferably one that combines low cost with high quality. In many cases, a single microphone is not enough. Strong background noise, whether from TVs and music systems indoors or from traffic and passers-by outside, makes it hard for a system armed with just a solitary microphone to determine which signals it should pay attention to. As a result, designers have started to incorporate two or more microphones in arrays. These systems use beamforming techniques to focus on important sound sources in the room, cancelling out interference from other sources. In doing so, they can track a user’s voice as the person moves around and perform blind-source separation when more than one user is in range.
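The core of such beamforming can be sketched in a few lines of code. The following is a minimal, integer-sample illustration of the delay-and-sum technique; the function name, plane-wave geometry and list-based signal layout are assumptions made for this sketch, not details of any particular product (real systems use fractional delays and streaming blocks):

```python
def delay_and_sum(signals, mic_positions, direction, fs, c=343.0):
    """Align and sum array channels to steer towards `direction`.

    signals:       list of channels, each a list of samples
    mic_positions: list of (x, y, z) tuples in metres
    direction:     unit vector pointing from the array towards the source
    fs:            sample rate in Hz; c is the speed of sound in m/s
    """
    # Microphones closer to the source (larger projection onto the
    # steering direction) hear the wavefront earlier, so they need the
    # largest compensating delay to line up with the farthest one.
    proj = [sum(p * d for p, d in zip(pos, direction)) for pos in mic_positions]
    base = min(proj)
    delays = [(p - base) / c for p in proj]

    n = len(signals[0])
    out = [0.0] * n
    for sig, d in zip(signals, delays):
        shift = round(d * fs)              # integer-sample approximation
        for i in range(n - shift):
            out[i + shift] += sig[i]
    return [s / len(signals) for s in out]
```

Samples arriving from the steered direction add coherently, while sounds from other directions remain misaligned and partially cancel, which is what lets the array favour one talker over background interference.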
Conventional microphone technology is too large to be integrated into most systems. Microelectromechanical systems (MEMS) technology, however, makes it possible to scale the microphone down to a level where it is feasible to put multiple microphones into a mobile phone or IoT sensor hub. As a result, MEMS microphones can be found in places as diverse as cameras, security systems, digital voice assistants, robots, TVs and automotive cabin systems.
Traditional condenser-type microphones are based on air-gap capacitors with a backplate and a flexible diaphragm. The capacitive design used by most MEMS microphones follows the same fundamental principle. Typically, the sensors employ two or three plates: a diaphragm and one or two backplates. When the diaphragm moves in response to a sound wave, the capacitance of the biased structure changes; the signal-conditioning electronics convert this change into a voltage and amplify it.
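To put rough numbers on the principle, the parallel-plate approximation C = ε0·A/d shows how a tiny deflection of the diaphragm alters the sensed capacitance. The dimensions below are illustrative guesses for this sketch, not figures from any vendor datasheet:

```python
EPS0 = 8.854e-12  # vacuum permittivity, farads per metre

def plate_capacitance(area_m2, gap_m):
    """Parallel-plate approximation: C = eps0 * A / d (air gap)."""
    return EPS0 * area_m2 / gap_m

# Illustrative, made-up dimensions: a 0.5mm-diameter diaphragm
# suspended over a 2um air gap.
area = 3.14159 * (0.25e-3) ** 2           # plate area in m^2
c_rest = plate_capacitance(area, 2.0e-6)  # capacitance at rest (~0.9pF)

# A 10nm deflection towards the backplate narrows the gap and raises C;
# in a dual-backplate sensor the two gaps change in opposite directions,
# which is what yields the differential output described below.
c_deflected = plate_capacitance(area, 2.0e-6 - 10e-9)
```

Even at sub-picofarad levels, that fractional change in capacitance is what the on-chip signal conditioning turns into a usable audio voltage.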
A dual-backplate design has the advantage of producing a differential output, which helps minimise the effects of interference. The symmetrical design also reduces distortion. A differential output is easier to process in the analogue domain too, thereby lowering power requirements. Additionally, MEMS microphones withstand the high temperatures of reflow soldering, supporting automated PCB assembly processes.
The performance requirements of microphones for digital systems can vary widely. In mobile phones and cameras, sonic fidelity is important, as the user will often want to be able to record sounds and play them back. Robots, TVs and security systems, among others, will often not require absolute fidelity but a signal that can be optimised for digital processing and recognition. Machine learning algorithms can often make use of features different to those employed by humans. A microphone that is optimised for playback may not highlight these signal features. What is important is that distortion is kept to a minimum and that the microphone is not susceptible to self-noise – this is the RMS noise voltage generated by the microphone itself when not excited by an external sound. It is often the result of interactions inside the package between different elements in the mechanical and electrical components.
Speech recognition systems can often be improved if the microphone used has a sufficiently high acoustic overload point (AOP). The AOP is defined as the sound pressure level (SPL) at which the total harmonic distortion exceeds 10%. It is measured in units of dBSPL. The reason for using a high AOP is that the speech signal will often be relatively quiet compared to background noise. In the case of an active speaker, that noise may come from the device itself and the user could be sitting at the other end of the room, further reducing the signal-to-noise ratio. A high AOP ensures algorithms are provided with enough headroom to cancel the noise signals and still pick up a relatively undistorted speech signal.
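The logarithmic dBSPL scale makes this headroom argument concrete. The sketch below assumes, purely for illustration, speech reaching the microphone at around 60dBSPL; the function names are this sketch's own:

```python
import math

P_REF = 20e-6  # reference pressure for dBSPL: 20 micropascals RMS

def pressure_to_spl(p_pa):
    """Sound pressure level in dBSPL for an RMS pressure in pascals."""
    return 20 * math.log10(p_pa / P_REF)

def spl_to_pressure(spl_db):
    """Inverse mapping: dBSPL back to RMS pressure in pascals."""
    return P_REF * 10 ** (spl_db / 20)

# Speech at ~60dBSPL against a 130dBSPL overload point leaves 70dB of
# headroom: the overload pressure is roughly 3160 times the speech
# pressure, so loud interference can be captured without clipping and
# then cancelled while the speech remains undistorted.
headroom_db = 130 - 60
pressure_ratio = spl_to_pressure(130) / spl_to_pressure(60)
```

Because every 20dB corresponds to a tenfold pressure ratio, even modest-sounding differences in AOP translate into large differences in how much interference the noise-cancellation algorithms can tolerate.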
Figure 1: The IM69D130 from Infineon
To achieve a high AOP and low self-noise, Infineon opted for a dual-backplate design in its IM69D130 that is conceptually similar to that used in studio condenser microphones. Even at an SPL of 128dB, the distortion is no more than 1%. The microphone’s AOP is 130dBSPL. It has a flat frequency response with a low-frequency roll-off at 28Hz and high output linearity. These attributes, combined with a tight manufacturing tolerance, allow for close phase matching between microphones, which is important in applications using microphone arrays.
Figure 2: TDK InvenSense ICS‐52000
Using a digital output such as pulse-density modulation means there is no need to include analogue components to process the output of the microphone. This reduces the need for RF protection on the PCB and helps limit the required board area in microphone array applications. To support extensive microphone arrays, the TDK InvenSense ICS-52000 features a low-noise digital time-division multiplexed (TDM) output, as well as a ±1dB sensitivity tolerance. The TDM interface lets an array of up to 16 microphones connect directly to a digital microprocessor without the need for a codec to process and sequence the data. The devices deliver samples at regular intervals in round-robin fashion, a behaviour achieved by daisy-chaining the word-clock inputs and outputs of the individual microphones. In this arrangement, a master clock signal provided by the system’s MCU or DSP drives the word-clock input of the first ICS-52000; its word-clock output drives the word clock of the second ICS-52000, and so on. All that the receiving processor has to do is allocate each incoming sample to the appropriate buffer. The bottom-port design comes in a 4mm × 3mm × 1mm surface-mount package and includes the MEMS sensor, signal conditioning, an analogue-to-digital converter, decimation and anti-aliasing filters, and power management.
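On the host side, that bookkeeping amounts to de-interleaving the round-robin sample stream into per-microphone buffers. A sketch, assuming the slot order simply follows the daisy-chain wiring (the function name and flat-list framing are this sketch's own):

```python
def deinterleave_tdm(frames, n_mics=16):
    """Distribute round-robin TDM samples into per-microphone buffers.

    `frames` is a flat sequence of samples as delivered by the daisy
    chain: mic 0's sample, then mic 1's, ... then mic 15's, then mic
    0's next sample, and so on.
    """
    buffers = [[] for _ in range(n_mics)]
    for i, sample in enumerate(frames):
        # The slot position within each frame identifies the microphone.
        buffers[i % n_mics].append(sample)
    return buffers
```

Since the chaining fixes each microphone's slot in the frame, no codec or per-channel addressing is needed; the processor only tracks its position in the frame.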
Designed to support systems that need very low-energy operation, PUI Audio’s PMM-3738-1010 helps offload the burden of detecting sounds from the host processor through its wake-on-sound feature. For much of the time, a sound-enabled IoT system will be listening to near silence. If the host processor must stay active to analyse incoming audio and determine whether there is speech content to be processed, the battery life of the system will obviously suffer. One answer is to put simple front-end processing into the system that analyses the audio to see if it demands further examination; frequency analysis can determine whether the incoming audio indicates speech, for example. Even this, however, wastes energy, because much of the time the system will hear very little.
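In software, such front-end gating can be as simple as comparing each block's estimated level against an SPL threshold, along the lines below; the PMM-3738-1010 described next performs the equivalent gating in hardware. The sensitivity scaling and threshold here are hypothetical values for illustration:

```python
import math

P_REF = 20e-6  # dBSPL reference pressure, 20 micropascals RMS

def block_spl(samples, pa_per_unit):
    """Estimate a block's SPL from raw samples, given an assumed
    microphone sensitivity in pascals per sample unit."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Floor the pressure to avoid log(0) on an all-zero block.
    return 20 * math.log10(max(rms * pa_per_unit, 1e-12) / P_REF)

def should_wake(samples, pa_per_unit, threshold_db=65.0):
    """Signal the host to wake only when the block is loud enough."""
    return block_spl(samples, pa_per_unit) >= threshold_db
```

Even this simple check requires the converter and a processor core to run continuously, which is the power cost the hardware wake-on-sound approach avoids.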
The wake-on-sound feature of the PMM-3738-1010 means that almost the entire system can sleep until audio passes an SPL threshold. It takes advantage of piezoelectric technology to keep quiescent power consumption to a minimum. Used as a coating, aluminium nitride generates a voltage when the diaphragm is deformed by a sound wave. There is no enclosed air gap, so the element does not suffer from the acoustic damping that reduces the SNR of capacitive microphones. The use of piezoelectric elements makes it possible to have systems operate at very low power levels with little need to actively monitor the audio input. Instead, the voltage produced by a strong incoming sound wave can provide enough energy to capture a signal and then initiate the power-up of conversion circuitry.
Thanks to developments in both capacitive and piezoelectric technology, a wide variety of acoustic MEMS sensors is now available. The many choices on offer mean that manufacturers can tune sensor parameters and the energy-usage profile to their own needs and make the ability to listen to the world a key part of their designs.