13 August 2015, Duke University – Human brains are able to focus on one conversation in the hubbub of a cocktail party by discarding the background noise. But what about robots?
Researchers from Duke University have demonstrated an intelligent device that can distinguish speakers in a crowd, a challenge known as the ‘cocktail party problem’. Typical approaches to solving it have either involved systems with multiple microphones, which distinguish speakers based on their position in a room, or complex artificial-intelligence algorithms that try to separate different voices on a recording.
But the latest invention, described in this week’s Proceedings of the National Academy of Sciences, is a simple 3D-printed device that can pinpoint the origin of a sound without the need for any sophisticated electronics.
Combining acoustic metamaterials and compressive sensing, the researchers demonstrated a single-sensor listening system that functionally mimics the selective listening and sound separation capabilities of human auditory systems. Unlike previous research efforts, which generally rely on signal and speech processing techniques to solve the “cocktail party” listening problem, the proposed method is a hardware-based approach that exploits carefully designed acoustic metamaterials.
The device is a thick plastic disk, about as wide as a pizza. Openings around the edge channel sound through 36 passages towards a microphone in the middle. Each passage modifies the sound in a subtly different way as it travels towards the centre — roughly as if an equalizer with different settings were affecting the sound in each slice, explains senior author Steven Cummer, an electrical engineer at Duke University in Durham, North Carolina.
The way the disk works is simple, he says. If you speak across the top of a bottle that is partially filled with water, the air inside will resonate with the sound of the voice and attenuate certain frequencies, depending on the amount of water in the bottle. In the plastic disk, the innards of each sector are patterned with a honeycomb-shaped structure in which each hexagonal cell is cut to a different height. The result, Cummer says, is like having an array of bottles filled with different amounts of water.
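The bottle analogy corresponds to what acousticians call a Helmholtz resonator, whose resonant frequency depends on the volume of air in the cavity. The short Python sketch below uses the standard idealized formula to show how changing the air volume shifts the resonance. The bottle dimensions are hypothetical, end corrections are ignored, and treating each cell of the disk as such a resonator is an assumption made here for illustration, not a description of the actual design.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees Celsius


def helmholtz_frequency(neck_area, neck_length, cavity_volume):
    """Resonant frequency in Hz of an idealized Helmholtz resonator.

    Uses f = c / (2*pi) * sqrt(A / (V * L)) with SI units and no end correction.
    """
    return SPEED_OF_SOUND / (2 * math.pi) * math.sqrt(
        neck_area / (cavity_volume * neck_length)
    )


# Hypothetical bottle: neck radius 1 cm, neck length 5 cm.
neck_area = math.pi * 0.01 ** 2
neck_length = 0.05

# More water means less air in the cavity, which raises the resonant frequency.
for air_volume_litres in (0.75, 0.50, 0.25):
    f = helmholtz_frequency(neck_area, neck_length, air_volume_litres * 1e-3)
    print(f"{air_volume_litres:.2f} L of air -> {f:.0f} Hz")
```

In this idealized model, halving the air volume raises the resonant frequency by a factor of the square root of two, which is why cavities of different sizes each affect a different part of the spectrum.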
The human ear is not able to distinguish how the sound is altered by different passages, says lead author Yangbo Xie, also at Duke. But the team wrote an algorithm that, by analysing each sound, can almost always tell which direction it came from.
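As a rough illustration of how such an algorithm could work, the Python sketch below gives each of the 36 passages its own frequency response and picks the passage whose predicted output best matches what the single microphone recorded. This is not the team's algorithm: the random responses, the number of frequency bins, the noise level and the assumption that the source spectrum is known in advance are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CHANNELS = 36   # passages in the disk, per the article
N_FREQS = 64      # hypothetical number of frequency bins

# Hypothetical per-passage gains, standing in for the distinct
# "equalizer setting" that each passage applies to incoming sound.
responses = rng.uniform(0.2, 1.0, size=(N_CHANNELS, N_FREQS))


def measure(source_spectrum, channel, noise_std=0.01):
    """Spectrum seen by the single microphone for a sound entering one passage."""
    noise = rng.normal(0.0, noise_std, size=N_FREQS)
    return responses[channel] * source_spectrum + noise


def estimate_channel(measured, source_spectrum):
    """Return the passage whose predicted output is closest to the measurement."""
    predicted = responses * source_spectrum            # shape (N_CHANNELS, N_FREQS)
    errors = np.sum((predicted - measured) ** 2, axis=1)
    return int(np.argmin(errors))


# Example: a flat-spectrum sound arriving through passage 17 (assumed known spectrum).
spectrum = np.ones(N_FREQS)
recorded = measure(spectrum, channel=17)
print("estimated passage:", estimate_channel(recorded, spectrum))
```

Because every passage colours the sound differently, a least-squares comparison against the known responses is enough in this toy setting; a listener hearing only the recording would have no such reference to compare against.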
Bruce Drinkwater, a mechanical engineer at the University of Bristol, UK, calls the idea “a really nice one”. He says that the device’s bulk could be a limitation to its practical use, and that this version works only at relatively high frequencies. However, he adds that “there could be plenty of room to optimize the design for size in the future”.
“This concept may also have applications outside the world of consumer electronics,” said Xie. “I think it could be combined with any medical imaging device that uses waves, such as ultrasound, to not only improve current sensing methods, but to create entirely new ones.
“With the extra information, it should also be possible to improve the sound fidelity and increase functionalities for applications like hearing aids and cochlear implants. One obvious challenge is to make the system physically small. It is challenging, but not impossible, and we are working toward that goal.”
This work was supported by a Multidisciplinary University Research Initiative under Grant N00014-13-1-0631 from the Office of Naval Research.
Abstract
Designing a “cocktail party listener” that functionally mimics the selective perception of a human auditory system has been pursued over the past decades. By exploiting acoustic metamaterials and compressive sensing, we present here a single-sensor listening device that separates simultaneous overlapping sounds from different sources. The device with a compact array of resonant metamaterials is demonstrated to distinguish three overlapping and independent sources with 96.67% correct audio recognition. Segregation of the audio signals is achieved using physical layer encoding without relying on source characteristics. This hardware approach to multichannel source separation can be applied to robust speech recognition and hearing aids and may be extended to other acoustic imaging and sensing applications.
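The separation step described in the abstract can be pictured as a sparse recovery problem: the single sensor records a sum of a few encoded signals, and the task is to find which few entries of a large dictionary explain the recording. The Python sketch below uses orthogonal matching pursuit, a standard greedy sparse-recovery method chosen here for illustration rather than taken from the paper. The random dictionary, the library of 10 known sounds and the noise level are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

N_FREQS = 256       # hypothetical number of measured frequency bins
N_DIRECTIONS = 36   # passages in the disk, per the article
N_SOUNDS = 10       # hypothetical library of known sounds

# Hypothetical dictionary: one column per (sound, direction) pair, i.e. what the
# sensor would record if only that sound arrived from that direction.
# Random unit-norm columns stand in for measured responses.
dictionary = rng.standard_normal((N_FREQS, N_SOUNDS * N_DIRECTIONS))
dictionary /= np.linalg.norm(dictionary, axis=0)


def omp(y, D, n_sources):
    """Orthogonal matching pursuit: greedily select columns of D that explain y."""
    residual = y.copy()
    chosen = []
    for _ in range(n_sources):
        scores = np.abs(D.T @ residual)
        if chosen:
            scores[chosen] = 0.0          # never pick the same column twice
        chosen.append(int(np.argmax(scores)))
        # Refit all selected columns jointly, then update the residual.
        coeffs, *_ = np.linalg.lstsq(D[:, chosen], y, rcond=None)
        residual = y - D[:, chosen] @ coeffs
    return sorted(chosen)


# Three simultaneous sources mixed at the single sensor, plus a little noise.
true_atoms = sorted(rng.choice(dictionary.shape[1], size=3, replace=False).tolist())
mixture = dictionary[:, true_atoms].sum(axis=1) + 0.01 * rng.standard_normal(N_FREQS)

recovered = omp(mixture, dictionary, n_sources=3)
# Column index = sound * N_DIRECTIONS + direction, so divmod decodes each pair.
pairs = [divmod(i, N_DIRECTIONS) for i in recovered]
print("recovered (sound, direction) pairs:", pairs)
```

The sketch relies on the dictionary columns being sufficiently different from one another, which is the role the abstract assigns to the physical-layer encoding.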
- Reference: “Single-sensor multispeaker listening with acoustic metamaterials”, Yangbo Xie, Tsung-Han Tsai, Adam Konneker, Bogdan-Ioan Popa, David J. Brady, and Steven A. Cummer. [DOI 10.1073/pnas.1502276112]