Revolutionary AI Headphones Can Isolate a Single Voice in a Crowd

Noise-cancelling headphones have become very good at creating an auditory blank slate. But allowing certain sounds from a wearer's environment through that erasure remains a challenge for researchers. The latest version of Apple's AirPods Pro, for instance, automatically adjusts sound levels for wearers, such as by sensing when they're in conversation, but the user has no control over whom to listen to or when this happens.

 

 


A team at the University of Washington created an artificial intelligence system that allows a user to "enroll" someone by looking at them for three to five seconds while wearing headphones. The device, named "Target Speech Hearing," then eliminates all other noises in the environment and plays only the enrolled speaker's voice in real time, even when the listener walks around in noisy places and no longer faces the speaker.
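In rough outline, the pipeline works like this: a few seconds of audio captured while the wearer faces the speaker are turned into a compact "voiceprint," and each subsequent chunk of microphone audio is passed through an extraction model conditioned on that voiceprint. The Python sketch below only illustrates that flow; the frame size, the spectral "voiceprint," and the extraction stub are placeholder assumptions standing in for the team's neural networks.

```python
# Illustrative sketch of the enroll-then-listen flow (not the team's code).
import numpy as np

SAMPLE_RATE = 16_000   # assumed sample rate
FRAME = 512            # assumed streaming chunk size, in samples

def speaker_embedding(audio: np.ndarray) -> np.ndarray:
    """Stand-in for a learned speaker-embedding network: here, just the
    average magnitude spectrum of the enrollment clip."""
    frames = audio[: len(audio) // FRAME * FRAME].reshape(-1, FRAME)
    return np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)

def extract_target(left: np.ndarray, right: np.ndarray,
                   voiceprint: np.ndarray) -> np.ndarray:
    """Stand-in for the neural target-speech-extraction model, which would
    be conditioned on the enrolled voiceprint; this stub simply mixes the
    two ear signals so the sketch runs end to end."""
    return 0.5 * (left + right)

# Enrollment: a few seconds of audio captured while facing the speaker.
enrollment_clip = np.random.randn(3 * SAMPLE_RATE)   # stand-in for mic input
voiceprint = speaker_embedding(enrollment_clip)

# Streaming: each chunk from the two ear microphones is filtered so that
# only the enrolled speaker's voice is played back to the wearer.
def stream(mic_left: np.ndarray, mic_right: np.ndarray):
    for start in range(0, len(mic_left) - FRAME + 1, FRAME):
        l = mic_left[start:start + FRAME]
        r = mic_right[start:start + FRAME]
        yield extract_target(l, r, voiceprint)
```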

The team's findings were presented May 14 in Honolulu at the ACM CHI Conference on Human Factors in Computing Systems. The code for the proof-of-concept device is available for others to build on. The system is not commercially available.




"We now think of AI as web-based chatbots that answer queries," said senior author Shyam Gollakota, a professor at the University of Washington's Paul G. Allen School of Computer Science and Engineering. "However, in our study, we use AI to adjust the audio impression of everyone using headphones based on their preferences. Our technology allows you to clearly hear a single speaker even in a crowded area with several other individuals conversing."

A user wearing store-bought headphones fitted with microphones points their head toward a speaker and presses a button to activate the system. The sound of the speaker's voice should then reach the microphones on both sides of the headset simultaneously, with a 16-degree margin of error. The headphones send that signal to an on-board embedded computer, where the team's machine-learning software learns the target speaker's voice patterns. The system latches onto that speaker's voice and keeps playing it back to the listener, even as the two move around. And as the enrolled speaker keeps talking, the system gains more training data and gets better at focusing on that voice.
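The "arrives at both microphones at the same time" test is, in effect, a direction check: a voice from straight ahead reaches the left and right earcups with essentially no time difference, while a voice off to one side arrives at one ear slightly earlier. The sketch below illustrates such a check using cross-correlation; the microphone spacing, sample rate, and speed of sound are assumed values, not figures from the paper.

```python
# Illustrative direction check for enrollment (assumed parameters).
import numpy as np

SAMPLE_RATE = 16_000      # Hz (assumed)
MIC_SPACING = 0.18        # metres between the two earcup mics (assumed)
SPEED_OF_SOUND = 343.0    # m/s
MAX_ANGLE_DEG = 16.0      # the margin of error quoted in the article

# Largest left/right arrival-time difference a source within +/-16 degrees
# of straight ahead can produce, converted to whole samples (~2 here).
max_delay_s = MIC_SPACING * np.sin(np.radians(MAX_ANGLE_DEG)) / SPEED_OF_SOUND
MAX_LAG = int(round(max_delay_s * SAMPLE_RATE))

def inter_ear_lag(left: np.ndarray, right: np.ndarray) -> int:
    """Estimate the delay (in samples) between the two ear signals via
    cross-correlation; 0 means the sound arrived simultaneously."""
    corr = np.correlate(left, right, mode="full")
    return int(np.argmax(corr)) - (len(right) - 1)

def within_enrollment_cone(left: np.ndarray, right: np.ndarray) -> bool:
    """True if the dominant source lies roughly straight ahead, i.e. within
    the +/-16 degree tolerance used during enrollment."""
    return abs(inter_ear_lag(left, right)) <= MAX_LAG
```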

The researchers tested the system on 21 participants, who on average rated the clarity of the enrolled speaker's voice nearly twice as high as that of the unfiltered audio.

This work builds on the team's earlier "semantic hearing" research, which let users select which sound classes they wanted to hear, such as voices or birds, while muffling other sounds in the environment.
Currently, the TSH system can enroll only one speaker at a time, and only when no other loud voice is coming from the same direction as the target speaker. If a user isn't satisfied with the sound quality, they can run another enrollment on the speaker to improve clarity.
In the future, the team hopes to expand the system to earbuds and hearing aids.

Takuya Yoshioka, director of research at AssemblyAI, and doctoral students Bandhav Veluri, Malek Itani, and Tuochao Chen from the University of Washington's Allen School were also co-authors on the paper. The study was funded by a Moore Inventor Fellow award, the Thomas J. Cable Endowed Professorship, and the UW CoMotion Innovation Gap Fund.
For more information, contact tsh@cs.washington.edu.