By Dave DeFusco
Imagine someone talking in a video conference while a piece of music is playing in the background. Besides being distracting, the music makes it hard to understand the speaker when you listen to the recording afterward.
Dr. Youshan Zhang, assistant professor of computer science and artificial intelligence, and Jialu Li of Cornell University have created a novel noise removal method that could benefit the hearing impaired and improve the listening experience for audiophiles everywhere.
In their paper, “BirdSoundsDenoising: Deep Visual Audio Denoising for Bird Sound,” the researchers described how they created a deep visual audio denoising (DVAD) model using a dataset of 15,300 bird sounds—varying in length from 1 second to 15 seconds—that strips out the background noise, in this case natural sounds like wind and rain, to produce clean bird sounds.
The researchers presented their model in January at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) in Hawaii. Dr. Zhang said the model is robust enough to apply to human speech, especially to background noise that is particularly damaging to speech intelligibility for people with difficulty hearing.
“Our DVAD model can first denoise the background noise and then increase the volume of the low voice,” he said.
In a novel twist, the researchers turned the audio of the bird sounds into a series of images; used a photo editing tool that eliminates the original background of an image without compromising its integrity; created a segmentation model to edit out the noisy parts of the image; and then applied an algorithm to produce the “denoised,” or clean, bird sounds.
“To the best of our knowledge, we are the first to transfer audio denoising into an image segmentation problem,” said Dr. Zhang. “By removing the noise area in the audio image, we can realize the purpose of audio denoising.”
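The general idea of treating audio as an image, masking out the noisy regions and converting the result back to sound can be illustrated with a minimal Python sketch. This is not the researchers' DVAD model: a simple magnitude threshold stands in for their trained segmentation model, and every name and value here (stft, istft, denoise, FRAME, HOP, threshold_ratio, the synthetic 1,024 Hz tone) is a hypothetical choice made for illustration. It assumes only that NumPy is installed.

```python
import numpy as np

FRAME = 256                              # samples per analysis frame (assumed value)
HOP = FRAME // 2                         # 50% overlap between frames
WINDOW = np.hanning(FRAME + 1)[:-1]      # periodic Hann window


def stft(signal):
    """Turn a 1-D signal into a complex spectrogram (frames x frequency bins)."""
    padded = np.pad(signal, FRAME // 2, mode="reflect")
    n_frames = 1 + (len(padded) - FRAME) // HOP
    frames = np.stack(
        [padded[i * HOP : i * HOP + FRAME] * WINDOW for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def istft(spec, length):
    """Invert a spectrogram by windowed overlap-add, then trim the padding."""
    frames = np.fft.irfft(spec, n=FRAME, axis=1) * WINDOW
    out = np.zeros(FRAME + HOP * (len(frames) - 1))
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * HOP : i * HOP + FRAME] += frame
        norm[i * HOP : i * HOP + FRAME] += WINDOW ** 2
    out /= np.maximum(norm, 1e-8)        # undo the squared-window weighting
    return out[FRAME // 2 : FRAME // 2 + length]


def denoise(signal, threshold_ratio=0.2):
    """Mask the 'audio image' and convert the kept regions back to audio.

    ASSUMPTION: the threshold mask below is a crude stand-in for a trained
    segmentation model; it is not the method described in the paper.
    """
    spec = stft(signal)
    image = np.abs(spec)                 # magnitude spectrogram: the "audio image"
    mask = image > threshold_ratio * image.max()   # True where signal dominates
    return istft(spec * mask, len(signal))


# Synthetic demo: a pure tone buried in white noise (not real bird audio).
# The length is a multiple of HOP so the last frames fully cover the signal.
sample_rate = 8192
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 1024 * t)
rng = np.random.default_rng(0)
noisy = clean + 0.3 * rng.standard_normal(clean.shape)

denoised = denoise(noisy)
print("error before:", np.mean((noisy - clean) ** 2))
print("error after: ", np.mean((denoised - clean) ** 2))
```

In this toy setup the mask keeps the few frequency bins where the tone is strong and discards the rest, so the error against the clean tone should drop. A learned segmentation model would replace the threshold line when the noise and the signal overlap in more complicated ways.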
Background noise removal enhances a noisy speech signal by isolating the dominant sound. It’s used in audio and video editing software, video conferencing platforms and noise-canceling headphones. The technology is evolving quickly, with artificial intelligence opening up a new range of approaches to the task.
“Extensive experimental results demonstrate that our proposed model achieves state-of-the-art performance,” said Dr. Zhang. “We also show that our method can be easily generalized to speech denoising, audio separation, audio enhancement and noise estimation.”