The new audiovisual Speech Enhancement feature in YouTube Stories (on iOS) allows users to take better selfie videos by automatically enhancing their voices and reducing background noise.
The Speech Enhancement feature in YouTube Stories uses Looking to Listen, a machine learning technology by Google.
In an effort to address the often overlooked quality of audio in videos, a few years ago Google introduced Looking to Listen, a machine learning (ML) technology that uses both visual and audio cues to isolate the speech of a video’s subject.
Trained on a large-scale collection of online videos, the model captures correlations between speech and visual signals such as mouth movements and facial expressions. These correlations can then be used to separate the speech of one person in a video from that of another, or to separate speech from background sounds.
Google claims that this technology improves speech separation and enhancement results by 1.5 dB over audio-only models. It can also improve on audio-only processing when multiple people are speaking, because the visual cues in the video help determine who is saying what.
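To give a sense of scale for the 1.5 dB figure, the short Python sketch below converts a decibel difference into a linear power ratio and computes a simple signal-to-error ratio in dB. It assumes the quoted improvement is measured on a power-ratio scale of this kind; the article does not say which metric Google used, and the function names `db_to_power_ratio` and `signal_to_error_db` are illustrative only.

```python
import math


def db_to_power_ratio(db: float) -> float:
    """Convert a difference in decibels to a linear power ratio."""
    return 10 ** (db / 10)


def signal_to_error_db(reference, estimate) -> float:
    """Ratio of reference power to residual error power, in dB.

    Assumption: a simple stand-in for the kind of metric behind the
    quoted figure, not the metric Google actually reports.
    """
    signal_power = sum(r * r for r in reference)
    error_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10 * math.log10(signal_power / error_power)


if __name__ == "__main__":
    ratio = db_to_power_ratio(1.5)
    print(f"1.5 dB corresponds to a power ratio of about {ratio:.2f}")
```

Under that assumption, a 1.5 dB gain means the ratio of speech power to residual error power is roughly 1.4 times higher than with the audio-only baseline.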
YouTube creators who are eligible to create YouTube Stories can record a video on iOS and select “Enhance speech” from the volume controls editing tool.
This applies speech enhancement to the audio track and plays back the enhanced speech in a loop. The feature can then be toggled on and off to compare the enhanced speech with the original audio.