The latest addition to the automatic captioning system should help paint a more accurate picture for viewers

Mar 23, 2017 23:29 GMT

YouTube's automatic captions have been improving over the past few years, and they can now detect sound effects and add that information to the captions.

"The effect of audio on our perception of the world can hardly be overstated. Its importance as a communication medium via speech is obviously the most familiar, but there is also significant information conveyed by ambient sounds. These ambient sounds create context that we instinctively respond to, like getting startled by sudden commotion, the use of music as a narrative element, or how laughter is used as an audience cue in sitcoms," YouTube's Sourish Chaudhuri writes in a blog post.

Automatic captions were introduced back in 2009, and they have focused heavily on transcribing speech in order to make the videos hosted on YouTube more accessible. But without similar descriptions for ambient sounds, the impact of some videos is greatly diminished.

Neural network put to work

The feat is possible thanks to a collaboration between the YouTube, Sound Understanding, and Accessibility teams, which set out to develop the first ever automatic sound effect captioning system for YouTube. They trained a deep neural network model on thousands of hours of video to achieve high-quality recognition results.

"As a result, we can now automatically detect the existence of these sound effects in a video and transcribe it to appropriate classes or sound labels. With so many sounds to choose from, we started with [APPLAUSE], [MUSIC] and [LAUGHTER], since these were among the most frequent manually captioned sounds, and they can add meaningful context for viewers who are deaf and hard of hearing," the blog post reads.
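
To make the idea of transcribing detected sounds into labels more concrete, here is a minimal sketch in Python. It is not YouTube's implementation: it assumes a hypothetical classifier has already produced a score between 0 and 1 for each label on every one-second segment of audio, and the function name, the threshold, and the segment length are all illustrative choices. The sketch simply merges consecutive segments that score above the threshold into one timed caption per sound.

```python
from typing import Dict, List, Tuple

# The three labels named in the blog post.
LABELS = ("APPLAUSE", "MUSIC", "LAUGHTER")


def caption_events(
    scores: List[Dict[str, float]],
    threshold: float = 0.5,        # assumed cut-off, not a published value
    segment_seconds: float = 1.0,  # assumed segment length
) -> List[Tuple[float, float, str]]:
    """Turn per-segment label scores into (start, end, caption) tuples.

    `scores` holds one dict per fixed-length audio segment, mapping a
    label to the hypothetical classifier's score for that segment.
    """
    events = []
    for label in LABELS:
        start = None  # index of the segment where the current run began
        for i, segment in enumerate(scores):
            active = segment.get(label, 0.0) >= threshold
            if active and start is None:
                start = i
            elif not active and start is not None:
                events.append(
                    (start * segment_seconds, i * segment_seconds, f"[{label}]")
                )
                start = None
        if start is not None:  # the sound runs to the end of the audio
            events.append(
                (start * segment_seconds, len(scores) * segment_seconds, f"[{label}]")
            )
    return sorted(events)


if __name__ == "__main__":
    demo = [
        {"MUSIC": 0.9},
        {"MUSIC": 0.8, "LAUGHTER": 0.7},
        {"APPLAUSE": 0.6},
        {},
    ]
    for start, end, text in caption_events(demo):
        print(f"{start:.1f}s to {end:.1f}s  {text}")
```

In this toy input, music spans the first two seconds, laughter overlaps it in the second, and applause follows in the third. A real system would also have to decide how overlapping sounds are shown alongside speech captions, which this sketch does not attempt.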

There's still a lot of work to do, but even this is a great addition. Earlier this year, YouTube announced that it had reached its 1 billionth video with automatic captions. Given how many people watch videos on YouTube, this type of caption could come in handy for many of them.

On top of this, YouTube says that in user studies, two-thirds of participants said the new sound effect captions enhanced the overall experience.

Hopefully, as the system learns to recognize more sounds, we'll see more complex additions to the captions.