Speech recognition is an important cog in the AI machine of big tech companies. The technology powers the digital assistants in our phones, our cars and the smart speakers in our homes. But despite its ubiquity, speech recognition is still a work in progress. Today, Facebook announced a major breakthrough in the way it trains these systems to learn new languages. The company says it has developed a way to build speech recognition tools that do not require transcribed data.
Facebook says its new AI system can free the technology from its reliance on transcribed audio. Producing that data is a time-consuming task in which humans listen to and transcribe hours of audio, a monotonous process that must be repeated for each language. Facebook’s “unsupervised” system, on the other hand, learns purely from audio and unpaired text, giving it a better understanding of the sound of human communication.
Facebook’s model relies on a feedback loop inside a generative adversarial network (GAN) made up of a “generator” and a “discriminator.” The former spits out representations of the input speech that appear to be utter gibberish until they are fed into the discriminator network, which acts as a sort of translator. At the same time, Facebook fed in additional text written by humans to help the generator pick up on the differences between its computer-generated output and real-world text. This process is repeated until the output of the generator resembles real text.
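The adversarial loop described above can be illustrated with a deliberately tiny sketch. This is not Facebook’s code: every name here (`train_toy_gan`, `transcribe`, the integer "audio units" and "phoneme ids") is hypothetical, the generator is just a table of scores and the discriminator a single logistic unit, whereas the real system works on learned audio representations with neural networks. It only shows the alternating update: the discriminator learns to tell real text symbols from generated ones, then the generator is nudged to fool it.

```python
import math
import random


def softmax(logits):
    """Turn a list of scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Numerically safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_toy_gan(audio_units, text_phonemes, num_units, num_phonemes,
                  steps=1000, lr=0.1, seed=0):
    """Alternate discriminator and generator updates on unpaired data.

    audio_units:   hypothetical discrete ids standing in for audio segments
    text_phonemes: hypothetical phoneme ids from unrelated (unpaired) text
    """
    rng = random.Random(seed)
    # Generator: one row of phoneme scores per audio unit.
    gen = [[rng.gauss(0.0, 0.1) for _ in range(num_phonemes)]
           for _ in range(num_units)]
    # Discriminator: logistic regression over a phoneme distribution.
    w = [0.0] * num_phonemes
    b = 0.0

    for _ in range(steps):
        # "Real" sample: a one-hot phoneme drawn from the text.
        real = [0.0] * num_phonemes
        real[rng.choice(text_phonemes)] = 1.0
        # "Fake" sample: the generator's guess for a random audio unit.
        unit = rng.choice(audio_units)
        fake = softmax(gen[unit])

        # Discriminator step: minimise -log D(real) - log(1 - D(fake)).
        d_real = sigmoid(dot(w, real) + b)
        d_fake = sigmoid(dot(w, fake) + b)
        for j in range(num_phonemes):
            w[j] -= lr * ((d_real - 1.0) * real[j] + d_fake * fake[j])
        b -= lr * ((d_real - 1.0) + d_fake)

        # Generator step: minimise -log D(fake), back-propagated
        # through the softmax by hand.
        d_fake = sigmoid(dot(w, fake) + b)
        grad_p = [(d_fake - 1.0) * wj for wj in w]
        mean_grad = dot(fake, grad_p)
        for j in range(num_phonemes):
            gen[unit][j] -= lr * fake[j] * (grad_p[j] - mean_grad)

    return gen, (w, b)


def transcribe(gen, audio_units):
    """Pick the highest-scoring phoneme id for each audio unit."""
    return [max(range(len(gen[u])), key=lambda j: gen[u][j])
            for u in audio_units]
```

Calling `train_toy_gan([0, 1, 2, 1], [2, 0, 1, 1], num_units=3, num_phonemes=3)` and passing the returned generator to `transcribe` yields one phoneme id per audio unit. With data this small the mapping is not meaningful; the point is only the shape of the loop, in which no audio is ever paired with its transcript.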
Facebook says its approach allows it to create speech recognition systems without any annotated data sets. The company has tested the model, called wav2vec-U (the U stands for unsupervised), on Swahili, Kyrgyz (spoken in the Central Asian republic of Kyrgyzstan) and Crimean Tatar, all of which lack high-quality speech recognition tools due to a shortage of training data.
Facebook’s tests showed that the system made 63% fewer errors than the next-best unsupervised method. It added that the tool was as accurate as the supervised systems of a few years ago. To speed up development, Facebook has shared the code for wav2vec-U on GitHub.
The company said the breakthrough could bring speech recognition systems to more languages and dialects around the world, helping to democratize the technology. Naturally, Facebook stands to benefit from that spread. More than 76% of its 2.85 billion monthly users are outside North America and Europe, and automatic translation is crucial to its goal of connecting billions of people in their preferred language.
The original link: www.engadget.com/facebook-ai…