Meta has launched a speech-to-speech translation system powered by artificial intelligence (AI). It works for languages that are primarily spoken rather than written.
Meta has launched AI-based speech translation that focuses on translating among the 3,500 spoken languages that do not have a written form. The research and development behind this initiative, and the AI tool itself, aim to bridge the gap between spoken and written language systems and to streamline communication both in the physical world and in the metaverse. To begin with, Meta has focused on Hokkien, a primarily oral language spoken within the Chinese diaspora.
Machine translation tools built with standard techniques cannot support such languages, because those techniques require large amounts of written text to train an AI model.
The first AI-powered speech-to-speech translation system for Hokkien will be open-sourced, and the translation models, evaluation datasets, and research papers will be made widely available so that others can reproduce and build on the work.
The translation system is part of the Universal Speech Translator project, which is developing new AI methods intended eventually to allow real-time speech-to-speech translation across many languages. Meta reckons spoken communication can bring people together wherever they are located, including in the metaverse.
Many speech translation systems rely on transcriptions. Because primarily oral languages have no standard written form, however, producing transcribed text as the translation output does not work, so Meta focused on speech-to-speech translation instead.
To do that, Meta developed a variety of methods, such as speech-to-unit translation, which translates input speech into a sequence of discrete acoustic units and then generates waveforms from those units, or which alternatively relies on text from a related language, in this case Mandarin.
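The speech-to-unit idea can be illustrated with a deliberately tiny sketch. Everything here is hypothetical: real systems learn discrete units with self-supervised models, translate unit sequences with a trained sequence-to-sequence model, and synthesize audio with a neural vocoder; the hand-made codebook, unit mapping, and sine-burst "vocoder" below only show the shape of the pipeline.

```python
# Toy sketch of a speech-to-unit translation (S2UT) pipeline.
# The codebook, unit mapping, and synthesis rule are all invented
# for illustration; none of this reflects Meta's actual models.
import math

def encode_to_units(waveform, codebook, frame=4):
    """Quantize each frame of audio to the nearest codebook centroid's ID."""
    units = []
    for i in range(0, len(waveform) - frame + 1, frame):
        mean = sum(waveform[i:i + frame]) / frame  # crude per-frame feature
        units.append(min(codebook, key=lambda u: abs(codebook[u] - mean)))
    return units

def translate_units(units, mapping):
    """Map source-language unit IDs to target-language unit IDs
    (stands in for the learned translation model)."""
    return [mapping[u] for u in units]

def units_to_waveform(units, frame=4):
    """Render each unit as a short sine burst (stands in for a vocoder)."""
    wave = []
    for u in units:
        freq = 100 + 50 * u  # hypothetical unit-to-pitch rule
        wave += [math.sin(2 * math.pi * freq * t / 8000) for t in range(frame)]
    return wave

codebook = {0: -0.5, 1: 0.0, 2: 0.5}  # unit ID -> centroid value
mapping = {0: 2, 1: 1, 2: 0}          # toy source -> target unit mapping
src = [0.4, 0.6, 0.5, 0.5, -0.6, -0.4, -0.5, -0.5]
units = encode_to_units(src, codebook)
out = units_to_waveform(translate_units(units, mapping))
print(units, len(out))
```

The key design point survives even in this toy form: because the intermediate representation is a sequence of discrete units rather than text, no writing system is ever required.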
Although the Hokkien translation model is still a work in progress and can translate only one full sentence at a time, it is a step toward a future in which simultaneous translation between languages is possible. The techniques it pioneers can be extended to many other languages, both written and unwritten.
Meta is also releasing SpeechMatrix, a large collection of speech-to-speech translations developed with LASER, an innovative natural language processing toolkit. These tools will enable other researchers to create their own speech-to-speech translation systems and build on the work. Meanwhile, progress in what researchers refer to as unsupervised learning demonstrates the feasibility of building high-quality speech-to-speech translation models without any human annotations. This will help extend those models to languages for which no labeled training data is available.
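The mining idea behind a corpus like SpeechMatrix can be sketched in miniature. LASER embeds sentences from many languages into one shared vector space, so likely translation pairs can be found as mutual nearest neighbours by similarity. The two-dimensional "embeddings", the mutual-nearest-neighbour criterion, and the threshold below are illustrative assumptions, not Meta's actual mining procedure.

```python
# Minimal sketch of bitext mining in a shared embedding space.
# The embeddings are invented toy vectors; real systems use learned,
# high-dimensional multilingual embeddings and more refined scoring.
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def mine_pairs(src_embs, tgt_embs, threshold=0.9):
    """Keep (i, j) pairs that are mutual nearest neighbours above a
    similarity threshold -- a crude stand-in for corpus mining."""
    pairs = []
    for i, s in enumerate(src_embs):
        j = max(range(len(tgt_embs)), key=lambda k: cosine(s, tgt_embs[k]))
        back = max(range(len(src_embs)), key=lambda k: cosine(tgt_embs[j], src_embs[k]))
        if back == i and cosine(s, tgt_embs[j]) >= threshold:
            pairs.append((i, j))
    return pairs

# Hypothetical embeddings for three source and three target utterances.
src = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
tgt = [[0.0, 1.0], [0.9, 0.1], [0.7, 0.6]]
print(mine_pairs(src, tgt))
```

Pairs mined this way, at scale and from real audio, are what supply the parallel training data that primarily oral languages otherwise lack.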