Called SeamlessM4T, the translation platform is the “first all-in-one multilingual multimodal AI translation and transcription model”, Meta said.
Multimodal engines can understand language from both speech and text, and can generate translations in either or both forms.
SeamlessM4T can perform speech-to-text, speech-to-speech, text-to-speech, and text-to-text translations for up to 100 languages depending on the task.
“Compared to approaches using separate models, SeamlessM4T’s single system approach reduces errors and delays, increasing the efficiency and quality of the translation process,” Meta said.
“This enables people who speak different languages to communicate with each other more effectively,” it added.
The AI-driven translation industry is booming.
The global machine translation market size is expected to reach almost $4.1 billion in 2030 from $812.6 million in 2021, according to India-based Acumen Research and Consulting.
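To put those two figures in perspective, the annual growth rate they imply can be worked out with a few lines of Python. This is a rough sketch only: it assumes steady compound growth between 2021 and 2030 and treats "almost $4.1 billion" as exactly $4.1 billion, so the result is an approximation and not a figure published by Acumen. The function name `implied_cagr` is illustrative.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate implied by two endpoint values."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the article, in millions of US dollars.
market_2021 = 812.6
market_2030 = 4100.0  # assumption: "almost $4.1 billion" rounded up to 4.1bn

rate = implied_cagr(market_2021, market_2030, 2030 - 2021)
print(f"Implied annual growth: {rate:.1%}")
```

Under these assumptions the market would need to grow by a little under 20 per cent a year to reach the 2030 figure.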
Machine translation is the process of translating text or speech from one language into another using software.
Meta said it is publicly releasing SeamlessM4T under a research licence to allow researchers and developers to build on this work. It has also released the metadata of SeamlessAlign, the biggest open multimodal translation data set to date, totalling 270,000 hours of mined speech and text alignments.
The new translation engine comes with speech recognition ability for nearly 100 languages. It can perform speech-to-text translation for almost 100 input and output languages. Speech-to-speech translation can be done in nearly 100 input languages and 36 (including English) output languages.
Additionally, it can do text-to-text translation for nearly 100 languages and text-to-speech translation for around 100 input languages and 35 (plus English) output languages.
Meta said SeamlessM4T is part of its efforts to create a universal translator.
Last year, Meta released No Language Left Behind (NLLB), a text-to-text machine translation model that supports 200 languages. It has been integrated into Wikipedia as one of the translation providers.
In October last year, it released its first speech-to-speech translation system for spoken languages. The system was developed under Meta’s Universal Speech Translator project, which focuses on building AI systems that provide speech-to-speech translation across all languages.
Earlier this year, the company revealed Massively Multilingual Speech, which provides speech recognition, language identification and speech synthesis technology across more than 1,100 languages.
“SeamlessM4T draws on findings from all of these projects to enable a multilingual and multimodal translation experience stemming from a single model, built across a wide range of spoken data sources with state-of-the-art results,” Meta said.
SeamlessM4T also comes with code-switching ability. Code-switching happens when a multilingual speaker uses more than one language while speaking. The feature lets the engine automatically recognise and translate more than one language when they are mixed in the same sentence.