Facebook said it has developed the world's first multilingual machine translation tool that can translate between any pair of 100 languages without first translating into English, as existing systems do.
Currently, when translating from German to Arabic, most of the available English-centric models first translate German to English and then to Arabic, since English training data is the most widely available.
However, Facebook's new model, M2M-100, translates German directly to Arabic, producing faster results while preserving the context and meaning of the text.
“For years, AI researchers have been working towards building a single universal model that can understand all languages across different tasks ... a single model that keeps translations up to date and creates new experiences for billions of people equally,” Angela Fan, research assistant at Facebook AI Research, said.
M2M-100 “brings us closer to this goal”, Ms Fan said. “Breaking language barriers through machine translation is one of the most important ways to bring people together, provide authoritative information on Covid-19 and keep them safe from harmful content.”
The translation model has been open-sourced, with the source code made freely available. This will allow independent researchers and technology companies to reproduce, modify and build on the existing multilingual models according to their requirements.
The machine learning-based model is trained on nearly 2,200 language pairs, almost ten times more than the previous best models that rely only on English language data.
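The gap between an English-centric system and a true many-to-many one is easy to quantify. A quick sketch of the arithmetic (the 100-language and 2,200-pair figures come from the article; the rest is simple combinatorics):

```python
def directed_pairs(n):
    """Number of ordered (source, target) pairs among n languages."""
    return n * (n - 1)

N = 100  # languages covered by M2M-100

# An English-centric model only needs the pairs that include English:
# X -> English and English -> X.
english_centric = 2 * (N - 1)

# A full many-to-many model covers every ordered pair.
many_to_many = directed_pairs(N)

print(english_centric)  # 198
print(many_to_many)     # 9900
# M2M-100's ~2,200 training pairs are roughly 11x the English-centric
# count, consistent with the "almost ten times" figure above.
```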
“We will continue to improve our model by incorporating cutting-edge research, exploring ways to deploy machine translation systems responsibly and creating more specialised architectures,” Ms Fan said.
“The research can further advance how our systems understand text for low-resource languages using unlabelled data,” she added.
Traditional machine translation tools require building separate AI models for each language and each task.
Some of the advanced multilingual systems can process multiple languages at once, but their accuracy is compromised since they rely on English data to bridge the gap between the source and target languages, Facebook said.
“This approach does not scale effectively on Facebook, where people post content in more than 160 languages across billions of posts. We need one model that can translate any language to better serve our community … nearly two-thirds of which use a language other than English,” said Ms Fan.
Facebook said it is using various scaling techniques to create translation data sets and build a universal model with 15 billion parameters, reflecting a more diverse set of languages and scripts. It has already created data sets with 7.5 billion sentences covering 100 languages.
“The volume of data required for training grows quadratically with the number of languages that we support,” Ms Fan said.
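Her point about quadratic growth follows from the pair count: if every ordered language pair needs a comparable amount of parallel text, total data scales with n(n − 1). A toy illustration (the per-pair sentence count below is an invented placeholder, not a Facebook figure):

```python
def total_sentences(n_languages, sentences_per_pair):
    """Training data needed if every ordered language pair
    requires the same amount of parallel text."""
    return n_languages * (n_languages - 1) * sentences_per_pair

# Hypothetical assumption: 1 million parallel sentences per direction.
per_pair = 1_000_000

small = total_sentences(10, per_pair)    # 90 million sentences
large = total_sentences(100, per_pair)   # 9.9 billion sentences

# 10x the languages requires ~110x the data: quadratic, not linear.
print(large / small)  # 110.0
```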