Date: 2022-07-09

Today we have many tools for overcoming language barriers. Although the most effective approach is still to learn the languages themselves, machine translators, aided by artificial intelligence, are getting better and faster at their job. They matter a great deal on social networks, where they help people understand each other (although they sometimes fail), but Mark Zuckerberg wants to take them further: they will also be key to avoiding communication problems in his idyllic metaverse.

Meta researchers have long been working on artificial intelligence models for translating between many languages. They have now announced NLLB-200 (No Language Left Behind), a model capable of translating 200 languages in real time and reported to be twice as good as the system Meta had until now.

Eliminating language barriers with NLLB-200

As Zuckerberg commented in a Facebook post, many of the languages included in this model are not supported by current translation systems. The model, which is open source and described in more detail in its accompanying paper, was trained on the Research SuperCluster, one of the fastest AI supercomputers in the world.

“To give an idea of the scale of the program, the 200-language model has more than 50 billion parameters.”

According to Zuckerberg, the system is ready to deliver more than 25 billion translations a day across all of Meta's apps. The tool is capable of translating both oral and written languages, and the 200 languages include 55 African ones, many of which are not available in current automatic translation systems.

The image shows the different translation models with their respective BLEU scores, a metric that rates translation quality by comparing a model's output against reference translations. On the chart, NLLB-200 outperforms BT, reaching a score of 37.84, the highest to date.
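To make the BLEU idea more concrete, here is a minimal sketch of how such a score can be computed for a single sentence against a single reference. It is a simplified illustration, not the evaluation pipeline Meta used: the function names `ngrams` and `simple_bleu` are hypothetical, tokenization is plain whitespace splitting, and no smoothing is applied, so any sentence with zero matches at some n-gram length scores 0.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified single-reference, sentence-level BLEU on a 0-100 scale.

    Hypothetical helper for illustration only: whitespace tokenization,
    uniform n-gram weights, no smoothing.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped matches: a candidate n-gram is credited at most as many
        # times as it appears in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled to 0-100.
    return 100 * brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat is on the mat"
    print(simple_bleu("the cat is on the mat", reference))
    print(simple_bleu("the cat sat on the mat", reference, max_n=2))
```

Scores reported for real systems are usually computed over a whole test set with standardized tokenization, so numbers from this sketch are not comparable to the 37.84 shown on the chart.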

Eliminating language barriers obviously improves the flow of communication, which is essential on social networks. These systems will also be key for real-time translation on virtual reality devices, with the goal of enabling an uninterrupted metaverse experience.

Translations from the original language and not from English

NLLB-200 builds on the M2M-100 model presented in 2020, a system that translates directly from the original language instead of routing translations through English, which should yield more accurate results. However, the system's bottleneck is the over-representation of English on the Internet.

Most articles and other content are in English, and the system requires millions of examples in each of its supported languages to do its job. Meta illustrates the problem by comparing the number of Wikipedia articles in Swedish and Lingala. Swedish is spoken by about 10 million people, whereas Lingala is spoken by 45 million people in the Democratic Republic of the Congo, the Republic of the Congo, the Central African Republic and South Sudan. Yet Wikipedia has 2.5 million articles in Swedish and only 3,260 in Lingala.
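The size of that gap is easier to appreciate when the article counts are normalized by the number of speakers. The short sketch below does that arithmetic using only the figures quoted above; the variable and function names are illustrative, and "articles per million speakers" is just one convenient way to express the imbalance, not a measure Meta reports.

```python
# Figures as quoted in the article: speakers and Wikipedia article counts.
languages = {
    "Swedish": {"speakers": 10_000_000, "articles": 2_500_000},
    "Lingala": {"speakers": 45_000_000, "articles": 3_260},
}


def articles_per_million_speakers(stats):
    """Wikipedia articles available per one million speakers."""
    return stats["articles"] / (stats["speakers"] / 1_000_000)


density = {name: articles_per_million_speakers(stats)
           for name, stats in languages.items()}
gap = density["Swedish"] / density["Lingala"]

if __name__ == "__main__":
    for name, value in density.items():
        print(f"{name}: {value:,.1f} articles per million speakers")
    print(f"Swedish has roughly {gap:,.0f} times more articles per speaker")
```

By this measure, Swedish has several thousand times more written material per speaker than Lingala, which is why a model that learns from web text needs extra help for low-resource languages.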

To deal with this problem, Meta has improved its model to get more out of each sentence and word it processes, and has also increased the size of the datasets used to feed the algorithm. To confirm the quality of the translations, the team used FLORES-200, an evaluation dataset that has served to measure and improve the AI model.

