Abu Dhabi unveils world’s biggest Arabic AI language processing model

The Technology Innovation Institute is in talks to commercialise the model and is making a smaller version available as open source

Human-like behaviour from machines is largely thanks to advances in natural language processing. Photo: AFP

Abu Dhabi's cutting-edge research hub has unveiled the world's biggest natural language processing model for the Arabic language.

Natural language processing, or NLP, is a key part of the booming artificial intelligence sector. It helps computers decode spoken and written language, underpinning everything from translation tools to Siri- and Alexa-style smart assistants.

The Noor model, developed at the Technology Innovation Institute, may give the Arab world a new edge in the push to digitalise, as tools such as chatbots, market intelligence and machine translation skew heavily towards English- and Chinese-speaking markets.

The priority is to find ways for companies and academics to use Noor to build new tools, such as sentiment analysis across social media or new Arabic virtual assistants, Dr Ebtesam Almazrouei, a director at TII who led the project, told The National.

But she said a smaller version of Noor would also be made available to the public as an open-source model.

"We want [Noor] to contribute to society," she said.

The size of Noor is significant. In NLP, a model's size is measured by its number of parameters: the numerical values, such as weights, that the model learns during training. Parameters are the building blocks of machine learning models, and, broadly speaking, the more parameters a model has, the more complex and capable it can be.
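Parameter counts can be made concrete with a toy example. The sketch below is a hypothetical illustration and does not describe the architecture of Noor or AraGPT. It counts the learnable values in a small fully connected network, where each layer contributes one weight per input-output connection plus one bias per output. Language models such as Noor are vastly larger and use other layer types as well, but the counting principle is the same.

```python
# Toy illustration: count the learnable parameters (weights and biases)
# of a small fully connected network. Layer sizes are hypothetical.

def dense_layer_params(n_in: int, n_out: int) -> int:
    """A dense layer has one weight per input-output pair plus one bias per output."""
    return n_in * n_out + n_out


def count_params(layer_sizes: list[int]) -> int:
    """Sum the parameters of consecutive dense layers, e.g. [4, 8, 2]."""
    return sum(
        dense_layer_params(n_in, n_out)
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


if __name__ == "__main__":
    sizes = [4, 8, 2]  # hypothetical: 4 inputs, 8 hidden units, 2 outputs
    print(count_params(sizes))  # (4*8 + 8) + (8*2 + 2) = 40 + 18 = 58
```

Widening or stacking more layers makes the count grow quickly, which is how production language models reach billions of parameters.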

Before Noor, the largest available Arabic model was AraGPT, which has 1.5 billion parameters. Noor has 10 billion parameters and was trained on a dataset that combines web data with books, poetry, news articles and technical information, significantly widening the range of applications that can be built with it.

According to TII, this training dataset is the largest high-quality, cross-domain Arabic dataset ever assembled.

"At the 10 billion scale, our model can tackle more advanced tasks and take in more complex instructions from humans to machines," Dr Almazrouei said.

"For instance, it can summarise texts, assist with writing — for example, a press release. Also it can be used to power more natural and effective chatbots, or even evaluate the language level of employees. This is only the start, and we want to scale to even larger and more capable models in the future."

TII, the applied research arm of Abu Dhabi's Advanced Technology Research Council, is a critical part of the UAE's efforts to diversify away from a reliance on oil exports and develop a knowledge-based economy. Noor is a first step in the research hub's efforts to contribute to the wider UAE Strategy for Artificial Intelligence by accelerating the adoption and integration of AI across the economy.

“Our expert teams have demonstrated yet again that this region can achieve breakthrough R&D outcomes to impact the world,” said Dr Ray Johnson, chief executive of TII.

Updated: April 15, 2022, 3:49 AM