ABU DHABI, 1st September, 2023 (WAM) – A local newspaper has said that, despite the Arabic language's global significance and its standing as one of the most widely spoken languages offline, its online presence is surprisingly minimal. The rise of advanced AI technology, particularly large language models, highlights the need to address linguistic diversity in the digital age.
"With the advent of new AI technology that can generate text, speech, images and other media, often to a level that matches or possibly surpasses human ability, the online presence of languages has suddenly become more relevant than ever," The National said in an editorial on Friday.
"Generative AI" is expected to transform the digital and real worlds alike, and its most common form is the large language model (LLM), which, as the name suggests, produces coherent content by training on vast amounts of data – usually drawn from the internet – in a given language. The more data available for training, the better the model. It is easy to see, then, why English seems set to dominate the AI revolution, and why the race is on for those who want to safeguard a future for other languages to catch up.
This week, the position of the Arabic language got a boost with the roll-out of Jais, an open-source bilingual Arabic-English LLM developed in the UAE. Jais's developers – a team drawn from Abu Dhabi AI firm G42, Mohamed bin Zayed University of Artificial Intelligence and US tech firm Cerebras Systems – said their LLM is now the most accurate one available in Arabic.
The paper added, "Impressively, Jais can operate in multiple Arabic dialects – a skill that speakers of the language will know to be critical for widespread adoption and success."
Arabic is often referred to by linguists as a "macrolanguage", owing to the extreme variations across these dialects. Jais's developing ability to generate content across them, along with Modern Standard Arabic and English, could one day help to strengthen translation services, bolster the Arabic education sector and drive more digital adoption in the Arab world.
The greatest challenge for Jais, of course, is the limited online Arabic material on which to train. But Andrew Jackson, chief executive of the G42 unit involved in Jais, said overcoming this obstacle is a major focus of the team's work.
"We're spearheading an initiative to collect more Arabic data from offline sources," he told The National. "So this has already kicked off in earnest and this is the first method that we will employ to boost Arabic."
The Abu Dhabi-based daily concluded, "Developing an Arabic LLM to a level that bears all the promise of English-language counterparts like ChatGPT will be a monumental task. It is perhaps little wonder that Jais is named for the UAE's highest mountain. But if the summit of its potential can be reached, it could transform life in the Arab world and ensure that one of humanity's great ancient languages has a permanent place in its future."