A US Marine uses the Jibbigo voice-to-voice translation application to quickly and accurately translate specific dialects of Arabic. Courtesy Jibbigo

Inside the design of the US Army's Arabic translation app, Jibbigo



When researchers at Carnegie Mellon University were first approached by the United States government to develop automatic translation software for military personnel deployed in Iraq, they feared it was too delicate a task.

It was not just that they were being asked to devise a way of providing rapid, smooth voice-to-voice translation in a combat environment, where an error or poor interpretation could be fatal for those using it.

Nor was it even that they would need to do this intensive task without access to grid electricity or the internet.

The trickiest part, they soon realised, was that their system would have to cope with an Arabic dialect so different from Modern Standard Arabic (MSA) that even the popular greeting "shlonak", literally "what is your colour", could be misinterpreted.

They set out at first to build the application from an existing pool of Modern Standard Arabic terms.

The program would use two databases: one of recorded dialogue and its text equivalent, and another that matched those texts with their translations. The developers added a sprinkling of Iraqi vocabulary and pronunciations, but the model had far too many blemishes.

"The speech recognition was of no help for an Iraqi speaker," said Alex Waibel, who led the project.

"In those situations, miscommunications or misunderstandings can potentially be lethal when all parties are well-meaning. How does a US soldier know when someone is looking for his little child or has ill intentions?

"At checkpoints that is always a dangerous situation, and problems arise because people simply do not understand one another."

The problem of making machines that can translate Arabic has long been recognised. The language is spoken so widely, by more than 200 million people from Morocco to Oman, that extremely distinct dialects have developed. Arabian Gulf Arabs often have difficulty understanding Iraqis, Iraqis have trouble communicating with Egyptians and almost nobody understands Moroccans.

The variations in word choice, sentence structure and phonology – how sounds are spoken – are big enough that many linguists consider Arabic a cluster of separate languages, as different from one another as German is from Dutch or Spanish from French.

That makes voice recognition machines based on MSA, which is taught in schools and used by government, academia and the media, all but useless.

So Jibbigo, the app developed by Dr Waibel's team at Carnegie Mellon, has been fed a database of Iraqi Arabic collected and translated by soldiers on the ground over the course of the war, giving it a 40,000-word vocabulary for voice-to-voice translation to English and voice-to-text translation to more than a dozen other languages.

Though Iraqi is its only Arabic dialect so far, the team is beginning to collect samples of Algerian Maghribi Arabic.

Building that kind of database from scratch is arduous, requiring hundreds of thousands of hours of speech to be collected and processed. Google alone deals with more than 1,000 hours of spoken Arabic, about half of it from the Gulf, each day.

Instead, researchers are increasingly looking to crowdsourcing, online and on the ground. Jibbigo pays users to correct their mother-tongue translations, and hires people locally to collect audio.

Google, meanwhile, relies heavily on comparisons of audio tracks of everything from news reports to user-created content on YouTube. It also compares sentences spoken into its software by participating Android mobile users, which has helped to reduce errors by more than 20 per cent.

And there is a huge new resource for this crowdsourcing, thanks to the stunning rise of the internet since the start of the Arab Spring. Blogs and social media now represent a rich pool of written and spoken dialects.

Twitter has put to work thousands of volunteers translating and localising tweets in real time. Microsoft, meanwhile, has developed translation tools to work in tandem with Wikipedia and the search engine Bing, in collaboration with users.

"Localisation and customising for individual countries is key," says Hussein Salama, the director of a Microsoft research centre in Cairo. "We need to look at how to provide Arabic speakers with better tools to communicate, because right now they have a lot to say but are underserved."

The biggest challenge in correlating speech with written text for translation is diacritics, the dots and dashes that represent short vowels – or the lack thereof – in formal Arabic text.

But while they represent one of the biggest differences between the region's dialects, they are often missing from written Arabic.
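
To make the diacritics problem concrete, here is a minimal Python sketch. It is not taken from Jibbigo or Google; the function name is hypothetical and the only assumption is that the short-vowel marks occupy the Unicode range U+064B to U+0652. It shows how two differently pronounced words, "kataba" (he wrote) and "kutub" (books), collapse into the same string once the marks are dropped, which is how they usually appear in everyday writing.

```python
# Illustrative sketch only: the names below are hypothetical, not from any system in the article.

# Arabic short-vowel and related marks (tanwin, fatha, damma, kasra, shadda, sukun).
HARAKAT = {chr(code) for code in range(0x064B, 0x0653)}


def strip_diacritics(text: str) -> str:
    """Return the text with Arabic diacritic marks removed."""
    return "".join(ch for ch in text if ch not in HARAKAT)


# "kataba" (he wrote): kaf + fatha, ta + fatha, ba + fatha
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"
# "kutub" (books): kaf + damma, ta + damma, ba
kutub = "\u0643\u064F\u062A\u064F\u0628"

# Both words reduce to the same three bare consonants.
bare_kataba = strip_diacritics(kataba)
bare_kutub = strip_diacritics(kutub)
same_spelling = bare_kataba == bare_kutub
```

A recogniser that sees only the bare consonants has to recover the vowels from context, which is the gap the researchers describe.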

Meanwhile, dissecting online text is layered with other complications, as words are often muddied with numbers and symbols, or mix Arabic and Latin text, to account for sounds that do not transliterate easily.

That leaves researchers looking for a formula that takes all of those factors into account, building modules that detect patterns of words and intonation to discern context and, in turn, diacritics and dialect. A module could, for instance, recognise a higher frequency of accented key words in Egyptian Arabic compared with that of other countries.
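
The keyword-frequency idea can be sketched in a few lines of Python. This is a toy illustration, not the method used by either team: the marker lists are assumptions chosen for the example ("shlonak" is the Iraqi greeting mentioned earlier, while "izzayak" and "kifak" stand in for Egyptian and Levantine greetings), and a real module would weigh thousands of words along with pronunciation and intonation.

```python
from collections import Counter
from typing import Optional

# Hypothetical marker words per dialect, in Latin transliteration (toy data, assumed).
MARKERS = {
    "iraqi": {"shlonak"},
    "egyptian": {"izzayak"},
    "levantine": {"kifak"},
}


def guess_dialect(text: str) -> Optional[str]:
    """Pick the dialect whose marker words occur most often, or None if none occur."""
    counts = Counter(text.lower().split())
    scores = {
        dialect: sum(counts[word] for word in words)
        for dialect, words in MARKERS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


iraqi_guess = guess_dialect("shlonak habibi shlonak")
no_guess = guess_dialect("hello world")
```

Returning no guess when no marker appears is a deliberate choice in this sketch: a wrong dialect label would send the speech to the wrong recogniser.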

At Google, speech recognition experts have broken Arabic down into four major dialects: Egyptian; other North African; Arabian Gulf; and Levantine, the dialect spoken by Syrians, Lebanese and Jordanians. Iraqi is sometimes grouped with the Gulf dialect, although those involved in the project agree it is unique enough to warrant its own category.

And they have reached the same conclusion as their counterparts at Carnegie Mellon: the various dialects are so acoustically different that a single voice recognition system cannot work for them all. Each needs its own recognition and translation system.

"Because the dialectical forms are in some cases very extreme, you can have a person from Morocco not understanding someone from the Gulf," said Pedro Moreno, a New York-based senior researcher leading the speech engineering group for Google's Android division.

"Rough sounds such as the 'ga' in Egyptian do not exist in Lebanese, while the 'ja' sound does not exist in Cairo dialect but does in nearly every other dialect.

"When mapping the phonetic structure of a word and the acoustic distribution of each phoneme, it is just too diverse."

Recognising those differences in phonetic habits can also help in identifying a speaker's regional dialect, even when they are speaking Modern Standard Arabic, classical Arabic, or even English.

"In Egyptian Arabic, almost every word is accented, so when they speak English, you can tell they are Egyptian," said Fadi Biadsy, another senior researcher on Google's team in New York.

It is an important tool to have. Google's module for Gulf Arabic contains at least 60,000 regularly used English words.

"Arabic speakers tend to change between dialects and languages very freely, and we are forced to model that. In North Africa, there is a lot of code-switching, which has caused some delay there."

There is a long road ahead in perfecting databases of dialectal speech, researchers agree. And for some, time may be running out.

"In remote provinces where there is not a local paper or published form of the language they speak, or if the local people do not know how to read and write, we literally have to establish a written language for them to build a dialect system," said Dr Waibel.

"That's obviously a laborious effort, but without it, their dialect could die out. And without it, people in the Arabic world cannot communicate with one another, and that really is a shame."

newsdesk@thenational.ae
