Arabic users trick algorithms to keep Palestinian content online
Language games used to bypass screening and restrictions on social media
Social media users posting about Palestine are fighting back against a wave of online restrictions by transforming the Arabic language to bypass algorithmic detection.
Almost all languages can be manipulated in some way, such as using slang to confuse an untrained ear, or a poorly trained algorithm.
A group of five tech-savvy friends launched a website this week, Tajawz, which automates this process by encoding Arabic text to confuse algorithms.
“It makes it readable by humans, but at the same time, makes it very hard for the algorithm to read or translate it,” said Harith, one of the founders of the website.
Since the site’s launch on May 17, it has had nearly one million visits.
Social media algorithms use artificial intelligence and machine learning to scan for certain words or phrases flagged by the platforms as impermissible. Tajawz encodes Arabic characters into new, unrecognisable forms, akin to turning Arabic text into written drawings, which breaks the process algorithms use to flag and remove content.
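To see why altered spelling can slip past word scanning, consider a deliberately simplified sketch. It assumes a filter that matches whole words against a blocklist exactly; real platform systems are far more complex and are not described in detail here. The names `BLOCKLIST` and `is_flagged` and the sample term are hypothetical, chosen only for illustration.

```python
# Toy illustration only: an exact-match word filter (an assumption,
# not how any real platform is known to work).

# Hypothetical flagged term: the Arabic word for "Palestine".
BLOCKLIST = {"\u0641\u0644\u0633\u0637\u064a\u0646"}


def is_flagged(post, blocklist=BLOCKLIST):
    """Return True if any whitespace-separated word is in the blocklist."""
    return any(word in blocklist for word in post.split())


# The same word written normally, then with dotless look-alike letters.
original = "\u0641\u0644\u0633\u0637\u064a\u0646"
altered = "\u06a1\u0644\u0633\u0637\u0649\u06ba"

print(is_flagged(original))  # exact match against the blocklist
print(is_flagged(altered))   # different code points, so no exact match
```

A human reader can still recognise the altered word, but to an exact-match filter it is a different string of characters.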
“We’re trying to prevent people from getting automatically reported or automatically blocked from using the platform,” Harith told The National.
“Social networks enhanced our ability to express ourselves and share information freely. But recently, all of this started to fall apart with this new wave of integrating AI and machine learning, and what they call natural language processing,” he said.
“We're being fought by the algorithms, which were supposed to help us.”
Arabic social media users have played language games with social media platforms for years, but the technique gained traction in recent days when platforms were accused of large-scale takedowns of Palestinian content.
“Arabic is a perfect language, actually, to be a secret language,” Wafaa Heikal, a social media analyst, told The National.
It offers a variety of ways to manipulate the language to confuse algorithms.
Innovative users will write in dotless Arabic or play with the position of the dots; mix Arabic and English letters; add one word to the end of each word; remove a single letter; or change the order of letters in a word.
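The first of these techniques, dotless Arabic, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used by Tajawz or any other tool: the mapping below is one plausible choice of dotless look-alike characters, and the names `DOTLESS` and `to_dotless` are hypothetical.

```python
# Minimal sketch: replace dotted Arabic letters with dotless look-alikes.
# The mapping is an illustrative assumption, not a standard.
DOTLESS = {
    "\u0628": "\u066e",  # beh   -> dotless beh
    "\u062a": "\u066e",  # teh   -> dotless beh
    "\u062b": "\u066e",  # theh  -> dotless beh
    "\u062c": "\u062d",  # jeem  -> hah
    "\u062e": "\u062d",  # khah  -> hah
    "\u0630": "\u062f",  # thal  -> dal
    "\u0632": "\u0631",  # zain  -> reh
    "\u0634": "\u0633",  # sheen -> seen
    "\u0636": "\u0635",  # dad   -> sad
    "\u0638": "\u0637",  # zah   -> tah
    "\u063a": "\u0639",  # ghain -> ain
    "\u0641": "\u06a1",  # feh   -> dotless feh
    "\u0642": "\u066f",  # qaf   -> dotless qaf
    "\u0646": "\u06ba",  # noon  -> noon ghunna (dotless)
    "\u064a": "\u0649",  # yeh   -> alef maksura (dotless yeh shape)
    "\u0629": "\u0647",  # teh marbuta -> heh
}

_TABLE = str.maketrans(DOTLESS)


def to_dotless(text):
    """Return text with each dotted Arabic letter swapped for its dotless form."""
    return text.translate(_TABLE)


# The Arabic word for "Palestine", before and after.
word = "\u0641\u0644\u0633\u0637\u064a\u0646"
print(word, "->", to_dotless(word))
```

Characters outside the mapping, including Latin letters and punctuation, pass through unchanged.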
“Algorithms don’t have imagination. Human beings have imagination,” Ms Heikal said.
For algorithms these words “will be cryptic, they are not going to understand what we are saying, but we are going to understand each other”, she said.
But this cryptic Arabic is more difficult to read and write, said Mona Elswah, a researcher at the Oxford Internet Institute.
“It's not a sustainable language to be used. It's a language of revolt against platform algorithms. It's a language of rebellion.”
She said it is a technique to show platforms that users can fight back, but it is not a long-term solution.
For that, platforms need to address the multitude of errors they have acknowledged in recent weeks when moderating content about Palestine.
Since the escalation of violence between Israel and Palestine, digital researchers at the Arab Centre for the Advancement of Social Media, known as 7amleh, have tracked more than 500 instances of digital rights offences.
They found content and accounts were removed, reduced and restricted across most major platforms, with 50 per cent of the incidents happening on Instagram, 35 per cent on Facebook, 11 per cent on Twitter and 1 per cent on TikTok.
Despite platforms admitting that errors were made, mass removals continue to occur, signalling a deeper and more systemic issue when it comes to Arabic content.
Social media platforms in the Middle East are heavily criticised for how they determine which words are permissible and which are flagged for removal.
Facebook confirmed to The National that it had restricted hashtags for Al Aqsa Mosque just as Israeli forces were storming Islam’s third holiest site. Internal documents obtained by BuzzFeed News later showed that hashtags about the Jerusalem mosque were blocked because an extremist entity shares the name Al Aqsa.
“The takedowns are on a scale we have never seen before, even in other countries like Syria; we have never seen such a scale, it's now so fast and so wide,” Ms Elswah told The National.
“The excuse has always been that they don't have the capacity for Arabic. But this doesn't make sense. Arabic is the fourth most common language on the internet,” she said.
Some of the earliest instances of these algorithm tricks being used can be traced back to the 2011 Arab uprisings.
During this period, social media became a vital and widespread tool for activists to communicate as they planned to overthrow regimes across the region.
A decade ago, the platforms offered more freedom because algorithms were not as advanced as they are today.
“One of the main gains of the uprising was the freedom of speech within corporate social media to express our ideas, because we were not able to do it in the real world,” Ms Heikal said.
“But this narrative they created for us, they are now taking it back by saying 'we don't want to show your content'.”
“They want us to die in the dark.”
Updated: May 21, 2021 08:43 PM