Speech recognition systems leave out a large demographic of English speakers because they can only recognise accents they’ve been trained to understand. Getty

Accents and AI: how speech recognition software could lead to new forms of discrimination


Anyone who has used a voice assistant such as Apple's Siri or Amazon's Alexa will have occasionally struggled to make themselves understood. Perhaps the device plays the wrong music, or puts unusual items on a shopping list, or emits a plaintive “didn't quite catch that”. But for people who speak with an accent, these devices can be unusable.

The inability of speech recognition systems to understand accents found in Scotland, Turkey, the southern states of the US or any number of other places is widely documented on social media, and yet the problem persists. With uses of the technology now spreading beyond the domestic, researchers and academics are warning that biased systems could lead to new forms of discrimination, purely because of someone’s accent.

“It's one of the questions that you don't see big tech responding to,” says Halcyon Lawrence, a professor of technical communication at Towson University in Maryland, who is from Trinidad and Tobago. “There's never a statement put out. There's never a plan that's articulated. And that's because it's not a problem for big tech. But it’s a problem for me, and large groups of people like me.”

Speech recognition systems can only recognise accents they’ve been trained to understand. To learn how to interpret the accent of someone from Trinidad, Eswatini or the UAE, a system needs voice data, along with an accurate transcription of that data, which inevitably has to be done by a human being. It’s a painstaking and expensive process to demonstrate to a machine what a particular word sounds like when it’s spoken by a particular community, and perhaps inevitably, existing data is heavily skewed towards English as typically spoken by white, highly educated Americans.

A study called Racial Disparities in Automated Speech Recognition, published last year by researchers at Stanford University, illustrates the stark nature of the problem. It analysed systems developed by Amazon, Apple, Google, IBM and Microsoft, and found that in every case the error rates for black speakers were nearly double those for white speakers. In addition, it found that the errors were not caused by grammar, but by “phonological, phonetic, or prosodic characteristics”; in other words, accent.

Allison Koenecke, who led the study, believes that a two-fold improvement in the system is needed. “It needs resources to ethically collect data and ensure that the people working on these products are also diverse,” she says. “While tech companies may have the funds, they may not have known that they needed to prioritise this issue before external researchers shone a light on it.”

Lawrence, however, believes that the failings are no accident.

“What, for me, shows big tech's intention is when they decide to release a new accent to the market and where that is targeted,” she says. “If you plot it on a map, you can’t help but notice that the Global South is not a consideration, despite the numbers of English speakers there. So you begin to see that this is an economic decision.”

It’s not only accented English that scuppers speech recognition systems. Arabic poses a particular challenge, not simply because of its many sub-dialects, but because of inherent difficulties such as the lack of capital letters, the recognition of proper nouns and the need to predict a word’s vowels from context. Substantial resources are being thrown at this problem, but the current situation is the same as with English: large communities are technologically disenfranchised.

Why is this of particular concern? Beyond the world of smart speakers lies a much bigger picture. “There are many higher-stakes applications with much worse consequences if the underlying technologies are biased,” says Koenecke. “One example is court transcriptions, where court reporters are starting to use automatic speech recognition technologies. If they aren't accurate at transcribing cases, you have obvious repercussions.”

Lawrence is particularly concerned about the way people drop their accent in order to be understood, rather than the technology working harder to understand them. “Accent bias is already practised in our community,” she says. “There's an expectation that we adapt our accent, and that's what gets replicated in the device. It would not be an acceptable demand on somebody to change the colour of their skin, so why is it acceptable to demand we change our accents?”

Money, as ever, lies at the root of the problem. Lawrence believes strongly that the market can offer no solution, and that big tech has to be urged to look beyond its profit margin. “It’s one of the reasons why I believe that we’re going to see more and more smaller independent developers do this kind of work,” she says.

One of those developers, a British company called Speechmatics, is at the forefront, using what it calls “self-supervised learning” to introduce its speech recognition systems to a new world of voices.

“We're training on over a million hours of unlabelled audio, and constructing systems that can learn interesting things, autonomously run,” says Will Williams, vice president of machine learning at Speechmatics.

The crucial point: this is voice data that hasn’t been transcribed. “If you have the right kind of diversity of data, it will learn to generalise across voices, latch on quickly and understand what's going on.” Using datasets from the Stanford study, Speechmatics has already reported a 45 per cent reduction in errors when using its system.

An organisation called MLCommons, which counts Google and Microsoft among its more than 50 founding members, is now looking for new ways to create speech recognition systems that are accent-agnostic.

It’s a long road ahead, but Koenecke is optimistic. “Hopefully, as different speech-to-text companies decide to invest in more diverse data and more diverse teams of employees such as engineers and product managers, we will see something that reflects more closely what we see in real life.”

Updated: November 07, 2021, 2:54 PM