Synthetic voices, such as Apple’s Siri and Amazon’s Alexa, have become increasingly sophisticated in the last couple of years. Bloomberg

Could human-like synthetic voices do more harm than good?



It’s not easy to make a synthetic voice sound human. We’ve become very used to the dulcet tones of voice assistants such as Apple’s Siri and Amazon’s Alexa, but whether it’s because of their mispronunciations, awkward pauses or relentless cheeriness, we instinctively know the difference between these computerised approximations of a human voice and a real one.

But what if we couldn’t tell the difference? Synthetic voices have, in the last couple of years, become increasingly sophisticated. Deep learning techniques have given algorithms a much better handle on the way human beings speak, and they can now instruct synthetic voices to express themselves with ever greater nuance.

Earlier this month, Reuters reported on how a start-up in Los Angeles had constructed a synthetic voice avatar from the voice of a local DJ, Andy Chanley. That “robot DJ” version of Andy can now deliver written lines in a way that’s hard to distinguish from the real thing. Chanley himself, having spent three decades in broadcasting, is delighted that his voice will live on, and it’s clear that across the fields of entertainment, broadcasting and marketing, synthetic voices will become commonplace.

What perhaps couldn’t have been predicted, however, is our attachment to human voices, and how computerised voices can disconcert us when they masquerade as the real thing.


“Voice is so personal, so human,” says Jon Stine of the Open Voice Network, an organisation developing ethical standards for voice technology. “It’s biometric. It identifies us uniquely.

“It can be used to infer our age, our health, educational level, ethnicity. When friends say hello, we know who they are! It’s a very precious element in our life, and we must treat it with the respect it deserves.”

Earlier this year, that respect was deemed to have been sullied by filmmaker Morgan Neville, when he admitted to using AI to reconstruct the voice of the chef Anthony Bourdain for use in a documentary about his life and death. This fact wasn’t disclosed in the film, and those watching would never have known had it not been admitted later. But when it was, it provoked a fierce debate about ethics.

“In my opinion, people reacted to it in an adverse fashion because it almost feels offensive to that individual’s lack of ability to control their persona,” says technology ethicist David Polgar. “Is this something he would have wanted? So suddenly we feel vulnerable to being manipulated, but we also feel vulnerable because it means we can’t trust our own judgement [when hearing these voices]. We need to be able to trust our ears.”

In 2019, Dave Limp, senior vice president for Amazon devices and services, introduced new celebrity voices for the tech company’s Alexa devices, including that of actor Samuel L Jackson. AP

Nevertheless, the global text-to-speech market, worth around $2 billion last year, is projected to grow threefold by 2028. That growth is largely driven by consumer demand for content, and by the difficulty of meeting that demand within the limits of traditional ways of working.

Voices synthesised from celebrities could be used globally, in any context, without the celebrity having to personally record those messages. Dubbing could easily be fixed in movies. Actors and voiceover artists could have their voices localised with different accents, even different languages. The worlds of advertising, education, virtual reality and even health could see significant benefits.

And yet the technology has a way to go. There are still inherent difficulties in creating a synthetic voice that doesn’t prompt the “uncanny valley” effect, where the listener has the sense that something isn’t quite right. That’s because the written word and the spoken word are very different things.

“When you use text-to-speech, AI needs to guess how to say it,” says Alex Serdiuk, chief executive of Respeecher, a synthetic voice company in Ukraine. “And this AI is extremely limited in what emotions it can guess. Also, speech doesn’t just consist of words. Whispering, or singing, sighing or screaming – these things cannot be converted using text-to-speech in any way, and they’re very important parts of our speech.”

Respeecher’s elegant solution to the problem is what it calls “speech-to-speech” technology, in which one person’s voice, complete with all its nuances, is transformed into that of another. The technology was recently used in The Mandalorian, a Star Wars spin-off series, to provide a voice for the young Luke Skywalker. No one knew until they were told later.
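To make that distinction concrete, the rough sketch below contrasts the two approaches in code. It is purely illustrative, with hypothetical class names that do not correspond to Respeecher’s system or any real product: text-to-speech receives only words and must guess the delivery, while speech-to-speech receives a recorded performance and preserves its delivery, swapping only the voice.

```python
# Hypothetical illustration only: these classes are not a real product's API.
from dataclasses import dataclass


@dataclass
class Audio:
    """Placeholder for a waveform plus its prosody (pitch, timing, emphasis)."""
    samples: list          # the raw waveform
    prosody: str           # e.g. "whispered", "screamed", "neutral"


class TextToSpeech:
    """Text in, audio out: the system has to guess how the line should be performed."""
    def synthesize(self, text: str, target_voice: str) -> Audio:
        guessed = "neutral"   # no performance to copy, so the delivery is inferred
        return Audio(samples=[0.0], prosody=guessed)


class SpeechToSpeech:
    """Audio in, audio out: a real performance is re-voiced and its delivery survives."""
    def convert(self, source: Audio, target_voice: str) -> Audio:
        # Sighs, pauses and emphasis carry over; only the timbre changes.
        return Audio(samples=source.samples, prosody=source.prosody)


if __name__ == "__main__":
    print(TextToSpeech().synthesize("I have spoken.", "target_actor").prosody)   # neutral (guessed)
    performance = Audio(samples=[0.1, 0.2], prosody="whispered")
    print(SpeechToSpeech().convert(performance, "target_actor").prosody)         # whispered (preserved)
```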

As with any form of AI, these innovations can be put to nefarious uses. “It often takes the general public a longer while to fully recognise a problem when it’s already been incorporated into society – and that’s a problem,” says Polgar.

Artificial voices have already been used to perpetrate telephone scams, where people are persuaded to part with money in the belief that they’re speaking to someone they trust. As the quality of these voices improves, our susceptibility to these scams will increase.

Voice actors and radio broadcasters have become concerned for their livelihoods, and as Polgar notes, such voices can make the public feel vulnerable. Organisations such as Open Voice Network are busy constructing ethical frameworks for the technology, but what of those who simply don’t adhere to them?

“In most countries, it’s just not legal to use someone’s intellectual property – ie their likeness, their voice – to produce something without their consent,” says Serdiuk. “But the first and most important goal is to educate societies that these technologies exist, that they will fall into the wrong hands, and will be misused. So we should start treating this information differently.”


Updated: December 15, 2021, 5:11 AM