Are big tech companies actually listening in on us via our phones or smart speakers?

With big tech companies under fire for privacy concerns over voice recordings, should we be worried?

Google Home Mini assistant, exhibited during the Mobile World Congress on February 27, 2019, in Barcelona, Spain. Photo by Joan Cros / NurPhoto via Getty Images

Many people still believe the conspiracy theory that big technology companies are using our phones and computers to constantly listen to us, despite security researchers having debunked the largely anecdotal evidence. Sometimes, however, devices are listening.

Whenever our voices are transmitted over the internet, via messages, voice calls or commands to virtual assistants, there is a possibility that recordings will be stored for the purposes of "improving the service".

Amazon, Google and Apple have all come under fire of late for being less than candid about the way those service improvements are made. Typically, it will involve a freelance contractor reviewing a voice recording and annotating it. The person whose voice it is would never know and the contractor isn't told who it is either, but it feels invasive nonetheless. Now Microsoft has joined the list, as questions are asked not only about the security of voice messages sent to its Cortana virtual assistant, but also voice calls that are made via its Skype service.

Here's what is actually happening

There's a common assumption that technological magic such as instant translation and clever virtual assistants happens within black boxes where no human could intervene. But while computers are good at processing large quantities of voice data, humans are still needed to ensure that computers understand us correctly.

Google belatedly attempted to explain this in a blog post last month. “As part of our work to develop speech technology, we partner with language experts … who understand the nuances and accents of a specific language,” it read. “These experts review and transcribe a small set of queries to help us better understand those languages. This is a critical part of the process of building speech technology.”

The people employed to review voice recordings might work on as many as 1,000 clips in one shift, but over the course of that shift they may find themselves listening to material that makes them feel uncomfortable, such as arguments, romantic liaisons, children crying or bad language. Although companies are careful not to give contractors information about the source of the recordings, they might still hear full names, addresses or other identifiable material. Some contractors have felt so uncomfortable with their officially sanctioned eavesdropping that they've gone public with two major concerns: firstly, that material is being recorded that shouldn't be and, secondly, that people aren't aware of what is happening.

While people might be aware on some level that information is being sent to Amazon or Google when they're interacting with a smart speaker, it's not at the top of their minds. There aren't many cues to help them understand that information is stored – or even that there's a need for information to be stored.

The majority of the commands we issue to smart speakers are innocuous, perhaps asking to play a particular radio station or set an alarm. But whistleblowers have expressed concern that devices are waking up randomly and processing accidental recordings that should never have been made.

These accidental recordings are known as "false accepts", and data collected by one Google whistleblower shows that about 15 per cent of recordings occur by accident. This error rate indicates that smart speakers – essentially microphones in our homes – may not be as benign as we've been led to believe. "These technologies live in your home, they become part of your daily fabric," says Schaub. "When you phone a call centre, you typically get a message explaining that the call may be recorded for training purposes, but that explicit acknowledgment is missing from smart speakers."

So who is actually listening to your recordings? 

Why have the big tech companies failed to make it clear that recordings might be analysed by their employees? Their terms of service may mention how voice recordings will be used to help improve speech recognition, but there's been no mention of human involvement until recently. "It requires a technical understanding [of service agreements] to realise that people will be annotating recordings," says Schaub. "But in the past week Amazon have changed theirs. They now say that a small set of recordings might be annotated, but that they protect your privacy in doing so. But how? What are they actually doing to anonymise these recordings?"

Some people may dismiss this outcry as excessive and disproportionate. After all, the "narcissistic fallacy" dictates that we believe people to be more interested in us than they actually are, and anonymised voice clips being used as a data set for training could be considered less invasive than, say, a telephone banking assistant looking at your account while helping you with a query.

But the sheer quantity of data generated by modern devices such as smart speakers can lead to cross-referencing that can compromise privacy. Some Amazon employees have been reported as having access to latitude and longitude co-ordinates for voice recordings that could be easily linked to a specific home. And if employees do hear sensitive data, the onus is on them not to act on it. Their contracts require them not to, but consumers are being asked to have trust in a system that has demonstrated itself to be somewhat opaque.

The companies that are being criticised have issued apologies, stressing how they take privacy and security seriously. Apple and Google have both "paused" the manual review of voice recordings, while Amazon has given customers the opportunity to opt out, if they can find the setting buried deep in an app menu.

"What it needs is better user experience design," says Schaub. "They should ask users how they think the technology works to uncover mismatches between people's ideas and what's actually happening in the background. It wouldn't be too hard for a smart speaker to prompt the user, saying 'have you noticed how much better I understand you recently? Do you want to learn how we improve our speech recognition and protect your privacy?' That would go a long way to reduce the outrage that people feel and help companies to build some trust."