Is the world ready for the era of 'visual search'?

Tech companies are refining their tools for facial recognition and visual search, but what about the ethical concerns that come with them?

Matt Vokoun, Director of Product Management at Google, Inc., introduces Google Lens at a product launch event on October 4, 2017 at the SFJAZZ Center in San Francisco, California. (Photo by Elijah Nouvelage / AFP)

The establishment of "to Google" as an everyday verb shows how deeply search engines have become entwined in our lives. Sometimes, however, we can't find the words we need to search for something. We might see a plant, tree or flower and wonder what it's called, but what do we type in? A mysterious building or landmark might offer no clue to its identity, and leave us stumped for a search term. If we see a piece of furniture or clothing and want to buy something similar, the chances of finding the right item by typing "grey sofa" or "red shoes" are slim. All these scenarios, however, are perfectly suited to the fast-growing world of visual search. We don't ask the questions; cameras do, by presenting an image. The search engine then returns information, similar images or price details – and such engines are getting better at doing so by the day.

In recent months, Google has slowly promoted its visual search tool, Google Lens, from a hidden-away feature in its Photos app to something a lot more visible. This week it popped up in the Google search app on iPhones, with a camera icon prominently displayed next to the text input field. Pressing that icon activates Lens immediately, and on my desk it identified a brand of pen, a model of computer keyboard and a pachyphytum sitting in a plant pot with no trouble at all. While such technology isn’t new, there’s a growing realisation that it’s working much better than it used to.

It’s all about perception

“A few years ago there were a lot of companies doing visual search which were, frankly, pretty poor,” says Iain McCready, chief executive of visual search company Cortexica. “Some of them weren’t matching anything apart from the dominant colour in a scene, and it only worked in controlled conditions such as in a laboratory. That gave visual search a bad name.”

In the past few years, however, machine-learning and “big data” analysis have made algorithms much more adept at identifying objects. But it’s not easy to train a computer to “see” like a human being, says Paul Melcher of Melcher System, a New York consultancy for visual technology firms. “If I show a machine a picture of a cat’s reflection in a mirror, it will tell you that it’s a cat. It won’t recognise that it’s a cat in a mirror, because it doesn’t understand the context,” he says. But the science of computer vision is improving fast, too. “It’s always a good idea to try and mimic nature because nature’s had a couple of million years of development,” says McCready. “It’s all about perception – colour, shape, motif – and by mimicking that you get a system that’s good in real life situations, such as poor lighting or unusual angles.”

As visual search improved, the question remained of what it might be used for. “A few years ago, nobody could see how it could solve real life problems,” McCready says. “But when we started working with the fashion industry, they got it straight away. We showed our system to the founder of Net-A-Porter, and she said ‘I’ve been looking for this my whole life!’” Fashion became the driving force behind the adoption of visual search when brands such as Asos realised that when customers have difficulty searching for a product, they stop shopping. Farfetch became the latest to join the club, with the recent launch of its “See It, Snap It, Shop It” feature to match people’s camera snaps with products in its catalogue.

Ethical concerns for the future

Clothing is just the tip of the iceberg, however. Pinterest, eBay and Facebook all use image recognition tools to drive search and, ultimately, sales. Indeed, Pinterest’s technology is now employed by device manufacturers (such as Samsung) and retailers (such as the United States department store Target). Back in September, Snapchat announced a partnership with Amazon, where a snap of an object links to an Amazon product page. This rush to jump on the visual search bus is predicated, unsurprisingly, on its effect on spending. One study found that people using visual search are 75 per cent more likely to make a return visit to a website, and spend 9 per cent more than those who don’t. Analysts reckon the industry will be worth $86 billion (Dh315.83bn) globally by 2025.

The technology still has some way to go, though. “If you take a picture of a cashmere sweater,” says Melcher, “a visual search will return sweaters that look the same, but are made of other materials. It’s obvious to us, but not to the computer.” While text-based search engines have become very good at knowing why we’re searching for something, that’s not something an image can convey; it doesn’t reveal if we like the colour, the shape or the functionality of an item. But new use cases are being found all the time.

Hikers and tourists can use a camera to discover information about their surroundings; Wired journalist Lauren Goode described how it made her feel "more deeply involved in the real world". Google Lens's ability to extract text from an image and translate it is breaking down language barriers. Searching for fonts to match your favourite typeface is a breeze. "Our system has been used to monitor production line machines to make sure they're working; to check that workers are wearing hard hats on a construction site; to see whether someone has left their bag in a shopping centre," says McCready.

The one aspect of visual search that causes disquiet, however, is facial recognition. When it first launched, Google Lens would respond to a picture of a face with the message “Lens doesn’t recognise people”. However, it now returns matches and information for pictures of celebrities, which raises the question (given the millions of photographs in Google’s index) of who it deems famous and who it doesn’t. It’s only Google’s adopted ethical principles (“be socially beneficial” and “be accountable to people”) that stop it returning name matches to pictures of any face.

But even setting facial recognition aside, there are other ethical concerns, Melcher says. “Where you eat, who comes to your house, what food is in your fridge, what clothes you’re wearing; this is all valuable information, which can, in turn, be used to sell you stuff and even create products for you based on your habits,” he says.

“The old saying that an image is worth a thousand words is truer today than it has ever been.”