Science gives trust in online ratings the thumbs down

Problem of bogus online reviews highlights flaw in all subjective scoring systems, writes Robert Matthews


We all do it, whether we're planning a holiday or just choosing a movie to watch – we check the reviews.

From TripAdvisor to YouTube and Amazon, ratings are everywhere. Yet whether it's oddly ecstatic praise for ho-hum products or catastrophically bad ratings for renowned hotels, it's long been clear that many online reviews aren't what they seem.

Now the online film review site Rotten Tomatoes has become the latest to take action to combat ratings abuse.

This year the site killed off its "want to see" scores, where movie fans could show how excited they were about an upcoming release. Now it has gone further. If you want your opinion of a film taken seriously, you will first have to prove you actually saw it.

These moves come after evidence revealing that some movies are deliberately targeted – "review bombed" – by trolls.

In the run-up to the world premiere of Captain Marvel in March, the movie generated such a negative response that its "want to see" rating plunged to a dismal 28 per cent.

The suspicion was that the movie provoked a backlash among internet trolls who took exception to the idea of a female superhero.

Similar campaigns are believed to lie behind the tsunamis of negative reviews of The Last Jedi, Ghostbusters: Answer the Call and the TV debut of the first female Doctor in Doctor Who.

In the end, Captain Marvel was praised by critics and the public alike on its release, netting more than $1 billion (Dh3.67bn) to date and becoming the second-highest-grossing movie of the year so far.

There was a suspicion before the premiere in March of Captain Marvel that the film had been the subject of a campaign to manipulate reviews. Marvel Studios

Although welcomed by many, the action taken by Rotten Tomatoes is not without problems. For a start, the verification system means only US moviegoers can influence the audience score, at least for the time being. The rest of us can still post reviews, but they will not count for anything.

By opting for verification, Rotten Tomatoes is using the same approach to fake reviews as that used by Amazon. But as the global online shopping service has found, it's not perfect.

In April the UK-based consumer group Which? found that reviews of hundreds of gadgets such as dashcams and smart watches showed signs of being faked – for example, with hundreds of five-star ratings appearing on the same day.

Most of the suspicious reviews were from unverified purchasers, but it’s known that some sellers instantly contact anyone posting a bad review, offering inducements to change or delete it. It is also becoming common for sellers to solicit reviews in return for freebies.
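
Checks of the kind Which? ran are simple in principle. Below is a toy sketch in Python of the same-day clustering test; the threshold and all the names are my own, purely for illustration, not Which?'s or Amazon's actual method.

```python
# Toy illustration of one red flag Which? described: a burst of
# five-star ratings all posted on the same day.
from collections import Counter
from datetime import date

def suspicious_days(reviews, threshold=50):
    """Return the days on which at least `threshold` five-star reviews landed.

    `reviews` is a list of (rating, posted_on) pairs.
    """
    five_star_days = Counter(posted for rating, posted in reviews if rating == 5)
    return [day for day, count in five_star_days.items() if count >= threshold]

# Example: 60 five-star reviews dated the same day trip the check.
reviews = [(5, date(2019, 4, 1))] * 60 + [(4, date(2019, 4, 2))] * 5
print(suspicious_days(reviews))  # [datetime.date(2019, 4, 1)]
```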

In any case, verified purchase reviews are getting harder to find. The Which? study cited evidence that such reviews made up 94 per cent of all monthly reviews on Amazon in the first quarter of last year, but that their share has since dropped to less than 70 per cent.

Amazon insists it works hard to protect the integrity of reviews, using both human and artificial intelligence-based techniques to weed out fakes. But the battle for consumer trust is triggering some ugly spats between websites and technology companies claiming to be able to spot dodgy reviews.

Recent allegations that up to a third of the reviews on TripAdvisor were fake led the travel review site to lambast Fakespot, the company whose algorithms were used as the basis of the claim.

According to Fakespot, its algorithms look for clues in the spelling, grammar, timing and quantity of reviews to assess reliability. TripAdvisor insists, however, that its own tests showed these methods are unreliable.

Fuelling the controversy, the respected tech review website CNET reported that Fakespot's verdicts often disagree with those from ReviewMeta, another review-checking website – which hardly boosts confidence in the reliability of either.

Despite all the confusion, claims and counterclaims, at least the online world is taking concern about ratings reliability seriously.

That cannot be said for many global companies that still use spurious methods to rate how they are performing, who should get bonuses – and who should get the sack.

For years they measured customer loyalty using the now-familiar question: on a scale of 0 to 10, how likely are you to recommend this product or service to a friend?

Customers giving scores of 0 to 6 are deemed "detractors", while those giving 9 or 10 count as "promoters"; scores of 7 and 8 are treated as passive and ignored. Subtracting the percentage of detractors from the percentage of promoters gives the so-called net promoter score, or NPS.
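
That arithmetic is easy to make concrete. Here is a minimal sketch in Python using the score bands described above; the function name and example data are mine, not Bain & Company's.

```python
# Minimal illustrative sketch of the net promoter score calculation.
def net_promoter_score(ratings):
    """Return the NPS for a list of 0-10 ratings."""
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)   # scores of 9 or 10
    detractors = sum(1 for r in ratings if r <= 6)  # scores of 0 to 6
    # Scores of 7 and 8 ("passives") dilute the result but join neither group.
    return 100 * (promoters - detractors) / len(ratings)

# Example: 5 promoters, 3 passives and 2 detractors out of 10 gives an NPS of 30.
print(net_promoter_score([10, 9, 9, 10, 9, 7, 8, 7, 3, 6]))  # 30.0
```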

Launched in 2003 by Frederick Reichheld of global management consultancy Bain & Company, this scoring system was originally touted as a useful predictor of customer behaviour.

Yet as early as 2007, researchers were casting doubt on the validity and reliability of the method. Despite this, a recent study by The Wall Street Journal found that NPS has "cult-like status" among chief executives, with 50 S&P 500 companies citing it in earnings conference calls last year – nearly triple the number in 2012.

Worse, The Wall Street Journal reported that the NPS has now morphed into a metric used in assessing executive performance and compensation in some leading companies – a role Mr Reichheld himself described as "completely bogus".

It is hardly the first ratings method to be pushed too hard and too far by big business. During the 1980s, so-called “rank and yank” methods emerged that assumed employee performance followed a bell-shaped curve that could be divided into top, middle and bottom ranks.

About 70 per cent of employees were deemed to be middle rank. Managers then rewarded the top 20 per cent, and cautioned or sacked the bottom 10 per cent. Despite lacking any real justification, many corporations relied on "rank and yank" for years. Then, in about 2010, accusations of bias and "gaming" surfaced along with lawsuits – and the method suddenly fell out of favour.
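
For concreteness, here is a minimal sketch of that 20-70-10 split in Python; the cut-offs follow the figures above, while the names and tie-breaking are purely illustrative.

```python
# Minimal sketch of the "rank and yank" split described above: sort staff
# by score, then label the top 20 per cent, middle 70 and bottom 10.
def rank_and_yank(scores):
    """Map {name: score} to {name: 'top' | 'middle' | 'bottom'}."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    n = len(ordered)
    top_cut = round(0.2 * n)     # top 20 per cent: rewarded
    bottom_cut = round(0.1 * n)  # bottom 10 per cent: cautioned or sacked
    labels = {}
    for i, name in enumerate(ordered):
        if i < top_cut:
            labels[name] = "top"
        elif i >= n - bottom_cut:
            labels[name] = "bottom"
        else:
            labels[name] = "middle"
    return labels

scores = {"Ann": 92, "Bob": 85, "Cem": 78, "Dee": 70, "Eli": 66,
          "Fay": 64, "Gus": 60, "Hal": 55, "Ida": 49, "Jon": 31}
print(rank_and_yank(scores))  # Ann and Bob: "top"; Jon: "bottom"; the rest: "middle"
```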

In the end, all ratings systems are about one thing: trying to boil down complex decisions to just one number. But as anyone who has snoozed through a top-rated movie knows, just because something is possible doesn't mean it works.

Robert Matthews is Visiting Professor of Science at Aston University, Birmingham, UK