Richard Thaler has spent his academic career acting on what he calls a “slow hunch”. As a young man it struck him that the economic models of his professors were situated in a world of Econs, entirely rational economic beings, but when he looked around him he saw no end of irrational Humans. He spent the next four decades working this vein, and in so doing helped create the field of behavioural economics. In this excerpt from his book Misbehaving: The Making of Behavioural Economics, we trace the course of his progress. We start early in his teaching career, when he administers a midterm exam to his students and gains an example of just how irrational people can be:
When the students got their results they were in an uproar. Their principal complaint was that the average score was only 72 points out of a possible 100.
What was odd about this reaction was that the average numerical score on the exam had absolutely no effect on the distribution of grades. The norm at the school was to use a grading curve in which the average grade was a B or B+, and only a tiny number of students received grades below a C.
I duly announced that the resulting distribution of grades would be no different from normal, but the announcement had no apparent effect on the students’ mood. They still hated my exam, and they were none too happy with me either. As a young professor worried about keeping my job, I was determined to do something about this, but I did not want to make my exams any easier. What to do?
Finally, an idea occurred to me. On the next exam, I made the total number of points available 137 instead of 100. This exam turned out to be slightly harder than the first, with students getting only 70% of the answers right, but the average numerical score was a cheery 96 points. The students were delighted! No one’s actual grade was affected by this change, but everyone was happy. From that point on, whenever I was teaching this course, I always gave exams a point total of 137, a number I chose for two reasons. First, it produced an average score well into the 90s, with some students even getting scores above 100, generating a reaction approaching ecstasy. Second, because dividing one’s score by 137 was not easy to do in one’s head, most students did not seem to bother to convert their scores into percentages. Lest you think I was somehow deceiving the students, in subsequent years I included this statement, printed in bold type, in my course syllabus: “Exams will have a total of 137 points rather than the usual 100. This scoring system has no effect on the grade you get in the course, but it seems to make you happier.” And indeed, after I made that change, I never got a complaint that my exams were too hard.
In the eyes of an economist, my students were “misbehaving.” By that I mean that their behaviour was inconsistent with the idealised model of behaviour that is at the heart of what we call economic theory. To an economist, no one should be happier about a score of 96 out of 137 (70%) than 72 out of 100, but my students were. And by realising this, I was able to set the kind of exam I wanted but still keep the students from grumbling.
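The arithmetic behind the trick is worth spelling out. A quick sketch in Python; the scores come from the text, but the comparison is ours:

```python
# The second exam's average of 96 out of 137 is actually a *lower*
# percentage than the first exam's 72 out of 100; yet the bigger
# raw number made the students happier.

first_exam = 72 / 100    # average on the 100-point exam: 72%
second_exam = 96 / 137   # average on the 137-point exam: ~70%

print(round(first_exam * 100, 1), round(second_exam * 100, 1))
assert second_exam < first_exam  # the "happier" exam was the harder one
```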
This confirmed ideas that had been bubbling in Thaler’s mind when he was a graduate student, and which he had tested with a puzzle he posed to students:
A. Suppose by attending this lecture you have exposed yourself to a rare fatal disease. If you contract the disease you will die a quick and painless death sometime next week. The chance you will get the disease is 1 in 1,000. We have a single dose of an antidote for this disease that we will sell to the highest bidder. If you take this antidote the risk of dying from the disease goes to zero. What is the most you would be willing to pay for this antidote?
B. Researchers at the university hospital are doing some research on that same rare disease. They need volunteers who would be willing to simply walk into a room for five minutes and expose themselves to the same 1 in 1,000 risk of getting the disease and dying a quick and painless death in the next week. No antidote will be available. What is the least amount of money you would demand to participate in this research study?
Economic theory has a strong prediction about how people should answer the two different versions of these questions. The answers should be nearly equal. For a fifty-year-old US resident answering the questions, who already faces a roughly 4-in-1,000 chance of dying each year, the trade-off between money and risk of death should be much the same whether the risk moves from 5 in 1,000 (.005) down to 4 in 1,000 (.004), as in the first version of the question, or from .004 up to .005, as in the second. Answers varied widely among respondents, but one clear pattern emerged: the answers to the two questions were not even close to being the same. Typical answers ran along these lines: I would not pay more than $2,000 in version A but would not accept less than $500,000 in version B. In fact, in version B many respondents claimed that they would not participate in the study at any price.
Economic theory is not alone in saying the answers should be identical. Logical consistency demands it.
This truth is not apparent to everyone. In fact, even when explained, many people resist, as you may be doing right now. But the logic is inescapable. To an economist, these findings were somewhere between puzzling and preposterous.
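The symmetry that theory demands amounts to a line of arithmetic. A minimal sketch, assuming small risk changes trade against money at a constant rate; the dollar figure is entirely hypothetical and not from the book:

```python
# If small changes in death risk trade against money at a roughly constant
# rate V, the price of a 1-in-1,000 reduction (version A) and the
# compensation for a 1-in-1,000 increase (version B) must come out equal.

V = 7_000_000  # hypothetical dollars per unit of death probability

def price_of_risk_change(risk_before, risk_after, v=V):
    """Money equivalent of moving between two small death risks."""
    return abs(risk_after - risk_before) * v

wtp = price_of_risk_change(0.005, 0.004)  # version A: pay to remove the risk
wta = price_of_risk_change(0.004, 0.005)  # version B: be paid to take it on

print(round(wtp), round(wta))  # identical, unlike the $2,000 vs $500,000 answers
assert wtp == wta
```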
As Thaler carves out an academic career, he begins to find some like-minded thinkers but any headway he makes is halting. He works on refining his ideas about how Humans think about money, and explores the idea of “mental accounting”:
Eventually I settled on a formulation that involves two kinds of utility: acquisition utility and transaction utility. Acquisition utility is based on standard economic theory and is equivalent to what economists call “consumer surplus.” As the name suggests, it is the surplus remaining after we measure the utility of the object gained and then subtract the opportunity cost of what has to be given up. For an Econ, acquisition utility is the end of the story. A purchase will produce an abundance of acquisition utility only if a consumer values something much more than the marketplace does. If you are very thirsty, then a one-dollar bottle of water is a utility windfall.
Humans, on the other hand, also weigh another aspect of the purchase: the perceived quality of the deal. That is what transaction utility captures. It is defined as the difference between the price actually paid for the object and the price one would normally expect to pay, the reference price. Suppose you are at a sporting event and you buy a sandwich identical to the one you usually have at lunch, but it costs triple the price. The sandwich is fine but the deal stinks. It produces negative transaction utility, a “rip-off.” In contrast, if the price is below the reference price, then transaction utility is positive, a “bargain”.
Because consumers think this way, sellers have an incentive to manipulate the perceived reference price and create the illusion of a “deal.” One example that has been used for decades is announcing a largely fictional “suggested retail price,” which actually just serves as a misleading suggested reference price. In America, some products always seem to be on sale, such as rugs and mattresses, and at some retailers, men’s suits. Goods that are marketed this way share two characteristics: they are bought infrequently and quality is difficult to assess. The infrequent purchases help because consumers often do not notice that there is always a sale going on. Most of us are pleasantly surprised that when we wander in to buy a new mattress, there happens to be a sale this week. And when the quality of a product, like a mattress, is hard to assess, the suggested retail price can do double duty. It can simultaneously suggest that quality is high (thus increasing perceived acquisition utility) and imply that there is transaction utility to be had because the product is “on sale.”
Shoppers can get hooked on the thrill derived from transaction utility. If a retailer known for frequent discounting tries to wean its customers away from expecting great deals, it can struggle.
Macy’s notably tried, and failed, to wean customers off their addiction to frequent sales. In an image makeover undertaken in 2006–07, Macy’s leadership specifically targeted coupons as a price reduction device, and wanted to reduce their usage. Macy’s saw coupons as a threat, linking the brand too closely to less prestigious retailers such as JC Penney or Kohl’s. After taking over several other department store chains across the country and rebranding them all as Macy’s, the company cut the use of coupons by 30% in the spring of 2007, compared to the prior spring. This did not go over well with customers. Sales plummeted, and Macy’s quickly promised to return to its previous glut of coupons by the holiday season of that same year.
Now we follow Thaler’s progress as the field of behavioural economics finds its feet. Early support comes from the Russell Sage Foundation, a social policy think-tank in New York, that provides one-year fellowships to practitioners of behavioural economics. Thaler uses his time at the foundation to explore the idea of “narrow framing”.
During our year at Russell Sage, my colleague Colin Camerer and I would frequently take taxis together. Sometimes it was difficult to find an empty cab, especially on cold days or when a big convention was in town. We would occasionally talk to the drivers and ask them how they decided the number of hours to work each day.
Most drivers work for a company with a large fleet of cabs. They rent the cab for a period of twelve hours, usually from five to five, that is, 5am to 5pm, or 5pm to 5am. The driver pays a flat amount to rent the cab and has to return it with the gas tank full. He keeps all the money he makes from the fares on the meter, plus tips. We started asking drivers, “How do you decide when to quit for the day?” Twelve hours is a long time to drive in New York City traffic, especially while trying to keep an eye out for possible passengers. Some drivers told us they had adopted a target income strategy. They would set a goal for how much money they wanted to make after paying for the car and the fuel, and when they reached that goal they would call it a day.
The question of how hard to work was related to a project Colin, our colleague George Loewenstein and I had been thinking about; we called it the “effort” project. We had discussed the idea for a while and had run a few lab experiments, but we had yet to find an angle we liked. We decided that studying the actual decision-making of cab drivers might be what we had been looking for.
All drivers kept a record of each fare on a sheet of paper called a trip sheet. The information recorded included the time of the pickup, the destination, and the fare. The sheet also included when the driver returned the car. Somehow, Colin managed to find the manager of a taxicab company who agreed to let us make copies of a pile of these trip sheets. We later supplemented this data set with two more we obtained from the New York City Taxi and Limousine Commission.
The central question that the paper asked is whether drivers work longer on days when the effective wage is higher. The first step was to show that high- and low-wage days occur, and that earnings later in the day could be predicted by earnings during the first part of the day. This is true. On busy days, drivers make more per hour and can expect to make more if they work an additional hour. Having established this, we looked at our central question and got a result economists found shocking. The higher the wage, the less drivers worked.
Basic economics tells us that demand curves slope down and supply curves slope up. That is, the higher the wage, the more labour that is supplied. Here we were finding just the opposite result! It is important to clarify just what these results say and don’t say. Like other economists, we believed that if the wages of cab drivers doubled, more people would want to drive cabs for a living. And even on a given day, if there is a reason to think that a day will be busy, fewer drivers will decide to take that day off and go to the beach. Even behavioural economists believe that people buy less when the price goes up and supply more when the wage rises. But in deciding how long to work on a given day that they have decided to work, the drivers were falling into a trap of narrowly thinking about their earnings one day at a time, and this led them to make the mistake of working less on good days than on bad ones.
Well, not all drivers made this mistake. Driving a cab is a Groundhog Day–type learning experience, in which the same thing happens every day, and cab drivers appear to learn to overcome this bias over time. We discovered that if we split each of our samples in half according to how long the subjects had been cab drivers, in every case the more experienced drivers behaved more sensibly. For the most part, they drove more when wages were higher, not lower. But of course, that makes the effect even stronger than average for the inexperienced drivers, who look very much like they have a target income level that they shoot for, and when they reach it, they head home.
To connect this with narrow framing, suppose that drivers keep track of their earnings at a monthly rather than a daily level. If they decided to drive the same amount each day, they would earn about 5% more than they do in our sample. And if they drove more on good days and less on bad days, they would earn 10% more for the same number of hours. We suspected that, especially for inexperienced drivers, the daily income target acts as a self-control device. “Keep driving until you make your target or run up against the twelve-hour maximum” is an easy rule to follow, not to mention justify to yourself or a spouse waiting at home. Imagine instead having to explain that you quit early today because you didn’t make very much money. That will be a long conversation, unless your spouse is an economist.
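The cost of the target-income rule can be illustrated with a toy simulation. A sketch under made-up assumptions (hypothetical hourly wages and a $240 daily target; none of the numbers are from the paper):

```python
import random

random.seed(0)

DAYS = 200
MAX_HOURS = 12
# Hypothetical effective hourly wages, varying from day to day.
wages = [random.uniform(15, 35) for _ in range(DAYS)]

def target_income_hours(wage, target=240):
    """Drive until a fixed daily target is hit, capped at twelve hours."""
    return min(target / wage, MAX_HOURS)

# Narrow framer: quits early on good days, grinds out the bad ones.
hours = [target_income_hours(w) for w in wages]
target_earnings = sum(w * h for w, h in zip(wages, hours))

# Benchmark: the same total hours spread evenly across the days.
fixed_earnings = sum(w * (sum(hours) / DAYS) for w in wages)

print(round(fixed_earnings / target_earnings, 3))  # > 1: same hours, more money
assert fixed_earnings > target_earnings
```

The benchmark beats the target rule because the targeter’s hours are negatively correlated with the wage: he supplies the most labour exactly when it pays the least.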
The cabs paper was published in a special issue of the Quarterly Journal of Economics dedicated to the memory of my friend and mentor Amos Tversky.
Eventually Thaler joins the faculty at the University of Chicago’s Booth School of Business. His appointment is greeted by an old-school colleague with the words, “Each generation has got to make its own mistakes”. Thaler fails to be discouraged and continues to build his body of work. With his colleague Cade Massey, Thaler studies the National Football League’s annual draft in which teams select players:
Here is a simple thought experiment. Suppose you rank all the players taken in the draft at a given position (quarterback, wide receiver, etc.) by the order in which they were picked. Now take two players drafted consecutively, such as the third running back and the fourth. What is the chance that the player taken earlier is better by some objective measure? If the teams were perfect forecasters, then the player taken first would be better 100% of the time. If the teams have no ability, then the earlier pick will be better half the time, like flipping a coin.
In reality, across the entire draft, the chance that the earlier player will be better is only 52%. In the first round it is a bit higher, 56%. (These statistics used the simple metric of “games started” to determine who is better.) Keep that thought in mind, both as you read the rest of this chapter and the next time you want to hire someone and are “sure” you have found the perfect candidate.
Our research yielded two simple pieces of advice to teams. First, trade down. Trade away high first-round picks for additional picks later in the draft, especially second-round picks. Second, be a draft-pick banker. Lend picks this year for better picks next year.
Before we even had our first draft of this paper, we had some interest from one of the NFL teams, and by now we have worked informally with three teams (one at a time, of course). The first interaction we had was with Daniel Snyder, the owner of the Washington Redskins. Mr. Snyder had been invited by the entrepreneurship club at the Booth School of Business to give a talk, and one of the organisers asked me to moderate a discussion for the audience. I agreed, knowing I would have some time to talk to Snyder one-on-one during lunch.
Mr. Snyder had only been an owner for a brief period when we met. I told Mr. Snyder about the project with Cade and he immediately said he was going to send “his guys” to see us right away, even though they were in the midst of the season. He said, “We want to be the best at everything.” Apparently when Mr. Snyder wants something he gets it. That Monday I got a call from his chief operating officer, who wanted to talk to Cade and me ASAP. We met Friday of that week with two of his associates and had a mutually beneficial discussion. We gave them the basic lessons of our analysis, and they were able to confirm some institutional details for us.
After the season ended, we had further discussions with Snyder’s staff. By then, we were pretty sure they had mastered our two takeaways: trade down and trade picks this year for better picks next year. Cade and I watched the draft on television that year with special interest that turned into deep disappointment. The team did exactly the opposite of what we had suggested! They moved up in the draft, and then traded away a high draft pick next year to get a lesser one this year. When we asked our contacts what happened we got a short answer. “Mr. Snyder wanted to win now.”
This was a good forecast of Snyder’s future decisions.
Here, Thaler’s ideas catch on with the publication of his book Nudge (co-written with Cass Sunstein) and in 2010 he helps set up a team to apply lessons from behavioural economics to government in the UK:
Soon after the coalition agreement between David Cameron and Nick Clegg was sorted out, the Cameron policy adviser Rohan Silva was in touch. The new government was serious about using behavioural economics, and behavioural science more generally, to make government more effective and efficient. He wanted to know if I would be willing to help. Of course I said yes.
By some stroke of luck, genius, and timing, David Halpern was selected to run this as yet unnamed operation. David is not only a first-rate social scientist who taught at Cambridge University, but he also served as the chief analyst in Prime Minister Tony Blair’s strategy unit. He also coauthored previous UK reports on how behavioural approaches might be used by government, including one while working for Blair. This meant two things: he possessed vast knowledge and experience about how government works, and had the kind of nonpartisan credentials that would be crucial in establishing the team as a source of impartial information.
By the time of my next trip to London, the initial team had been established and was set up in temporary facilities in an obscure corner of the Admiralty Arch, located a short walk away from 10 Downing Street and Parliament. It was winter, and London had been hit with what locals considered a massive snowstorm. Accumulation was about an inch. And it was not much warmer inside than outside the drafty building that served as the team’s first home.
The official mission of the Behavioural Insights Team (BIT) was left broad: to achieve significant impact in at least two major areas of policy; to spread understanding of behavioural approaches across government; and to achieve at least a tenfold return on the cost of the unit. The basic idea was to use the findings of behavioural science to improve the workings of government. There was no manual for this task, so we had to figure it out on the fly. On this and subsequent visits, I would often go to meetings with some high-level government official, the minister of some department or that minister’s deputy, joined by David and another team member. We would typically begin these meetings by asking what problems the department faced and then brainstorm about what might be done to help. It was vital to the success of the project that we let the departments select the agenda, rather than lecture them on the glories of behavioural science.
The first meeting I attended went so well that I could easily have gotten the impression that this business of employing behavioural insights to improve public policy would be easy. Nick Down, of Her Majesty’s Revenue and Customs (HMRC), the British tax collection authority, had heard about BIT and had reached out. His job was to collect tax revenues from people who owed the government money. For most British taxpayers, there is little risk of falling into this situation. Employers withhold taxes from employees’ paycheques through what is called a “pay as you earn” system. For those who earn all their income through wages and salary there is no need to file a tax return and no bill to pay. However, people who are self-employed or have other sources of income besides their regular job have to file a return and can be confronted with a sizable bill.
For taxpayers who have to file a return, payments are required on January 31 and July 31. If the second payment is not received on time, the taxpayer is sent a reminder notice, followed by letters, phone calls, and eventually legal action. As with any creditor, the HMRC views the use of a collection agency or legal action as a last resort, since it is expensive and antagonises the taxpayer, who is, of course, also a voter. If that first notice could be written more effectively, it could save HMRC a lot of money. That was Nick Down’s goal.
He was already off to a good start. He had read the work of psychologist Robert Cialdini, author of the classic book Influence. Many people have called Danny Kahneman the most important living psychologist and I would hardly disagree, but I think it would be fair to say that Cialdini is the most practical psychologist alive. Beyond Cialdini’s book, Nick Down had also received some advice from a consulting firm that is affiliated with Cialdini to help him think about how he might get people to pay their taxes promptly.
Nick’s team had already run a pilot experiment with a letter that used a standard recommendation from the Cialdini bible: if you want people to comply with some norm or rule, it is a good strategy to inform them (if true) that most other people comply. In Nudge, we had reported on a successful use of this idea in Minnesota. In that study, overdue taxpayers were sent a variety of letters in an effort to get them to pay, with messages varying from telling them what their money would be spent on to threatening legal action, but the most effective message was simply telling people that more than 90% of Minnesota taxpayers paid their taxes on time. This latter fact was also true in Britain, and the pilot experiment used a letter with similar language. The results seemed supportive, but the pilot had not been done in a scientifically rigorous manner; it lacked a control group and several things were varied at once. Nick was keen to do more but did not have the training or staff to conduct a proper experiment, and did not have the budget to rely on outside consultants.
It was our good fortune to run into Nick Down at such an early stage of BIT’s development. He was already sold on the idea that behavioural science could help him do his job better, he was willing to run experiments, and the experiments were cheap. All we had to do was fiddle with the wording of a letter that would be sent to taxpayers anyway. We didn’t even have to worry about the cost of postage. Best of all, fine-tuning the letters could potentially save millions of pounds. BIT had a scheduled two-year run, after which it would be up for review. The tax experiment had the potential to provide an early win that would quiet sceptics who thought that applying behavioural science to government policy was a frivolous activity that was doomed to fail.
Our initial meeting eventually led to three rounds of experimentation at increasing levels of sophistication. Michael Hallsworth from BIT and a team of academics conducted the most recent experiment. The sample included nearly 120,000 taxpayers who owed amounts of money that varied from £351 to £50,000. Everyone received a reminder letter explaining how their bill could be paid, and aside from the control condition, each letter contained a one-sentence nudge that was some variation on Cialdini’s basic theme that most people pay on time. Some examples:
• The great majority of people in the UK pay their taxes on time.
• The great majority of people in your local area pay their taxes on time.
• You are currently in the very small minority of people who have not paid their taxes on time.
If you are wondering, the phrase “the great majority” was used in place of the more precise “90% of all taxpayers” because some of the letters were customised for specific localities, and BIT was unable to confirm that the 90% number was true for every locality used. There is an important general point here. Ethical nudges must be both transparent and true. That is a rule the BIT has followed scrupulously.
All the manipulations helped, but the most effective message combined two sentiments: most people pay and you are one of the few who have not. This letter increased the number of taxpayers who made their payments within twenty-three days by over five percentage points. Since it does not cost anything extra to add a sentence to such letters, this is a highly cost-effective strategy. It is difficult to calculate exactly how much money was saved, since most people do pay their taxes eventually, but the experiment sped up the influx of £9 million in revenues to the government over the first twenty-three days. In fact, there is a good chance that the lessons learnt from this experiment will save the UK government enough money to pay for the entire costs of the BIT for many years.
The BIT passed its built-in two-year review and was renewed by the Cabinet Office in 2012. Because the team had continued to grow rapidly, it was necessary to find it a new home. The stay in the drafty original quarters was mercifully brief, but the next home, in borrowed space within the Treasury Department, was too small for the growing team’s needs. So in 2014, a decision was made to partially privatise the BIT. It is now owned in equal parts by the Cabinet Office, its employees, and its nonprofit partner Nesta, which is providing the team with its current workspace. BIT has a five-year contract with the Cabinet Office. The team has grown to nearly 50 and now supports a range of public bodies across the UK, and increasingly helps other national governments too, including an exciting new tax compliance study in Guatemala.
Other countries are also joining the movement. A study conducted by the Economic and Social Research Council published in 2014 reports that 136 countries around the world have incorporated behavioural sciences in some aspects of public policy, and 51 “have developed centrally directed policy initiatives that have been influenced by the new behavioural sciences.” Clearly word is spreading.
Behavioural economics is no longer a fringe operation, and writing an economics paper in which people behave like Humans is no longer considered misbehaving, at least by most economists under the age of fifty. After a life as a professional renegade, I am slowly adapting to the idea that behavioural economics is going mainstream. Sigh.
* Reprinted by permission of W W Norton & Company