A Harvard Business School professor on how companies like Google and Amazon use experimentation to innovate, grow, and improve

Four hundred years ago, in 1620, Francis Bacon published “Novum Organum,” the classical formulation of a new instrument for building and organizing knowledge: the scientific method. Thinking and acting scientifically has had an enormous impact on the world. For centuries, we’ve built and organized scientific and technological knowledge through testable explanations and predictions. These, in turn, have given us modern medicine, food, energy, transportation, communication, and so much more. The engine that has powered the scientific method is the humble experiment.

And today, companies as varied as Google, Booking.com, Nike, Kohl’s, State Farm Insurance, and the BBC are running experiments to fuel innovation – to roll out new products, improve customer experiences, and try new business models. These organizations have discovered that an “everything is a test” mentality yields surprisingly large payoffs and competitive benefits, and may even help stock performance. I’ve spent over 25 years studying experimentation in businesses and, along the way, benefited tremendously from the work of many scholars and practitioners. I think that they would all agree with me: experimentation works.

Why? Consider the cautionary tale of Ron Johnson. Soon after he left Apple to become the CEO of JC Penney in 2011, he led his team to implement a bold new plan. Under his leadership, the company eliminated coupons and clearance racks, filled stores with branded boutiques, and used technology to eliminate cashiers, cash registers, and checkout counters. Yet after just 17 months, sales had plunged, losses had soared, and Johnson had lost his job.

Stefan H. Thomke.

How could Penney have gone so wrong? Didn’t it have lots of transaction data revealing customers’ tastes and preferences? What about Johnson’s experience in creating Apple’s highly successful store concept, which redefined the in-store customer experience with innovations like the Genius Bar and cashier-free checkout? Those innovations gave Apple the highest average retail sales per square foot of any retailer worldwide, and its stores drew more visitors than Disney’s theme parks. The Penney board must have hoped that Johnson would repeat Apple’s retail success at the old department store chain, with its more than one thousand United States locations. Why didn’t that happen?

For one thing, most managers operate in a world where they lack sufficient data or relevant experience to inform their innovation decisions. That is, there may be transaction data, but that information provides clues only about past behavior, not about how customers might react to future changes. Oftentimes, too, managers rely on their intuition – but ideas that are truly innovative typically go against experience. In fact, most ideas don’t work. Whether it’s improving customer experiences, trying out new business models, or developing new products and services, even the most experienced business leaders are often wrong.

Not all is lost, however. The good news is that managers can discover whether a change in product, service, or business model will succeed. They can do that by subjecting it to a rigorous experiment. Think of it this way: A pharmaceutical company would never introduce a drug without first conducting a round of experiments based on established scientific protocols (in fact, the US Food and Drug Administration requires extensive clinical trials). Yet skipping the experiments is essentially what many companies do when they roll out new business models and other novel changes. Had Penney run rigorous experiments on its CEO’s proposed innovations, the company might have discovered that, notwithstanding the success of these innovations at Apple, Penney customers would probably reject them. Such a rejection would not have been surprising, given the long odds against any innovation. In fact, Microsoft has found that only one-third of its experiments prove effective, one-third have neutral results, and one-third have negative results.

“EXPERIMENTATION WORKS: The Surprising Power of Business Experiments.”

Had Penney tested extensively, it would have found itself in good company. Google employs extensive experimentation in its ongoing quest for the best customer experience. Even its experts get it wrong most of the time. Eric Schmidt, its former CEO, disclosed the odds in Senate testimony in 2011:

To give you a sense of the scale of the changes that Google considers, in 2010 we conducted 13,311 precision evaluations to see whether proposed algorithm changes improved the quality of its search results, 8,157 side-by-side experiments where it presented two sets of search results to a panel of human testers and had the evaluators rank which set of results was better, and 2,800 click evaluations to see how a small sample of real-life Google users responded to the change. Ultimately, the process resulted in 516 changes that were determined to be useful to users based on the data and, therefore, were made to Google’s algorithm. Most of these changes are imperceptible to users and affect a very small percentage of websites, but each one of them is implemented only if we believe the change will benefit our users.

In other words, Google’s experts missed their mark 96.1% of the time. The low (3.9%) success rate includes less rigorous tests, such as click evaluations. At Google and Bing, about 10% to 20% of controlled experiments generate positive results. But it’s precisely that capability – to test what does and does not work at a huge scale – that has given Google an advantage over its competitors. Scott Cook, the cofounder of Intuit and a former Amazon director, recalled former Yahoo executives saying as much: “‘[Google] just outran us,’ they said. ‘We didn’t have that experimentation engine.’” Even Yahoo’s highly publicized Project Panama – launched in 2007 as an effort to close the wide gap with Google in the race for advertising dollars – couldn’t erase the advantage of Google’s ferocious experimentation, which was the company’s system of continuous improvement.
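For readers who want to check the arithmetic, here is a minimal sketch using only the counts in Schmidt's testimony. The excerpt does not say how the 3.9% figure was computed. The sketch assumes it is the 516 launched changes divided by the 13,311 precision evaluations, and it also shows the rate if every evaluation of any kind is counted as a trial. The variable and function names are illustrative only.

```python
# Counts quoted from Eric Schmidt's 2011 Senate testimony.
precision_evaluations = 13_311
side_by_side_experiments = 8_157
click_evaluations = 2_800
changes_launched = 516


def success_rate(successes: int, trials: int) -> float:
    """Return successes as a percentage of trials."""
    return 100.0 * successes / trials


# Assumption: the 3.9% figure uses precision evaluations alone as the denominator.
rate_precision_only = success_rate(changes_launched, precision_evaluations)

# Alternative reading: count every evaluation of any kind as a trial.
all_evaluations = (
    precision_evaluations + side_by_side_experiments + click_evaluations
)
rate_all = success_rate(changes_launched, all_evaluations)

print(f"Precision evaluations only: {rate_precision_only:.1f}% succeeded, "
      f"{100 - rate_precision_only:.1f}% missed")
print(f"All {all_evaluations:,} evaluations: {rate_all:.1f}% succeeded")
```

Under the first assumption the numbers match the 3.9% and 96.1% quoted above. Counting all 24,268 evaluations gives a success rate of roughly 2.1%, so the odds are long on either reading.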

A company’s ability to create and refine its products, customer experiences, processes, and business models – in other words, to compete – is deeply affected by its ability to experiment.

The rationale behind experimentation is the pursuit of knowledge about cause and effect; all experiments yield information through understanding what does, and does not, work. For centuries, scientists and engineers have relied on experiments, guided by their insight and intuition, to learn new information and advance knowledge. Experiments have been conducted to characterize naturally occurring processes, to decide among competing scientific hypotheses, to find hidden mechanisms of known effects, and to simulate what is difficult or impossible to research through observation – in short, to inductively establish scientific laws.

In the business world, experiments have led to the discovery of both technical solutions and new markets. A classic example of both is the discovery of 3M’s Post-it Note. The story begins in 1964, when 3M chemist Spencer Silver started a series of experiments aimed at developing polymer-based glues. As Silver recalled: “The key to the Post-it adhesive was doing the experiment. If I had sat down and factored it out beforehand, and thought about it, I wouldn’t have done the experiment. If I had limited my thinking only to what the literature said, I would have stopped. The literature was full of examples that said that you can’t do this.”

Although Silver discovered a new glue with unique properties – a high level of “tack” but low adhesion – it would take 3M at least another five years to find a market. Silver kept trying to sell his glue to other departments at 3M, but they were focused on finding a stronger glue that formed an unbreakable bond, not a weaker glue that only supported a piece of paper. Market tests with different concepts (such as a sticky bulletin board) were telling 3M that the Post-it concept was hopeless – the adhesive just didn’t solve any known customer problems – until Silver met Arthur Fry. Fry, a chemist and choir director, observed that members of his choir would frequently drop bookmarks when switching between songs. “Gee,” wondered Fry, “if I had a little adhesive on these bookmarks, that would be just the ticket.” This “Eureka moment” launched a series of experiments with the new glue that broadened its applicability and ultimately led to a paper product that could be attached and removed without damaging the original surface. In other words, repeated experimentation was instrumental in finding the now-obvious solution to a frustrating customer problem once the Eureka moment occurred.

While such Eureka moments make for memorable stories, they do not give a complete account of the various experimentation strategies, tools, processes, and histories that lead to innovative solutions. After all, such moments are usually the result of many failed experiments and accumulated learning that prepare the experimenter to take advantage of the unexpected. “Failure and invention,” notes Amazon’s CEO Jeff Bezos, “are inseparable twins. If you already know it’s going to work, it’s not an experiment.” Consider what the authors of a careful study of Thomas Edison’s invention of the electric light bulb concluded:

This invention [the electric light], like most inventions, was the accomplishment of men guided largely by their common sense and their past experience, taking advantage of whatever knowledge and news should come their way, willing to try many things that didn’t work, but knowing just how to learn from failures to build up gradually the base of facts, observations, and insights that allow the occasional lucky guess – some would call it inspiration – to effect success.

When managers aim for big results, however, they cannot rely on lucky guesses, experience, or intuition alone. Their companies’ business experiments must be disciplined, organizationally aligned, supported by an infrastructure, and culturally embraced; that is, running experiments should be as normal as running the numbers. At the same time, serendipitous breakthroughs may be more likely to occur when managers are clear that understanding what does not work is as important as learning what does.

Reprinted by permission of Harvard Business Review Press. Excerpted from EXPERIMENTATION WORKS: The Surprising Power of Business Experiments by Stefan H. Thomke. Copyright 2020 Stefan H. Thomke. All rights reserved.