Practical experimentation at Microsoft

The Microsoft Bing team responsible for conducting controlled experiments have a paper out that canvasses some practical problems they’ve come across in the thousands of experiments that they’ve run. It’s an interesting read, even if the subject matter isn’t particularly fascinating for those outside the search business. A lot of the things they find sound really obvious in the general sense but would be tricky to pick up in practice.

The main points are:

  • Very few ‘good ideas’ are actually good ideas because we don’t really understand the behaviour of people outside our social group, even if they’re our customers. About 10% of ideas that make it to experimentation actually turn out to be beneficial to the business.
  • The criteria used to judge success are not always obvious and there can be a trade-off between short-run and long-run success. For example, degrading the quality of internet search results increases market share in the short run because users have to spend more time on your page. In the long run that wouldn’t hold up, but it could take many weeks for the drop-off to show up in the results.
  • Understanding your instruments is crucial to interpreting results. Some results are an artifact of the survey method and that can often be really hard to pick up. This is often the case for economists when we don’t read the details of survey methods. The best applied economists actually take advantage of the design details of particular surveys to conduct natural experiments.
  • Don’t extrapolate from trends in the immediate aftermath of shocks. When you watch data in real time after a shock, what looks like a trend is often just the numbers settling towards their long-run mean, where they will shortly stabilise.
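
To make the last point concrete, here is a minimal sketch in Python. It is my own illustration, not the method used in the paper. The function name `running_mean_effect` and all of the numbers (a true effect of zero, a noise level of one, 2000 days) are made up for the example. It tracks the cumulative average of noisy daily readings and compares how much that average moves in the first two weeks with how much it moves in the last two.

```python
import random


def running_mean_effect(true_effect, noise_sd, days, seed=0):
    """Return the cumulative average of noisy daily readings of an effect.

    Hypothetical helper for illustration only: each day's reading is the
    true effect plus Gaussian noise, and we record the average so far.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    running = []
    for day in range(1, days + 1):
        total += rng.gauss(true_effect, noise_sd)
        running.append(total / day)
    return running


# Assumed numbers: no real effect at all, just noise around zero.
estimates = running_mean_effect(true_effect=0.0, noise_sd=1.0, days=2000)

# How far the running average moves in the first and last fortnight.
early_swing = max(estimates[:14]) - min(estimates[:14])
late_swing = max(estimates[-14:]) - min(estimates[-14:])

print(f"swing in first two weeks: {early_swing:.3f}")
print(f"swing in last two weeks:  {late_swing:.3f}")
print(f"final estimate:           {estimates[-1]:.3f}")
```

The early swing comes entirely from noise averaging out, since the true effect here is zero by construction. Anyone extrapolating from the first fortnight would be projecting a trend that does not exist.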

HT: Andrew Gelman