Experimentation for Engineers cover

2 A/B Testing: Evaluating a change to the system

 

This chapter covers

  • Randomizing to remove measurement bias
  • Replicating to reduce measurement variation
  • Determining how many measurements to take
  • Running a full A/B test without harming production
  • Deciding whether to accept or reject the system change
  • Resisting the temptation to stop an A/B test early

In chapter 1 you saw that the final step in the engineer’s workflow is to measure how business metrics are impacted by a change to the system. You do this by running the changed system in production as part of an experiment. Experiments are the most accurate way to measure changes in business metrics.

In this chapter you’ll learn how to run an A/B test, the simplest and most widely used type of experiment. An A/B test compares the performance of the current production system (the “A” in A/B) to the performance of the changed system (the “B” in A/B). When an A/B test is complete, you’ll have the information you need to decide whether to change your system (accept B) or leave it alone (reject B).
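To make the A-versus-B comparison concrete, here is a minimal Python sketch of the idea. It is an illustration under stated assumptions, not the chapter's own code: the business metric is simulated with Gaussian noise, the function names (`measure`, `run_ab_test`, `analyze`), the true means, and the decision threshold of 1.64 are all hypothetical choices made for this example. Each measurement is randomly assigned to A or B, many measurements are taken, and the difference in means is compared to its standard error.

```python
import random
import statistics


def measure(variant, true_means, rng):
    """Simulate one noisy measurement of a business metric (assumed Gaussian)."""
    return rng.gauss(true_means[variant], 1.0)


def run_ab_test(num_measurements, true_means, rng):
    """Randomly assign each measurement to A or B and record the result."""
    samples = {"A": [], "B": []}
    for _ in range(num_measurements):
        variant = rng.choice(["A", "B"])  # randomization step
        samples[variant].append(measure(variant, true_means, rng))
    return samples


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / len(values) ** 0.5


def analyze(samples, z_threshold=1.64):
    """Compare B to A; accept B if the difference is large relative to its noise.

    z_threshold=1.64 is an assumed cutoff chosen for illustration.
    """
    delta = statistics.fmean(samples["B"]) - statistics.fmean(samples["A"])
    se_delta = (standard_error(samples["A"]) ** 2
                + standard_error(samples["B"]) ** 2) ** 0.5
    z = delta / se_delta
    return {"delta": delta, "se": se_delta, "z": z, "accept_b": z > z_threshold}


if __name__ == "__main__":
    rng = random.Random(17)
    # Hypothetical true means: B is slightly better than A.
    samples = run_ab_test(10_000, {"A": 1.00, "B": 1.05}, rng)
    print(analyze(samples))
```

Because the measurements are noisy, the same code run with a different seed or fewer measurements can reach a different decision, which is why the number of measurements has to be chosen deliberately.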

Figure 2.1 Three stages of an A/B test: Design, Run, and Analyze.

An A/B test has three stages (see figure 2.1): design, run, and analyze.

2.1      Design I: Randomize to remove measurement bias

2.1.1       A problematic design

2.1.2       An unbiased design

2.2      Design II: Replicate to reduce variation

2.2.1       Replication reduces variation

2.2.2       Quantify variation with standard error

2.3      Design III: Determine the number of individual measurements to take

2.3.1       Minimize measurement costs

2.3.2       Limit incorrect rejection (false negatives)

2.3.3       Calculate the false-negatives threshold

2.3.4       Limit incorrect acceptance (false positives)

2.3.5       Limit false negatives and false positives simultaneously

2.4      Run and analyze the A/B test

2.4.1       Run a small-sized A/A test

2.4.2       Run a small-sized A/B test

2.4.3       Run and analyze the full-sized A/B test

2.5      Early stopping produces invalid results

2.6      Summary
