2 A/B Testing: Evaluating a change to the system

This chapter covers

Randomizing to remove measurement bias
Replicating to reduce measurement variation
Determining how many measurements to take
Running a full A/B test without harming production
Deciding whether to accept or reject the system change
Resisting the temptation to stop an A/B test early

In chapter 1 you saw that the final step in the engineer’s workflow is to measure how business metrics are impacted by a change to the system. You do this by running the changed system in production as part of an experiment. Experiments are the most accurate way to measure changes in business metrics.

In this chapter you’ll learn how to run an A/B test, the simplest and most widely used type of experiment. An A/B test compares the performance of the current production system (the “A” in A/B) to the performance of the changed system (the “B” in A/B). When an A/B test is complete, you’ll have the information you need to decide whether to change your system (accept B) or leave it alone (reject B).
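To make the comparison concrete, here is a minimal sketch of the idea in Python. It is an illustration under stated assumptions, not the procedure developed later in the chapter: the function names (`run_ab_test`, `decide`), the simulated metric, and the 1.64 decision threshold are all hypothetical choices made for this example.

```python
import random
import statistics

def run_ab_test(measure_a, measure_b, num_measurements, seed=0):
    """Randomly assign each measurement to A or B, then compare the means."""
    rng = random.Random(seed)
    a_values, b_values = [], []
    for _ in range(num_measurements):
        # Randomize: a coin flip decides which version is measured.
        if rng.random() < 0.5:
            a_values.append(measure_a(rng))
        else:
            b_values.append(measure_b(rng))
    # Replicate: averaging many individual measurements reduces variation.
    delta = statistics.mean(b_values) - statistics.mean(a_values)
    # Standard error of the difference between the two means.
    se = (statistics.variance(a_values) / len(a_values)
          + statistics.variance(b_values) / len(b_values)) ** 0.5
    return delta, se

def decide(delta, se, z_threshold=1.64):
    """Accept B only if the improvement is large relative to its standard error.

    The threshold 1.64 is an assumption for this sketch (roughly a 5%
    one-sided false-positive rate under a normal approximation).
    """
    return "accept B" if delta / se > z_threshold else "reject B"

# Hypothetical business metric: simulated so that B is truly better than A.
delta, se = run_ab_test(
    measure_a=lambda rng: rng.gauss(1.0, 1.0),
    measure_b=lambda rng: rng.gauss(1.5, 1.0),
    num_measurements=2000,
)
print(decide(delta, se))
```

In a real test the two `measure_*` callables would be replaced by measurements taken from the production system, and the number of measurements would be chosen in advance rather than guessed.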

Figure 2.1 Three stages of an A/B test: Design, Run, and Analyze.

An A/B test has three stages (see figure 2.1): design, run, and analyze.

2.1 Design I: Randomize to remove measurement bias

2.1.1 A problematic design

2.1.2 An unbiased design

2.2 Design II: Replicate to reduce variation

2.2.1 Replication reduces variation

2.2.2 Quantify variation with standard error

2.3 Design III: Determine the number of individual measurements to take

2.3.1 Minimize measurement costs

2.3.2 Limit incorrect rejection (false negatives)

2.3.3 Calculate the false-negatives threshold

2.3.4 Limit incorrect acceptance (false positives)

2.3.5 Limit false negatives and false positives simultaneously

2.4 Run and analyze the A/B test

2.4.1 Run a small-sized A/A test

2.4.2 Run a small-sized A/B test

2.4.3 Run and analyze the full-sized A/B test

2.5 Early stopping produces invalid results

2.6 Summary