A/B testing basics

What you'll understand by the end of this lesson

Why A/B testing produces a fundamentally different type of data from surveys, focus groups, or user testing
The one discipline that makes the difference between a test you can learn from and one you cannot
Why most tests do not produce wins — and how to prepare your organisation for that reality
Whether a low-traffic site can still benefit from A/B testing

What A/B testing actually is

An A/B test splits your real website traffic into two groups. Group A sees the current version of a page or element — the control. Group B sees a changed version — the variant. You measure which group completes more of the action you care about: a purchase, a sign-up, a form submission.
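As a rough illustration of how the split can work, the sketch below assigns each visitor to a group by hashing an identifier. The function name, the experiment label, and the visitor IDs are all hypothetical. Real testing tools handle assignment for you and may do it differently.

```python
import hashlib


def assign_group(visitor_id: str, experiment: str = "homepage-cta") -> str:
    """Place a visitor in group A (control) or group B (variant)."""
    # Hashing the experiment name together with the visitor ID means the
    # same visitor always sees the same version on every visit.
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"


# Hypothetical visitor IDs, used only to show that the split is roughly even.
counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[assign_group(f"visitor-{i}")] += 1

print(counts)
```

The important property is consistency: a returning visitor should not flip between the control and the variant part-way through the test.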

When enough visitors have passed through to make the result statistically reliable, you read the outcome. If the variant performed meaningfully better, you have evidence — not an opinion, not a preference — that the change improved your conversion rate. If the variant performed the same or worse, you have equally valuable information.
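One common way to check whether a difference is statistically reliable is a two-proportion z-test. The sketch below is a minimal version using only Python's standard library. The visitor and conversion counts are made-up numbers for illustration, and your testing tool may use a different statistical method.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (rate_a, rate_b, z, two-sided p-value) for groups A and B."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate: the conversion rate if A and B were really the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return rate_a, rate_b, z, p_value


# Illustrative numbers only: 10,000 visitors per group.
rate_a, rate_b, z, p = two_proportion_z_test(200, 10_000, 250, 10_000)
print(f"control {rate_a:.2%}, variant {rate_b:.2%}, z = {z:.2f}, p = {p:.4f}")
```

A small p-value (conventionally below 0.05) suggests the difference is unlikely to be noise. It only means something if you check it once, after the test has reached the sample size you planned for.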

Those are the mechanics. What makes the method powerful only becomes clear once you understand why the data it produces differs from what surveys, focus groups, or user testing give you.

Why the data is different

Surveys, focus groups, and user testing all involve people who know they are being observed.

A focus group participant knows there's a facilitator watching. A survey respondent knows they're answering questions for a purpose. A user testing subject was recruited and knows their session is being recorded. People in watched situations edit their answers and try to be helpful — they describe their behaviour in ways that feel coherent, even if those explanations don't reflect what actually drove their decision.

The result is data that tells you how people think they behave. That is not the same thing as how they actually behave.

An A/B test collects a different type of data. Your visitors arrived at your website because they were trying to do something — find a product, evaluate a service, answer a question. Neither group knows they're in an experiment. Nobody was recruited. Nobody is performing.

A/B testing is sometimes described as "double-blind": visitors don't know they're in a test, and you don't know who received which version until you look at the aggregate results. That structure makes it much harder for either side to unconsciously shape the outcome, provided you let the test run to completion before acting on the numbers.

A/B test data is also collected over time — usually two to four weeks of live traffic. The result reflects visitors arriving on different days, from different sources, in different moods. That variety is what makes the finding applicable to visitors who will come after the test ends.

The one discipline that makes a test learnable

There is a rule in A/B testing that feels overly strict the first time you encounter it: test one thing at a time.

The temptation is to change several things at once. You have a page with multiple problems. You have a redesign that addresses all of them. Why not test the redesign against the original in one experiment?

Because if the redesign wins, you won't know which change caused it. If it loses, you won't know which change hurt it. You end the test with an outcome but no understanding of why — and the why is what makes the next test better.

A test built from one specific hypothesis, testing one specific change, produces a learnable result. Over months of disciplined testing, you accumulate a genuine understanding of what drives your visitors — not a collection of outcomes you can't explain or repeat.

Bundling several changes into one variant is not the same thing as a "multivariate test". A multivariate test is a legitimate advanced technique that tests each combination of changes as its own variant, so the effect of each element can be separated. It requires substantially more traffic and more complex statistics. For beginners, changing one thing at a time is the right default.
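A quick way to see why a multivariate test needs more traffic: in a full-factorial design, every combination of element versions becomes its own group, so the same visitors are divided into more, smaller groups. The sketch below counts the groups for three hypothetical page elements. The element names and the traffic figure are assumptions for illustration.

```python
from itertools import product

# Hypothetical page elements, each with two versions (illustrative only).
elements = {
    "headline": ["current", "new"],
    "button": ["current", "new"],
    "image": ["current", "new"],
}

# Every combination of versions is a separate group of visitors.
cells = list(product(*elements.values()))
weekly_visitors = 8_000  # assumed traffic figure

print(f"A/B test: 2 groups, {weekly_visitors // 2} visitors each per week")
print(
    f"Multivariate test: {len(cells)} groups, "
    f"{weekly_visitors // len(cells)} visitors each per week"
)
```

With fewer visitors in each group, each comparison takes longer to become statistically reliable.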

Most tests will not produce a win

This is the reality most introductions to A/B testing skip past: the majority of tests do not deliver a winning variant.

In well-run programmes, winning roughly one test in three is a respectable rate. The remaining tests either show no meaningful difference between the two versions or show that a change which seemed obviously better performed worse than the control.

This is not a failure of the method — it is the method working correctly.

The purpose of a test is not to produce a lift on every run. It is to replace guesswork with evidence. A test that finds no meaningful difference tells you something important: this element, at this level of change, is not the constraint. That narrows the hypothesis space and improves the next test.

What damages an experimentation programme is not a run of non-winning tests. It's the expectation that each test must produce a result worth presenting. When that expectation is in place, teams start peeking at data before the test is complete, calling tests done when the numbers look good, or quietly dropping the programme after the third non-win.
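To see why peeking is harmful, the simulation sketch below runs A/A tests, where both groups see exactly the same page, so every "significant" result is a false alarm. It compares two habits: checking once at the planned end, and checking at ten points along the way and counting the test as a win if any check looks significant. The conversion rate, sample sizes, number of peeks, and random seed are all arbitrary assumptions.

```python
import random
from math import erf, sqrt


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:  # no conversions (or all conversions) so far
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / std_err
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))


def simulate(runs=300, visitors_per_group=1000, peeks=10, rate=0.10, seed=1):
    """Return (false-alarm rate with peeking, false-alarm rate at the end)."""
    rng = random.Random(seed)
    step = visitors_per_group // peeks
    flagged_by_peeking = 0
    flagged_at_end = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        flagged = False
        for n in range(1, visitors_per_group + 1):
            # Both groups convert at the same true rate: there is no real winner.
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if n % step == 0 and p_value(conv_a, n, conv_b, n) < 0.05:
                flagged = True
        flagged_by_peeking += flagged
        final_p = p_value(conv_a, visitors_per_group, conv_b, visitors_per_group)
        flagged_at_end += final_p < 0.05
    return flagged_by_peeking / runs, flagged_at_end / runs


peek_rate, end_rate = simulate()
print(f"False alarms when peeking: {peek_rate:.1%}")
print(f"False alarms when waiting: {end_rate:.1%}")
```

Because the final check is one of the ten peeks, the peeking rate can never be lower than the wait-until-the-end rate, and in practice it is usually several times higher.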

The fix: set expectations before you begin. Before running your first test, explain to everyone involved that most tests will not win, and that the learning from the ones that don't win is still real.

Ugly wins. A simple, stripped-down variant that the team finds visually disappointing often outperforms the polished version. Visitors respond to clarity far more than to polish. Design for clarity, not for admiration.

Low-traffic sites and A/B testing

The common advice is that A/B testing requires significant traffic. That's partly true.

Reaching statistical significance quickly requires a certain volume of visitors and conversions per week. High-traffic pages can run tests that reach this threshold in days. Low-traffic sites can take months.
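To get a feel for the numbers, the sketch below uses a standard textbook approximation for the sample size of a two-group comparison and converts it into weeks of traffic. The significance level (0.05) and power (0.80) are conventional defaults, not values from this lesson, and the baseline rate, expected lift, and traffic figures are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_group(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed in EACH group to detect the given lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def weeks_to_finish(weekly_visitors, baseline_rate, relative_lift):
    """Weeks of traffic needed when visitors are split evenly between A and B."""
    needed = 2 * visitors_per_group(baseline_rate, relative_lift)
    return ceil(needed / weekly_visitors)


# Assumed scenario: 3% baseline conversion rate, hoping to detect a 20% lift.
for weekly in (20_000, 2_000):
    weeks = weeks_to_finish(weekly, 0.03, 0.20)
    print(f"{weekly} visitors per week: about {weeks} weeks")
```

Smaller expected lifts push the required sample size up sharply, which is why low-traffic sites are better off testing changes that could plausibly make a large difference.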

But that is not a reason to avoid A/B testing on a low-traffic site. It's a reason to be selective.

A low-traffic site has a limited testing budget in the sense that it can only run a small number of tests per year. The right response is to reserve that budget for the questions that matter most: pricing structure, the main offer framing, the primary call to action. A test on one of these questions may take ten weeks instead of two. Run it. The data is worth the wait when the question is the right one.

Where A/B testing fits in the CRO process

By this point in the beginner track, you have learned to:

Recognise why opinion-based website decisions consistently underperform (Lesson 1)
Capture ideas, write hypotheses, and score them with ICE (Lesson 2)
Use sources of insight to build confidence in your hypothesis before testing (Lesson 3)

A/B testing is where hypotheses get resolved. The research process you built in the previous lessons is what makes your tests worth running. A well-evidenced hypothesis that loses still teaches you something — because you know the problem was real and the approach you tested simply wasn't the solution.

What makes A/B test data fundamentally different from survey or focus group data?

Think about this

You know how to test. But what makes some pages fundamentally harder to convert on than others — before you even write a hypothesis?
