Why your A/B tests are not running long enough
Statistical significance is not optional. A test that has not reached significance is producing noise, not insight. Acting on that noise leads to changes that may be actively harmful.
The temptation to stop an A/B test early when one variant looks like a winner is completely understandable. The data looks clear. Why wait? The reason to wait is that early results in A/B tests are frequently misleading due to random variation, day-of-week effects, and the statistical phenomenon known as the peeking problem.
The peeking problem
Every time you check an ongoing test, you run the risk of finding a result that looks significant purely by chance. Each peek is another opportunity for random variation to cross the significance threshold, so the more times you check, the further the real false-positive rate climbs above the nominal 5 percent. This is why statistical validity requires committing to a sample size and duration before you look at results - not after you have already seen something interesting. In practice, most marketers check their tests daily and declare winners far too early.
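To make that inflation concrete, here is a minimal simulation sketch (all the numbers are illustrative assumptions, not figures from real tests). It runs an A/A comparison - both variants share the same true conversion rate, so any "significant" result is a false positive - and contrasts a single look at the end with a peek after every day's data.

```python
# Minimal sketch of the peeking problem. Both variants have the same true
# conversion rate, so every "winner" declared is a false positive.
import random
import math

def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def run_test(days, visitors_per_day, true_rate, peek_daily):
    """Simulate one A/A test; return True if it (falsely) reports a winner."""
    conv_a = conv_b = n_a = n_b = 0
    for _ in range(days):
        n_a += visitors_per_day
        n_b += visitors_per_day
        conv_a += sum(random.random() < true_rate for _ in range(visitors_per_day))
        conv_b += sum(random.random() < true_rate for _ in range(visitors_per_day))
        if peek_daily and z_test_p_value(conv_a, n_a, conv_b, n_b) < 0.05:
            return True  # stopped early on a spurious "winner"
    return z_test_p_value(conv_a, n_a, conv_b, n_b) < 0.05

random.seed(1)
trials = 1000
peeking = sum(run_test(28, 100, 0.05, True) for _ in range(trials)) / trials
single = sum(run_test(28, 100, 0.05, False) for _ in range(trials)) / trials
print(f"False positive rate, peeking daily: {peeking:.1%}")
print(f"False positive rate, single look:   {single:.1%}")
```

With daily peeking, the simulated false-positive rate lands well above the 5 percent you think you are accepting; the single pre-committed look stays close to it.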
Calculating minimum sample size
Before any test, use a sample size calculator to determine how many visitors each variant needs. Input your current conversion rate, the minimum effect size you want to detect, and your required confidence level (95 percent is standard; most calculators also ask for statistical power, where 80 percent is typical). The output tells you how many visitors you need before the result can be trusted. If you cannot reach that number in four to eight weeks, your traffic level makes this test impractical.
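The sketch below shows roughly what such a calculator does under the hood, assuming a standard two-proportion test at 95 percent confidence and 80 percent power. The 3 percent baseline rate and 15 percent relative lift are illustrative assumptions, not recommendations.

```python
# Rough sketch of a sample size calculation for a two-variant test,
# assuming 95% confidence (two-sided) and 80% power.
import math

def sample_size_per_variant(baseline_rate, relative_lift,
                            z_alpha=1.96, z_beta=0.84):
    """Approximate visitors needed in each variant to detect the lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    numerator = (z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2))
    return math.ceil(numerator / (p2 - p1) ** 2)

# Example inputs (hypothetical): 3% baseline rate, 15% relative lift sought
n = sample_size_per_variant(0.03, 0.15)
print(f"Visitors needed per variant: {n:,}")  # about 24,000 with these inputs
```

Note how quickly the requirement grows: halving the lift you want to detect roughly quadruples the visitors you need, which is why small improvements are so hard to test on modest traffic.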
The run duration rule
Even after you reach the required sample size, run the test for at least two full weeks to account for day-of-week variation. Conversion rates on a Monday are rarely the same as on a Saturday. If your test runs only 10 days, some days of the week are counted twice and others only once, which skews the average. Two complete weeks is the minimum. Four is better for catching any seasonal variation in your audience behaviour.
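Combining the required sample size with your daily traffic gives a planned duration in complete weeks, and tells you up front whether the test fits the four-to-eight-week window. The sketch below shows the arithmetic; the daily traffic figure and the 24,000-per-variant requirement are hypothetical placeholders, and weeks_needed is our own helper, not a function from any testing tool.

```python
# Sketch of the duration check: round up to complete weeks, never below two.
import math

def weeks_needed(required_per_variant, variants, daily_visitors):
    """Full weeks needed to reach the sample size, with a two-week floor."""
    total_needed = required_per_variant * variants
    days = math.ceil(total_needed / daily_visitors)
    weeks = math.ceil(days / 7)   # only count complete weekly cycles
    return max(weeks, 2)          # never run for less than two full weeks

weeks = weeks_needed(required_per_variant=24_000, variants=2, daily_visitors=1_500)
print(f"Plan to run the test for at least {weeks} full weeks")
if weeks > 8:
    print("More than eight weeks of traffic needed - this test is impractical")
```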
When traffic is too low to test
Low-traffic sites cannot run meaningful A/B tests on conversion rate: the sample sizes required are too large and the time needed to collect them is too long. In these situations, rely on best practice research, qualitative user research (session recordings, user surveys), and expert review rather than quantitative testing. Do not run tests just to feel like you are doing conversion optimisation - act on your highest-confidence opportunities, and save testing for the pages where you have enough traffic to reach statistical validity.
Found this useful?
Start a conversation - no pitch, no pressure.