
How to run A/B tests properly - most people get this wrong

Adil Jain | CRO | 2026-04-23

A/B testing is not just pressing a button and waiting for a winner. If your tests are not designed correctly, run long enough, and analysed properly, the results you act on may be statistically meaningless.


I have worked with clients who have run dozens of A/B tests and made decisions based on every one. When I looked at the test setups, many of those decisions turned out to rest on data that had never reached statistical significance. The changes they implemented based on those tests may have improved performance. They may have made it worse. There was no valid way to know.

One variable at a time

An A/B test isolates one variable. If you change the headline AND the button colour AND the form position at the same time, you do not know which change drove the result. This sounds obvious. In practice, the temptation to "make the page better" while you are in there is strong. Resist it. Change one thing. Measure the result. Then change the next thing.

Statistical significance is not optional

A test result is only meaningful when it reaches statistical significance - typically 95% confidence. Roughly speaking, that means that if there were no real difference between the variants, random variation alone would produce a gap this large less than 5% of the time. To know when you will reach significance, you need to calculate your required sample size before you start. A test requires enough visitors AND enough conversions to produce meaningful data. Most tests on low-traffic websites are stopped before they have valid results.
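
To make this concrete, here is a minimal sketch in Python of the kind of check a testing tool performs behind the scenes - a two-proportion z-test on hypothetical visitor and conversion counts (all the numbers are made up for illustration):

```python
from math import sqrt, erf

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that the variants perform the same
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical test: 4,000 visitors per variant, 120 vs 148 conversions
z, p = two_proportion_z_test(120, 4000, 148, 4000)
print(f"z = {z:.2f}, p = {p:.3f}")  # p is about 0.08 here - not significant at 95%
```

Even though variant B converts noticeably better in that example, the p-value sits above 0.05, so declaring it the winner would be exactly the kind of premature call this article is warning against.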

Use a sample size calculator (there are several free ones online) before you run any test. Enter your current conversion rate, the minimum detectable effect you care about (how large a difference you want to be able to detect), and your traffic volume. The calculator tells you how many visitors your test needs. If you cannot get there in 4 to 8 weeks, the test is not viable for your traffic level.
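
If you want to sanity-check what a calculator gives you, the underlying arithmetic is not complicated. Here is a rough sketch using the standard two-proportion formula at 95% confidence and 80% power; a given calculator may assume a different power level, so treat the output as an approximation:

```python
from math import ceil

Z_ALPHA = 1.96  # 95% confidence, two-sided
Z_BETA = 0.84   # 80% power

def sample_size_per_variant(baseline_rate, relative_lift):
    """Approximate visitors needed per variant to detect a relative lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((Z_ALPHA + Z_BETA) ** 2 * variance / (p2 - p1) ** 2)

# Hypothetical: 3% baseline conversion rate, detecting a 20% relative lift
n = sample_size_per_variant(0.03, 0.20)
print(f"{n:,} visitors per variant, {2 * n:,} in total")  # roughly 14,000 / 28,000
```

At a 3% baseline conversion rate you need close to 28,000 visitors in total just to detect a 20% relative lift, which is exactly why so many low-traffic tests never reach a valid result.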

Running for the right duration

Never stop a test just because it looks like a winner or a loser after a few days. Short-term results are misleading because day-of-week patterns, seasonal fluctuations, and random variation all affect early data. Run tests for a minimum of two full weeks - ideally four - to account for weekly variation. Do not peek at results daily and make decisions based on what you see. Set your duration and stick to it.
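
Working out whether a test is viable for your traffic is then simple arithmetic: divide the sample size the calculator gave you by your average daily traffic to the page, round up to whole weeks, and never go below the two-week minimum. A small sketch, with hypothetical numbers:

```python
from math import ceil

def test_duration_weeks(total_visitors_needed, avg_daily_visitors, min_weeks=2):
    """Round the required run time up to whole weeks, never below the minimum."""
    days_needed = total_visitors_needed / avg_daily_visitors
    return max(ceil(days_needed / 7), min_weeks)

# Hypothetical: ~28,000 visitors needed, 900 visitors a day to the test page
print(test_duration_weeks(28_000, 900))  # -> 5 weeks, comfortably inside 4 to 8
```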

What to test first

Test the things with the biggest potential impact first:

Headline - what you lead with matters most.
Main CTA - button text, position, and colour.
Social proof placement - where reviews and testimonials appear.
Offer framing - how you describe what you are giving the visitor.

These have larger effect sizes than minor design tweaks. Save small tests for later, when you have exhausted the bigger opportunities.

Found this useful?

Start a conversation - no pitch, no pressure.