A/B testing sounds simple.
Well, yes – in theory. In practice it’s never that straightforward. So here’s a plan for running your tests the right way – and making every test count.
Start by defining your objectives.
What do you want to get out of running the test? More website enquiries? More video plays? Increased sales?
Get your stakeholders to agree on what you want to achieve, then print it out and stick it on your wall. This is your reference point – the thing that keeps your testing ultra-focused. Who knows, after this discussion you might even find A/B testing isn’t what you need.
Before you run your test, you need to know you can trust the results.
And that means checking your data. If you’re measuring anything on your website, you need to minimise the impact of bots. These can really skew your figures, so make sure you exclude them. A typical bot shows up as a new visitor, spends less than a second on your site, and arrives via a non-standard route.
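If you want to apply that heuristic yourself, here’s a minimal sketch in Python/pandas. The column names (visitor_type, session_seconds, referrer) and the CSV export are hypothetical – adapt them to whatever your analytics tool actually gives you:

```python
import pandas as pd

# Hypothetical raw export from your analytics tool
visits = pd.read_csv("visits.csv")

# The bot heuristic described above: new visitor, sub-second
# session, arriving via a non-standard route
likely_bot = (
    (visits["visitor_type"] == "new")
    & (visits["session_seconds"] < 1)
    & (visits["referrer"].isin(["", "unknown"]))
)

clean_visits = visits[~likely_bot]
print(f"Excluded {likely_bot.sum()} suspected bot visits out of {len(visits)}")
```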
Here’s the thing: sometimes you won’t have enough data to get any conclusive results from testing.
Either you won’t see a major difference between versions (very common), or you won’t have a big enough sample size. To get an idea of the numbers you need, plug your current stats (your traffic and current conversion rate) into a calculator like http://www.evanmiller.org/ab-testing/sample-size.html.
Let’s say you want to test a new landing page.
Version A converts at 10%. Say you expect Version B to deliver a relative improvement of 5% (taking you from 10% to 10.5%). Aim for a minimum 90% statistical significance – the likelihood that any change is because of the test (90% likely), rather than down to random chance (10% likely). On those numbers you need a sample size of 76,312 per variation.
If that sounds like a lot, don’t panic – you have two options. One: create a Version B that converts far higher than Version A. Enter 50% into the “Minimum Detectable Effect” field and you’ll see you need a sample size of 817 per variation, so 1,634 altogether. Of course, a jump that big isn’t always possible. Two: do some qualitative testing (see #4 below).
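If you’d rather sanity-check these numbers in code, here’s a minimal sketch of the standard two-proportion sample-size formula. Evan Miller’s calculator uses a slightly different derivation (and the figures above assume 90% significance), so treat the output as ballpark rather than an exact match:

```python
import math
from statistics import NormalDist

def sample_size_per_variation(baseline, relative_mde, alpha=0.05, power=0.8):
    """Visitors needed per variation to detect a relative lift over baseline."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)            # e.g. 10% -> 10.5% for a 5% relative lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2) # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)

print(sample_size_per_variation(0.10, 0.05))  # small lift: tens of thousands per variation
print(sample_size_per_variation(0.10, 0.50))  # big lift: hundreds per variation
```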
If you don’t have enough visitors, then carry out some user testing.
Recruit a sample group from your target audience and get them to interact with your website. You can find testers on a platform like https://www.usertesting.com – but the problem is these people are doing it as a job. They know they’re being asked to “test” your site, so they’re not in the mindset of your average customer. That’s why, if possible, you should ask some of your existing or potential clients instead. Consider reaching out via LinkedIn – it’s a great way to show you’re interested in improving customer experience, as well as promoting your brand.
Alongside this, install some software (Hotjar is a great place to start) that records how visitors interact with your site. This gives you real insight from genuine users. You’ll see where people click (so you can tell whether your CTA buttons are prominent enough), whether they’re struggling with forms (and whether to cut the number of fields), and how far down a page they scroll (so you can pinpoint where to put the most important information). These insights show you how a page performs for individual people, and where you can make improvements.
Your customers will run on different business cycles, so your test needs to reflect their behavioural and purchasing patterns.
For example, if your customers usually purchase at the end of every quarter, let your tests run over at least one quarter.
If you’re an e-commerce website selling FMCG, the pace of purchasing means you might start getting insights by the end of one week (which should always be a full seven days, not Mon–Fri, 9–5). Even so, given that many people are paid monthly, it’s best to test over a 28-day cycle.
You also need to factor in seasonal trends, e.g. Christmas or the summer holidays. If you’re running an A/B test, aim for 250–350 conversions per variant. Don’t be tempted to stop the test before this, even if you see a large difference between the two versions early on.
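To turn a sample size into a rough run time, divide by your daily traffic and round up to whole business cycles. A minimal sketch – the traffic figures here are hypothetical, the sample sizes are the ones from the calculator example above:

```python
import math

def test_duration_days(per_variation, daily_visitors, variations=2, cycle_days=7):
    """Days needed to collect the sample, rounded up to whole business cycles."""
    raw_days = per_variation * variations / daily_visitors
    return math.ceil(raw_days / cycle_days) * cycle_days

print(test_duration_days(817, daily_visitors=200))                    # full weeks
print(test_duration_days(76312, daily_visitors=5000, cycle_days=28))  # 28-day cycles
```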
At this stage you’re confident in your historical data.
So it’s time to make predictions and plan your A/B test. To get the most out of your testing, and to keep everyone 100% focused, write your test up as a simple hypothesis template. If your test doesn’t fit the template, go back and redevelop your hypothesis until it does.
If you’re doing A/B testing, it’s time to head back to http://www.evanmiller.org/ab-testing/sample-size.html and enter your results.
If you get statistical significance, then congratulations! If not, then use the calculator to see how much more data, or what level of conversion, you need. You can either re-run the test and combine the results, or go for the approach outlined in #4 above.
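If you’d rather check significance yourself instead of re-typing numbers into the calculator, a standard two-proportion z-test does the job. A minimal sketch, using hypothetical visitor and conversion counts:

```python
from statistics import NormalDist

def ab_significance(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test on observed A/B results."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p = ab_significance(300, 3000, 345, 3000)  # hypothetical counts
print(f"z = {z:.2f}, p = {p:.3f}")  # p < 0.10 would clear a 90% significance bar
```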
Don’t be disheartened if you don’t get a clear result – one of the great things about A/B testing is that you always learn something. Even if the takeaway is that your change hasn’t made much of a difference, you get a better idea of what to test next time. Remember Thomas Edison’s line about finding ten thousand ways that won’t work.
If you’re doing user testing, record all the comments from your testers – a spreadsheet makes this easier. Then go through them looking for common themes (e.g. concerns over price, usability, support), group them together, and identify the biggest issues. You can then make changes based on the feedback.
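A simple way to start the theme-grouping is keyword tagging – crude, but a useful first pass before reading everything in detail. A sketch with hypothetical keywords and comments:

```python
from collections import Counter

# Hypothetical keyword-to-theme mapping; extend it as patterns emerge
themes = {
    "price": "pricing", "cost": "pricing", "expensive": "pricing",
    "confusing": "usability", "couldn't find": "usability",
    "help": "support", "contact": "support",
}

comments = ["The pricing page was confusing", "Couldn't find the contact form"]

# Tally every theme whose keyword appears in a comment
counts = Counter(theme for c in comments
                 for kw, theme in themes.items() if kw in c.lower())
print(counts.most_common())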
Imagine you’ve got an established website, with branding and layout that your existing customers know. If you run an A/B test and start showing 50% of visitors a new homepage, you can expect a lot of exploration as they get to grips with the new version. Over time that will settle down. But you’ll need to factor in lots of extra clicking at the start.
Keep an eye on industry trends during your test. If you’re an e-commerce website participating in Black Friday, it’s likely you’ll see unnatural levels of traffic and purchasing behaviour. And if during your test your competitor goes bust, then you’ll need to monitor the impact on your own website.
Bear in mind a prospect who arrives via a PPC ad is likely to have different motivations (more likely to be ready to buy) than one who arrives via an organic Google search (more likely to browse). So try to segment based on how your traffic is arriving, to understand how each type of user responds to your test.
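With a per-visitor log that records the arrival channel, the segmentation is a one-line groupby. A sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical per-visitor test log with the arrival channel recorded
log = pd.DataFrame({
    "channel":   ["ppc", "ppc", "organic", "organic", "ppc", "organic"],
    "variant":   ["A",   "B",   "A",       "B",       "B",   "A"],
    "converted": [1,     1,     0,         0,         0,     1],
})

# Conversion rate per channel per variant -- PPC and organic visitors
# may respond very differently to the same change
print(log.groupby(["channel", "variant"])["converted"].agg(["count", "mean"]))
```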
Testing is a continuous process. In theory, you’d keep testing until you hit a 100% conversion rate. But testing is about practice, not theory – so be realistic.
If you’re testing an area with a big revenue opportunity – say £1 million – then a 2% increase in conversions is worth £20,000 a year. So map your funnels, see where you’re leaking visitors, and work out what those leaks are worth. Then decide on your priorities.
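The arithmetic is simple enough to keep in a scratch script when you’re prioritising. Using the figures above:

```python
# Back-of-envelope value of a conversion lift, as in the £1m example
revenue_at_stake = 1_000_000  # annual revenue flowing through the funnel (£)
lift = 0.02                   # a 2% increase in conversions

print(f"Worth £{revenue_at_stake * lift:,.0f} a year")  # £20,000
```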
Plus, over time your testing will reach a plateau. Eventually the effort it takes to deliver a 1% lift will cost more than the extra revenue it brings in. This is the “Local Maximum” stage: you’ve done as much as you can with your current version. You could run another hundred tests, but you won’t improve much.
If you’re at this stage, take a break – you’ve earned it!