At Mal Warwick Donordigital, we love testing! However, we often see tests that are set up incorrectly or analyzed the wrong way. That can lead to drawing the wrong conclusion from a test, resulting in strategy changes that hurt a program’s performance.

So how can you avoid making mistakes in your testing plans? We’ve put together a quick checklist of the most common errors we’ve seen in setting up and executing tests for direct response fundraising campaigns.

1. Don’t test more than one thing in an A/B test

This is by far the most common mistake we see when evaluating an organization’s testing history. We’ve seen tests that were supposed to determine what color to use on a donation form button. But guess what? The copy on the forms was also changed and a different image was used. If you want to test a specific element, you must change only that one element between the test and the control; otherwise, your results are unreliable. While there are ways to perform multivariate testing, you must take much more care in setting up the test and employ more advanced statistical techniques to evaluate the results.
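
To illustrate what that analysis can look like, here’s a minimal sketch (in Python, using the statsmodels library) of how a properly designed 2x2 multivariate test might be evaluated. The file name and column names (button_color, image, donated) are hypothetical, purely for illustration:

```python
# A sketch of analyzing a 2x2 multivariate test with logistic regression.
# Assumes one row per recipient, string columns for the two tested
# elements, and a 0/1 "donated" outcome column.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("test_results.csv")  # hypothetical results export

# The button_color:image interaction term captures whether the two
# elements influence each other, which two separate A/B readouts miss.
model = smf.logit("donated ~ button_color * image", data=df).fit()
print(model.summary())
```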

2. Don’t force your control and test volumes to be the same size

For some reason, our profession has embraced the unshakeable myth that test and control panels must be the same size. This is unnecessary from a statistical perspective, and it makes statistical significance harder to achieve by limiting the number of individuals who receive the control. The only requirement is that the test group is randomly split from the control. Repeat after us: my test and control groups do not need to be the same quantity.
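
If you’re building the split yourself, a simple random draw of any size does the job. Here’s a minimal sketch in Python, assuming a hypothetical audience.csv with one row per recipient:

```python
# A sketch of a random, unequal test/control split.
import pandas as pd

audience = pd.read_csv("audience.csv")  # hypothetical full file

# Randomly assign 20% to the test panel; the other 80% remains the
# control. Randomness is what matters here, not equal panel sizes.
test = audience.sample(frac=0.20, random_state=42)
control = audience.drop(test.index)

print(f"test: {len(test):,} rows / control: {len(control):,} rows")
```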

3. Don’t guesstimate your sample sizes

This error can be dangerous in two ways. If you put too much volume into a test panel, you could hurt the overall results of your campaign if the test fails. But if you don’t have enough volume, you could end up with unreadable results. Thankfully, there is a statistical method you can use to determine your test volumes based on the minimum lift you would like to be able to detect. And we’ve made it easy for you: you can determine the necessary test volume with a few clicks using the sample size estimator on our website.
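
If you’d like to see the math behind that kind of estimate, here’s a minimal sketch of a sample-size calculation using statsmodels’ power analysis. The response rates (a 1.0% control rate lifted to 1.2%) are assumptions purely for illustration:

```python
# A sketch of a sample-size (power) calculation for a response-rate test.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline, lifted = 0.010, 0.012  # assumed control and hoped-for rates
effect = proportion_effectsize(lifted, baseline)

# ratio=4 means the control panel is four times the size of the test
# panel; as noted above, unequal panels are perfectly fine.
n_test = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.80, ratio=4,
    alternative="two-sided",
)
print(f"test panel needs roughly {n_test:,.0f} recipients")
```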

4. Don’t assume that a test performed in one context will perform the same way in a different context

As fundraisers, we tend to build up a body of knowledge about what works and what doesn’t. And it’s easy to take the learnings from a test last summer and apply them to the upcoming year-end campaign, or to take a learning from a test in one channel and apply it to another. The understanding of donor behavior we gain from rigorous testing is valuable, but it should be used only to form hypotheses about what to test next, not to harden into beliefs about what works or doesn’t work across different audiences, seasons, channels, or techniques. Yes, that means you’ll need to retest your donation form to optimize for email traffic versus web traffic.

5. Don’t eyeball your test results

In any test you perform, there is an element of chance. Statistical tests of significance help us evaluate whether the difference we are seeing between the test and the control is due to chance variation or to the changes we made in the test. If you eyeball your test returns and don’t perform statistical significance tests on the metrics, you may be acting on a result that isn’t reliable. We believe that a test result must reach 95% confidence to be considered reliable. Luckily, we have a handy tool you can use to calculate the statistical significance of your test results.
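
For those who want to check the arithmetic themselves, here’s a minimal sketch of a two-proportion z-test in Python using statsmodels, with made-up counts purely for illustration:

```python
# A sketch of a significance check on test vs. control response rates.
from statsmodels.stats.proportion import proportions_ztest

gifts = [120, 410]          # donations: test panel, control panel
mailed = [10_000, 40_000]   # recipients: test panel, control panel

z_stat, p_value = proportions_ztest(gifts, mailed)

# p < 0.05 corresponds to the 95% confidence threshold described above.
if p_value < 0.05:
    print(f"significant (p = {p_value:.4f})")
else:
    print(f"not significant (p = {p_value:.4f}); treat it as chance")
```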

Avoiding these testing pitfalls will strengthen your testing strategies, sharpen your ability to interpret results, and benefit your fundraising program, ultimately enhancing your impact and moving your mission forward.

Peter Schoewe is a Senior Vice President & Director of Analytics at Mal Warwick Donordigital, providing direct response fundraising, advocacy and marketing services for nonprofits nationwide.