One of the great joys of direct response fundraising, at least to strange people like me (and maybe you), is the ability to test your assumptions and to strengthen the performance of a program one step at a time. But testing can be one of the most frustrating aspects of fundraising, too, especially when you go to the trouble to set up a test, track the results, and end up with no actionable conclusions when all the returns are in.

I’ve experienced both the joy and agony of direct response testing, and I would like to share examples of both with you. Of course, it’s important to remember that no test results can be universalized—results vary by organization, donor segments, and time of year.

A classic example of this is the test between Courier and Times New Roman fonts in a letter. I've seen this test run on many occasions. On the occasions when it yields statistically significant results, half the time Courier wins and half the time Times New Roman does, but more often there's no statistically significant difference at all. In the end, the only value of this particular test may be showing that donors like something different. Or, even more likely, that the font doesn't matter at all.

One of the tests I've been most excited about recently expanded a premium offer from a one-panel buckslip to a full 8-1/2 x 11” sheet. The control package contained a standard buckslip offering prospective donors a free plush toy with their initial membership gift. Expanding the buckslip to a full sheet allowed the premium to be shown at life size, and the response rate to the package increased by 27%. At the same time, however, the more prominent premium offer lowered the average gift by $1. Both of these results were statistically significant, meaning it's highly unlikely the observed differences in response between the two packages were due to chance.
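
For readers who want to see what "statistically significant" means in practice, here is a minimal sketch of the kind of two-proportion test that sits behind a claim like the one above. The mail quantities and responder counts in it are illustrative assumptions, not the actual figures from this test.

```python
# Minimal sketch of a two-proportion z-test, the kind of check behind a claim
# that a lift in response rate is statistically significant. All quantities
# below (mail volumes, responder counts) are illustrative assumptions.
from math import sqrt

def two_proportion_z(responders_a, mailed_a, responders_b, mailed_b):
    """Return the z statistic for the difference between two response rates."""
    p_a = responders_a / mailed_a
    p_b = responders_b / mailed_b
    # Pooled response rate under the null hypothesis of no real difference
    p_pool = (responders_a + responders_b) / (mailed_a + mailed_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / mailed_a + 1 / mailed_b))
    return (p_b - p_a) / se

# Hypothetical panels of 25,000 pieces each: the control pulls 1.0%, and the
# full-sheet premium version pulls about 1.27% (a 27% relative lift).
z = two_proportion_z(responders_a=250, mailed_a=25_000,
                     responders_b=318, mailed_b=25_000)
print(f"z = {z:.2f}")  # |z| > 1.96 means significant at the 95% level
```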

So, in many ways this test did exactly what we wanted it to do. We modified the package in a way that caused a marked difference in donor behavior. We were able to inspire more donors to send a gift, but by doing that, we caused the value of the gift they offered to decline. Because this is an acquisition package, we cannot yet declare a winner. We need to monitor the long-term results to determine whether more donors giving a lower gift amount are more valuable than fewer donors giving a greater amount. But, because of this test, we now have a clearer idea of what influences response to the package, and we’ll be able to track the impact of that difference long term.
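
To make the "more donors at a lower gift versus fewer donors at a higher gift" comparison concrete, here is a back-of-the-envelope sketch. The response rates, gift amounts, and long-term value per donor are assumptions for illustration; the test itself reported only the relative changes (a 27% lift in response and a $1 drop in average gift).

```python
# Back-of-the-envelope value comparison of the two acquisition packages.
# Response rates, gift amounts, and long-term value per donor are assumed
# for illustration; only the relative changes (+27% response, -$1 average
# gift) come from the test itself.

def value_per_thousand(response_rate, avg_gift, long_term_value_per_donor):
    """Projected revenue per 1,000 pieces mailed, including future giving."""
    donors = 1000 * response_rate
    return donors * (avg_gift + long_term_value_per_donor)

control = value_per_thousand(response_rate=0.0100, avg_gift=16.0,
                             long_term_value_per_donor=50.0)
test = value_per_thousand(response_rate=0.0127, avg_gift=15.0,
                          long_term_value_per_donor=50.0)
print(f"Control: ${control:,.0f} per 1,000 mailed")
print(f"Test:    ${test:,.0f} per 1,000 mailed")
# If the lower-gift donors also renew or upgrade at lower rates, the
# long-term value assumption shrinks for the test panel and the winner
# can flip -- which is exactly why we keep tracking both groups.
```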

A test I loved, but one that failed, used faux stamps on an outer envelope. This is a package treatment I've seen used on many occasions, so it must have tested well for somebody. But my test was a complete flop. I tested placing two bright and colorful stickers on the outside of a nonprofit envelope, as close to the indicia as the post office would allow. The idea was that the stamps would catch the donor's eye and help overcome the junk mail stigma that can attach to letters mailed at the nonprofit rate. Other than the faux stamps, the test envelope was identical to the control: a plain, cream outer with no teasers or images, beyond a name, logo, and address in the upper left-hand corner.

When I reviewed the returns, however, my affection for the fake stamps began to cool. There was less than two-tenths of a percentage point difference in response rate between the versions with and without the stamps, and no statistically significant difference between the average gifts. In all likelihood, the faux stamps didn't change a single individual's mind one way or the other about sending a gift. And, of course, the bright and colorful stamps added an extra measure of expense to the package that wasn't offset by any increase in giving.
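
As a rough illustration of why that extra expense matters, here is a sketch of a break-even calculation: how much additional response the stickers would have needed to generate just to pay for themselves. The per-piece cost and average gift below are assumed figures, not numbers from this test.

```python
# Rough break-even sketch for the faux-stamp test: how much extra response
# the stickers would need to generate just to cover their own cost. The
# per-piece cost and average gift are assumed figures, not test results.

added_cost_per_piece = 0.03   # assumed cost of two stickers plus affixing
avg_gift = 15.0               # assumed average acquisition gift

# Extra gifts needed per 1,000 pieces mailed to pay for the added cost
break_even_gifts = (added_cost_per_piece * 1000) / avg_gift
print(f"Break-even: {break_even_gifts:.1f} extra gifts per 1,000 mailed "
      f"({break_even_gifts / 10:.2f} percentage points of added response)")
```

With these assumed numbers, the stickers would need to lift response by about two-tenths of a percentage point just to break even, which is more than the difference the test actually produced.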

It's okay to have a loser test every now and then, because it's important to test your assumptions about what will work and what won't. What's most critical is that you create tests with a well-thought-out rationale behind them, and that the hypothesized result of each test is aligned with your overall goals. For example, it doesn't make sense to test a reduced ask amount to increase response if your goal is to acquire higher-value donors who will upgrade quickly.

In other words, each test you perform—and you should be testing as much as you can—should have the goal of moving your program to the next level. Just remember, it won’t be quick and easy, and you may receive as many muddy results as you do clear winners.

Peter Schoewe is a Vice President at Mal Warwick Donordigital.