How to switch from two to two-dozen elements?

The simple two-variable test (see how to test many variables at once) reduces sample size by 50%, quantifies interactions between variables, and gives more accurate and robust information—all from just a single test cell and a structured test design.

However, a “full-factorial” test design of three variables requires twice the total number of recipes (eight instead of four). Testing four variables requires 16 and five variables require 32 recipes. For two to four variables these designs can be valuable, but with five or more, another option is needed.

Adding more test elements within the same number of recipes

More advanced scientific test designs retain the power of full-factorial designs but with greater efficiency and fewer recipes. The 16 recipes required to test all combinations of 4 variables can instead be used to test up to 15 variables at once.

To learn the scientific steps and statistics behind large test designs, keep reading. Otherwise, refer to the case studies and articles to see the real-world application and results of multivariable testing.

Let’s start with a 3-element full-factorial design with three variables each at two levels:

A full-factorial test of all combinations gives the following matrix:

This matrix shows the eight recipes you need to create. Yet statistically, you can also add the four possible interactions—three 2-way and one 3-way interaction—that can be analyzed independently of these three main effects. The matrix below shows all of the effects:

The A, B, and C columns (in blue) are what your creative team uses to create the eight different e-mails (or catalogs, ads, store layouts). The AB, AC, BC, and ABC columns (in orange) are used for analyses. With 8 recipes, all 7 effects (the 7 columns) can be analyzed separately, so you can compare every possible combination of elements and interactions.
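As a rough sketch, the eight-recipe matrix with all seven effect columns can be generated in a few lines of Python. The -1/+1 coding (control/test), the run order (A changing fastest, which matches the recipe #1 and #2 settings described later in this section), and the function name are illustrative assumptions rather than part of the original test design.

```python
from itertools import product

COLUMNS = ["A", "B", "C", "AB", "AC", "BC", "ABC"]


def full_factorial_2cubed():
    """Return the 8 runs of a 2^3 design with all seven effect columns.

    Levels are coded -1 (control) and +1 (test version). Runs are in
    standard order with A changing fastest, then B, then C.
    """
    runs = []
    # product() varies its last position fastest, so unpack as (c, b, a)
    for c, b, a in product((-1, 1), repeat=3):
        runs.append({
            "A": a, "B": b, "C": c,            # used to build recipes
            "AB": a * b, "AC": a * c, "BC": b * c,
            "ABC": a * b * c,                  # used only for analysis
        })
    return runs


if __name__ == "__main__":
    print("Recipe " + " ".join(f"{col:>4}" for col in COLUMNS))
    for i, run in enumerate(full_factorial_2cubed(), start=1):
        signs = " ".join(f"{'+' if run[col] > 0 else '-':>4}" for col in COLUMNS)
        print(f"{i:>6} {signs}")
```

Only the A, B, and C columns go to the creative team; the interaction columns are simply products of those signs and are used at analysis time.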

All effects are still calculated as the average of the “+” levels minus the average of the “–” levels in each column. Among the four recipes with a “+” in any given column, every other column has two pluses and two minuses, so every other effect averages out (just as in the 2-element test).
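The averaging argument can be checked directly. This sketch assumes the same -1/+1 coding and run order as a standard 2³ layout, computes each effect as the average of the “+” recipes minus the average of the “–” recipes, and confirms that every column is balanced against every other. The response numbers are hypothetical, made up only to exercise the code.

```python
from itertools import product

COLUMNS = ["A", "B", "C", "AB", "AC", "BC", "ABC"]


def design_2cubed():
    """8-run 2^3 design (A changes fastest), coded -1/+1, with interactions."""
    runs = []
    for c, b, a in product((-1, 1), repeat=3):
        runs.append({"A": a, "B": b, "C": c, "AB": a * b, "AC": a * c,
                     "BC": b * c, "ABC": a * b * c})
    return runs


def effect(runs, response, column):
    """Average response where the column is '+' minus where it is '-'."""
    plus = [y for run, y in zip(runs, response) if run[column] > 0]
    minus = [y for run, y in zip(runs, response) if run[column] < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)


def balanced(runs, column, other):
    """True if the four '+' runs of `column` hold two '+' and two '-' of `other`."""
    return sorted(run[other] for run in runs if run[column] > 0) == [-1, -1, 1, 1]


if __name__ == "__main__":
    runs = design_2cubed()
    # Every column is balanced against every other column
    assert all(balanced(runs, x, z) for x in COLUMNS for z in COLUMNS if x != z)
    # Hypothetical response rates (%) for recipes 1-8, made up for illustration
    response = [2.0, 2.6, 2.1, 2.9, 2.3, 2.8, 2.2, 3.1]
    for col in COLUMNS:
        print(f"{col:>3}: {effect(runs, response, col):+.3f}")
```

If the response depends only on A (for example, `10 + 2 * run["A"]` for each recipe), `effect` returns 4 for A and 0 for every other column, which is the averaging-out argument in miniature.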

However, three important statistical principles offer a way to test more variables:

  1. Effect Dissipation - Main effects tend to be larger than two-factor interactions, and higher-order interactions are very unlikely.
  2. Effect Sparsity - Few, if any, interactions are usually significant.
  3. Effect Heredity - Interactions tend to result from large main effects.

Seldom are all possible effects significant. A few variables may each have a big impact on their own. Changing two variables together may slightly alter these main effects. But changing all three together in just the right combination will probably have little incremental impact versus the sum of the main effects and 2-way interactions.

Both the process of defining test elements and the nature of test elements support these principles. First of all, we work diligently to define independent test elements, so they can be changed on their own without affecting other elements. Therefore, well-defined test elements minimize the possibility of interactions. Secondly, large interactions are like the planets aligning—seldom does everything come together in just the right way to create a completely different outcome.

For the example above, this means that the 3-way interaction (ABC) is probably very close to zero and at most one or two 2-way interactions may be significant, but even those are probably smaller than the main effects.

Now here’s a big step toward more advanced designs, one that took statisticians about a decade to take:

If an interaction is unlikely, then that column can often be better used to test another main effect.

Since each column’s +/- pattern is balanced against every other column (every pair of columns contains each +/- combination exactly twice), anything placed in a column can be analyzed independently of all the other columns. So you can add another test element that follows the same +/- scheme.

For example, even without knowing anything about this e-mail marketing program, we can assume the ABC 3-way interaction will be non-existent or very close to zero. Therefore, we can add a 4th element into that column:

We can test a new element following the same +/- scheme as the ABC interaction. So along with price, offer, and copy, we can test a new subject line (in all recipes where ABC is “+”) versus the control (where ABC is “–”).

This simple change cuts the number of test recipes in half

Instead of using 16 recipes to test four variables (2⁴), we can now use only eight. However, there is some risk. The 3-way interaction doesn’t just disappear. It’s now “confounded” with subject line (a fancy statistical term for “mixed together”).

The calculated effect from that last column is now the sum of both the main effect of “subject line” and the ABC interaction. Yet in reality, it’s no big deal, since the interaction is likely very close to zero. Any small change due to “confounding error” is usually much less than the normal experimental error from natural market variation. In other words, any error in the main effect of “subject line” is likely small and certainly worth a 50% reduction in the number of test recipes.
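A small noise-free simulation illustrates this confounding. The sketch assumes a hypothetical “true” subject-line effect of 0.6 and an ABC interaction of 0.1; both numbers are invented for illustration. Because G is assigned the ABC sign pattern, the estimated G effect comes out as their sum, 0.7, while G stays balanced against A, B, and C.

```python
from itertools import product


def half_fraction():
    """8-run 2^(4-1) design: full factorial in A, B, C with G set equal to ABC."""
    runs = []
    for c, b, a in product((-1, 1), repeat=3):
        runs.append({"A": a, "B": b, "C": c, "G": a * b * c})
    return runs


def effect(levels, response):
    """Average response at +1 minus average response at -1."""
    plus = [y for x, y in zip(levels, response) if x > 0]
    minus = [y for x, y in zip(levels, response) if x < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)


if __name__ == "__main__":
    runs = half_fraction()
    # Hypothetical noise-free model: subject-line effect 0.6, ABC interaction 0.1.
    # Each term enters as (effect / 2) * level, so "+" minus "-" equals the effect.
    true_g, true_abc = 0.6, 0.1
    response = [5.0 + true_g / 2 * run["G"]
                + true_abc / 2 * run["A"] * run["B"] * run["C"]
                for run in runs]
    g_levels = [run["G"] for run in runs]
    # G and ABC share one column, so the estimate is their sum
    print(f"estimated G effect: {effect(g_levels, response):.2f}")
    # G is still balanced against each of the original elements
    print("G balanced against A, B, C:",
          all(sum(run["G"] * run[k] for run in runs) == 0 for k in "ABC"))
```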

This type of test matrix is called a “fractional-factorial” design, since it requires just a fraction of the recipes required for a full-factorial test.

Taking this concept a step further, if interactions tend to be small or non-existent, then why not use each interaction column for another test element? Let’s do it…

If you place three additional test elements in the three columns with 2-way interactions, then you can create a matrix with 7 different variables within the same 8 recipes as the 3-element full-factorial test design:

Testing all combinations of 7 elements would require 128 recipes (2⁷) versus the eight recipes above. That is a significant saving, especially if you’re testing a catalog, print ad, TV spot, or direct mail package, where each recipe adds to your marketing costs. In addition, as with all scientific tests, sample size is unrelated to the number of test elements. So sample size for this 7-element test can be just as small as you would need to test 3 elements (or one new version against your control).

Confounding spreads a number of interactions throughout the matrix, but if the three statistical principles hold—few interactions exist and the interaction effects are related to and smaller than the significant main effects—then this test design works well.

Creating each test recipe—each version of the e-mail—now requires attention to all seven columns. For example, in recipe #1, the price, offer, copy, and subject line are kept the same as in the control (A, B, C, and G are “–”), but the starburst, more links, and more products are added (D, E, and F are set at the “+” level).

Recipe #2 requires A and G to be changed to the “+” level and D and E to be changed back to the control, and so on. When test elements are clearly defined, all recipes are basically cut-and-paste combinations of all the elements. Every element is present in every recipe, either at the plus or minus level.
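Recipe sheets like these can be produced mechanically from the generators D = AB, E = AC, F = BC, and G = ABC. In this sketch the element names come from the e-mail example, while the data layout and function names are hypothetical. Recipe #1 comes out with starburst, more links, and more products at the test level and everything else at control, and recipe #2 with price, more products, and subject line at the test level, matching the description above.

```python
from itertools import product

# Element names from the e-mail example (letters match the design columns)
ELEMENTS = {
    "A": "price", "B": "offer", "C": "copy", "D": "starburst",
    "E": "more links", "F": "more products", "G": "subject line",
}


def seven_factor_design():
    """8-run 2^(7-4) design, A changing fastest, with D=AB, E=AC, F=BC, G=ABC."""
    runs = []
    for c, b, a in product((-1, 1), repeat=3):
        runs.append({"A": a, "B": b, "C": c,
                     "D": a * b, "E": a * c, "F": b * c, "G": a * b * c})
    return runs


def recipe_sheet(run):
    """Split one run into elements at the test level (+) and at control (-)."""
    test = [ELEMENTS[k] for k in sorted(run) if run[k] > 0]
    control = [ELEMENTS[k] for k in sorted(run) if run[k] < 0]
    return test, control


if __name__ == "__main__":
    for i, run in enumerate(seven_factor_design(), start=1):
        test, control = recipe_sheet(run)
        print(f"Recipe #{i}: test = {', '.join(test) or 'none'}; "
              f"control = {', '.join(control) or 'none'}")
```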

This large test has enormous benefits over split-run tests:

  • The impact of each element can be accurately quantified independently of all other effects. Even though many variables are changed simultaneously, all main effects are independent.
  • Sample size is reduced by 80% versus seven different split-run tests.
  • Key interactions can still be analyzed.
  • The optimal result is often even greater than the best test recipe, since only a small portion of all possible combinations (8 of 128) is tested. You can pinpoint the elements to change, those to keep at the control level, and those that make no difference.
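The last point can be sketched with a main-effects-only model, which is an assumption here because it ignores interactions. Each element is set to “+” if its estimated effect is positive and kept at control otherwise, and the predicted lift over the all-control recipe is the sum of the positive effects. The effect estimates below are hypothetical. With these numbers the predicted best combination is not one of the eight recipes actually tested, which is how the optimum can beat the best tested recipe; treat it as a prediction rather than a measured result.

```python
from itertools import product


def predicted_best(effects):
    """Choose each element's level from the sign of its main effect.

    Assumes a main-effects-only model and that a higher response is better.
    Zero or negative effects keep the control level. Returns the levels and
    the predicted lift over the all-control recipe.
    """
    levels = {k: (1 if e > 0 else -1) for k, e in effects.items()}
    # Moving an element from "-" to "+" adds its full effect
    lift = sum(e for e in effects.values() if e > 0)
    return levels, lift


def tested_runs():
    """Level settings of the 8 recipes in the 2^(7-4) design (D=AB, ...)."""
    runs = []
    for c, b, a in product((-1, 1), repeat=3):
        runs.append({"A": a, "B": b, "C": c,
                     "D": a * b, "E": a * c, "F": b * c, "G": a * b * c})
    return runs


if __name__ == "__main__":
    # Hypothetical main-effect estimates (response-rate points), made up
    effects = {"A": 0.30, "B": -0.10, "C": 0.25, "D": 0.02,
               "E": -0.15, "F": 0.20, "G": 0.05}
    best, lift = predicted_best(effects)
    print("predicted best levels:", best)
    print(f"predicted lift vs. control: {lift:.2f} points")
    print("was this exact recipe tested?", best in tested_runs())
```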

This “fractional-factorial” scientific test design is just one of many options for testing more variables, more rapidly, at lower cost, and with more accurate and profitable results. The next section, variety of test designs and techniques, discusses a few important types of test designs and the best applications of each.

But remember—the statistics are just a tool. What you test, how you test it, and what you do with the information you learn is what determines your long-term success. You need a focused marketing objective, a good bunch of ideas, and some guidance in finding the best strategy to maximize your results.

Just as skill with a hammer does not mean you can build a house, basic knowledge of statistical testing does not translate into profitable testing. Thousands of details come together to determine the ultimate outcome. That’s where the LucidView strategy comes in. With the right statistical methods, a streamlined approach, and specialized skill integrating scientific testing within ongoing marketing programs, LucidView consultants can help you hit the ground running.

Now that you understand the basics, glance over the variety of test designs and techniques, learn more about managing a scientific test, or look over some real-world examples in the case studies & articles. 

For the statisticians:

In practice, use caution when applying this approach with fractional-factorial designs. As we saw with the 2-element test, interactions can be significant and valuable. Therefore, most tests are designed between the extremes: testing many elements with few recipes, but saving room to analyze the most likely interactions.

Also, please note that, as is often the case, what looks so easy on the surface can have a lot of underlying complexity. Confounding creates a snowball effect, where interactions start popping up all over the place. For those who love matrix algebra…

When adding a new element, you also confound other effects with each other. In the above example, confounding “G: Subject line” with ABC (equivalently, I = ABCG) gives:

 

  • All main effects confounded with 3-way interactions
        G = ABC
        A = BCG
        B = ACG
        C = ABG
     
  • 2-way interactions confounded with other 2-way interactions
        AB = CG
        AC = BG
        BC = AG
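The alias list follows from a simple rule: multiplying both sides of G = ABC by G gives the defining relation I = ABCG, and the alias of any effect is its product with ABCG, where any letter that appears twice cancels. A minimal sketch of that word arithmetic, with hypothetical function names:

```python
def multiply(*words):
    """Multiply effect words: a letter appearing twice cancels (A*A = I).

    Words are strings of factor letters; "I" itself should not be passed in.
    """
    letters = set()
    for w in words:
        letters ^= set(w)  # symmetric difference drops repeated letters
    return "".join(sorted(letters)) or "I"


def alias(effect, defining_word):
    """Alias of an effect under a one-word defining relation I = word."""
    return multiply(effect, defining_word)


if __name__ == "__main__":
    word = multiply("G", "ABC")  # G = ABC implies I = ABCG
    for e in ["G", "A", "B", "C", "AB", "AC", "BC"]:
        print(f"{e} = {alias(e, word)}")
```

Running it reproduces the main-effect aliases (G = ABC, A = BCG, B = ACG, C = ABG) and the 2-way pairs (AB = CG, AC = BG, BC = AG).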

     

With three more elements added, for a total of seven, the confounding scheme becomes:
D = AB, E = AC, F = BC, G = ABC

The overall defining relation then becomes:
I = ABD = ACE = BCDE = BCF = ACDF = ABEF = DEF = ABCG = CDG = BEG = ADEG = AFG = BDFG = CEFG = ABCDEFG
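The same word arithmetic reproduces this seven-element defining relation: multiply the four generator words ABD, ACE, BCF, and ABCG together in every possible combination. This sketch, with hypothetical function names, generates all fifteen words and reports the length of the shortest one.

```python
from itertools import combinations


def multiply(*words):
    """Multiply effect words; any letter appearing twice cancels."""
    letters = set()
    for w in words:
        letters ^= set(w)
    return "".join(sorted(letters))


def defining_relation(generators):
    """All products of one or more generator words."""
    words = set()
    for k in range(1, len(generators) + 1):
        for combo in combinations(generators, k):
            words.add(multiply(*combo))
    return words


if __name__ == "__main__":
    # D = AB, E = AC, F = BC, G = ABC  =>  I = ABD = ACE = BCF = ABCG
    words = defining_relation(["ABD", "ACE", "BCF", "ABCG"])
    ordered = sorted(words, key=lambda w: (len(w), w))
    print("I = " + " = ".join(ordered))
    print("number of words:", len(words))
    print("shortest word length:", min(len(w) for w in words))
```

The shortest words (ABD, ACE, BCF, DEF, and others) have three letters, which is why every main effect in this design is confounded with some 2-way interactions; in standard terminology this is a resolution III design.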

We show this not to push you away from scientific testing forever, but to show how much goes on in the background of every test. With guidance, you can ignore all of this, but someone needs to understand it. Confounding offers immense opportunity to test many elements in few recipes, and a few techniques can help minimize any confounding error.

Ultimately, large test designs simply codify the complexity inherent in the marketplace, removing the blinders of opinion and unproven beliefs.

The immediate illumination of your marketing programs can be disconcerting at first. It’s so much easier to keep the blinders on. But the reality exposed in scientific testing simply mirrors the complex reality of your marketplace. The answers may not always be what you wanted to see, but they more accurately reflect the truth.

Every issue to consider in scientific testing is also an issue that should be considered with simple split-run tests. But with split-run testing, it’s so easy to overlook mistakes and the complexity of testing. One data point cannot show if interactions exist, or if you’re truly measuring what you wanted to test, or if the results have any relationship with future performance. Scientific testing forces you to ask the right questions and create an effective test… or else the results will show you the error of your ways.

If testing is important, then scientific testing should be an integral part of your marketing and advertising programs.