Statistical significance in online marketing

Dr. Torge Schmidt has a PhD in Mathematics and works as a Data Scientist at Akanoo. He is responsible for the development of new prediction models and statistical analysis.
He loves talking about statistics and significance, so feel free to reach out to him if you have questions or simply want to discuss: torge@akanoo.com

What is statistical significance and why do we need it?

If you want to increase the performance of your website by introducing a new measure (like a discount campaign), you will want to confirm that this measure is actually effective. The best way to do that is to split your traffic randomly into two groups, A and B, and compare their performance.

Let us assume that A stands for the old version and B stands for the version with the new measure enabled. After analyzing one day of traffic, we get the following data:

Version | Number of Visitors | Number of Conversions | Conversion Rate (cr)
A       | 49                 | 4                     | 8.2%
B       | 51                 | 5                     | 9.8%
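
As a quick aside, the conversion rate is simply the number of conversions divided by the number of visitors. A minimal Python sketch of this arithmetic for the table above:

```python
# Conversion rate = number of conversions / number of visitors
visitors_a, conversions_a = 49, 4
visitors_b, conversions_b = 51, 5

cr_a = conversions_a / visitors_a  # ~8.2%
cr_b = conversions_b / visitors_b  # ~9.8%

print(f"crA = {cr_a:.1%}, crB = {cr_b:.1%}, crB - crA = {cr_b - cr_a:.1%}")  # difference ~1.6%
```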

This data seems to indicate that the new version performs better than the old one, because crB - crA is greater than 0, i.e. crB > crA. But is this really enough to support that conclusion? Imagine that one more visitor arrives on version A and converts; then we get the following result:

Version | Number of Visitors | Number of Conversions | Conversion Rate (cr)
A       | 50                 | 5                     | 10%
B       | 51                 | 5                     | 9.8%

Now version A seems to be better than version B. Based on this data alone we cannot reliably conclude which version is better. Naturally, we need more data to decide! So let us assume we compare both versions for a longer time and get the following data:

Version | Number of Visitors | Number of Conversions | Conversion Rate (cr)
A       | 5023               | 421                   | 8.4%
B       | 5012               | 549                   | 10.9%

Now we can say with high probability that version B is better than version A. But why is that? Could we not simply have had bad luck in deciding which visitors got to see version A and which got version B? That is of course possible, but very unlikely. How unlikely it actually is can be shown by applying a statistical significance test (a hypothesis test).
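
As a preview, one common way to run such a test in code is a one-sided two-proportion z-test; the rest of this article builds the same intuition via simulation instead. A minimal sketch using only the Python standard library and the numbers from the table above:

```python
import math

# One-sided two-proportion z-test: H0: crB <= crA, H1: crB > crA
def one_sided_z_test(conv_a, n_a, conv_b, n_b):
    cr_a, cr_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate, assuming both versions are actually equal
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (cr_b - cr_a) / se
    # Probability of a difference at least this large if the versions are equal
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

z, p = one_sided_z_test(421, 5023, 549, 5012)
print(f"z = {z:.2f}, one-sided p-value = {p:.6f}")  # far below 0.05
```

Here the p-value comes out far below 0.05, matching the intuition developed step by step below.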

There are two possible explanations for the observations in the table above:

  • Version B performs better than version A
  • Version B does not perform better than version A; we just had bad luck in how the visitors were split

This leads to the question: How likely is it that the observations happened by chance?

Let us assume, for now, that the observations did in fact happen by chance and that both versions would therefore show the same conversion rate (8.4%) if we simply collected enough data.

We now repeat the experiment with 5000 visitors in each version, but instead of using real data, we simply toss a coin to decide whether each visitor buys or not (well, not a 50/50 coin, but a 91.6/8.4 coin). At the end we compute the conversion rate for both versions and look at the difference crB - crA.
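
A minimal sketch of one such coin-toss experiment (assuming, as above, a shared true conversion rate of 8.4%):

```python
import random

def simulated_difference(cr=0.084, visitors=5000):
    """One simulated experiment where both versions truly share the same cr."""
    # Toss the 91.6/8.4 "coin" once per visitor in each version
    conversions_a = sum(random.random() < cr for _ in range(visitors))
    conversions_b = sum(random.random() < cr for _ in range(visitors))
    return conversions_b / visitors - conversions_a / visitors  # crB - crA

print(f"crB - crA in one simulated experiment: {simulated_difference():+.2%}")
```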

We expect this difference to be 0, as both versions should perform the same (they have the same true conversion rate), but due to the randomness of the coin tosses we also get results that are larger or smaller than 0. If we repeat this many times and plot the results, we arrive at the following chart: the distribution of crB - crA.

The x-axis denotes the difference crB - crA. The higher the curve, the more often crB - crA takes the corresponding value. This chart is the basis for hypothesis testing.

Based on this chart, we can see how probable a certain value of crB - crA is. The further we move away from zero, the less likely it is to observe that result. To the left of zero, version A performs better; to the right, version B performs better.
We also know that values far to the right appear less often than values in the middle, so the probability of such values appearing by chance is low. But when do we call something probable and when improbable? There is no exact definition, so we have to choose one ourselves. Let us define the rightmost 5% of the graph as improbable, while the remaining 95% are probable:

At this point, 95% of all cases lie to the left and 5% to the right. This implies that only 5 in 100 random experiments produce a difference that lies in the green area.

The corresponding percentage, 5% (or 0.05), is called the significance level (often denoted as alpha), and it is normally set before starting an experiment. It defines how certain we want to be when we decide that version B performs better than version A.

Note: We could of course do the same on the left side, but since the experiment indicated crB - crA > 0, we do not test whether the conversion rates are merely different, but whether crB > crA.

We now add the value from our observation above to this chart (10.9% - 8.4% = 2.5%, or 0.025):

We see that our observation lies to the right of our threshold; a result like this therefore appears in less than 5% of all random experiments.
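
The same conclusion can be checked numerically by repeating the coin-toss simulation many times; a sketch using NumPy for speed (exact numbers will vary slightly from run to run):

```python
import numpy as np

rng = np.random.default_rng(seed=42)
cr, visitors, n_sims = 0.084, 5000, 100_000

# Simulate many "null" experiments in which both versions share the same true cr
conv_a = rng.binomial(visitors, cr, n_sims)
conv_b = rng.binomial(visitors, cr, n_sims)
diffs = (conv_b - conv_a) / visitors  # simulated values of crB - crA

# Boundary of the rightmost 5% of the distribution (our alpha = 0.05 threshold)
threshold = np.quantile(diffs, 0.95)
# Share of random experiments that show a difference of at least 2.5%
share_extreme = (diffs >= 0.025).mean()

print(f"5% threshold: ~{threshold:.2%}")
print(f"Share of random experiments with crB - crA >= 2.5%: {share_extreme:.5f}")
```

In this sketch the 5% boundary lands at roughly 0.9%, so the observed difference of 2.5% lies deep inside the improbable region.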

This means that it is very unlikely that our results above happened by chance.

As a last example, we present the same analysis for the first observation (i.e. only about 50 visitors in each group and a smaller cr difference of 1.6%), again using a significance level of 5%:

Note that this observation lies far to the left of our 5% threshold and may well have happened by chance.

Rule of thumb:

The smaller the significance level, the harder it is to show that two versions are different. This gets easier with either more data or a larger effect (i.e. a bigger difference in conversion rates).
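
A quick way to see the "more data" part of this rule is to simulate the null distribution for different traffic volumes and watch the 5% boundary shrink; a sketch, again assuming a shared conversion rate of 8.4%:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
cr, n_sims = 0.084, 100_000

# With more visitors per version, the null distribution of crB - crA narrows,
# so the 5% boundary moves closer to zero and smaller effects become detectable.
for visitors in (50, 500, 5000, 50_000):
    conv_a = rng.binomial(visitors, cr, n_sims)
    conv_b = rng.binomial(visitors, cr, n_sims)
    diffs = (conv_b - conv_a) / visitors
    print(f"{visitors:>6} visitors per version -> 5% boundary at ~{np.quantile(diffs, 0.95):.2%}")
```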