Statistical Significance in Online Marketing

Dr. Torge Schmidt has a PhD in Mathematics and works as a Data Scientist at Akanoo, where he is responsible for the development of new prediction models and statistical analyses.
He loves talking about statistics and significance, so feel free to reach out to him if you have questions or simply want to discuss.

What is statistical significance and why do we need it?

If you want to increase the performance of your website by introducing a new measure (like a discount campaign), you will want to confirm that this measure is actually effective. The best way to do that is to split your traffic randomly into two parts, A and B, and compare their performance.

Let us assume that A stands for the old version and B stands for the version with the new measure enabled. After one day of analyzing the traffic we get the following data:

Version   Number of Visitors   Number of Conversions   Conversion Rate (cr)
A         49                   4                       8.2%
B         51                   5                       9.8%

This data seems to indicate that the new version performs better than the old one, because crB − crA is greater than 0, i.e. crB > crA. But is this really enough to support that conclusion? Imagine that one more visitor arrives on version A and converts; then we get the following result:

Version   Number of Visitors   Number of Conversions   Conversion Rate (cr)
A         50                   5                       10%
B         51                   5                       9.8%

Now version A seems to be better than version B. Based on this data we therefore cannot reliably conclude that either version is better. Naturally, we need more data! So let us assume we compare both versions for a longer time and get the following data:

Version   Number of Visitors   Number of Conversions   Conversion Rate (cr)
A         5023                 421                     8.4%
B         5012                 549                     10.9%

Now we can say with high probability that version B is better than version A. But why is that? Could we not simply have had bad luck in choosing who gets to see version A and who sees version B? That is of course possible, but very unlikely. How unlikely it actually is can be shown by applying a statistical significance test (hypothesis test).

There are two possible explanations for the observations in the table above:

  • Version B performs better than version A
  • Version B does not perform better than version A; we just had bad luck selecting the visitors

This leads to the question: How likely is it that the observations happened by chance?

Let us assume, for now, that the observations did in fact happen by chance, and that both versions therefore share the same underlying conversion rate (8.4%) that we would see if we simply collected enough data.

We now repeat the experiment with 5000 visitors in each version, but instead of using real data, we just toss a coin to decide whether a visitor buys or not (well, not a 50/50 coin but a 91.6/8.4 one). At the end we compute the conversion rate for both versions and look at the difference crB − crA.

We expect the difference to be 0, as both versions should perform the same (they have the same conversion rate), but due to the randomness of the coin tosses we also get results larger or smaller than 0. If we repeat this many times and plot the results, we arrive at the following chart: the distribution of crB − crA.
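The coin-toss experiment can be sketched in a few lines of Python. This is a minimal simulation: the 8.4% rate and the 5000 visitors per version come from the table above, while the number of repetitions is an arbitrary choice.

```python
import random

def simulate_null_diffs(p=0.084, n=5000, runs=500, seed=42):
    """Simulate crB - crA under the assumption that both versions
    share the same underlying conversion rate p."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(runs):
        conv_a = sum(rng.random() < p for _ in range(n))  # biased "coin"
        conv_b = sum(rng.random() < p for _ in range(n))
        diffs.append(conv_b / n - conv_a / n)
    return diffs

diffs = simulate_null_diffs()
# The simulated differences scatter around zero; a histogram of
# `diffs` yields the bell-shaped distribution discussed in the text.
```
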

The x-axis denotes the difference crB − crA. The higher the curve, the more often crB − crA takes the corresponding value. This chart is the basis for hypothesis testing.

Based on this chart, we can see how probable a certain value of crB − crA is. The further we move away from zero, the less likely it is to observe that result. To the left, version A performs better; to the right, version B performs better.
Since values to the far right appear less often than values in the middle, the probability of them appearing by chance is low. But when do we call something probable and something else improbable? There is no exact definition, so we have to set one ourselves. Let us define the rightmost 5% of the graph as improbable and the remaining 95% as probable:

At this point 95% of all cases lie to the left and 5% to the right.
This implies that 5 in 100 random experiments result in a difference value that lies in the green area.

The corresponding percentage, 5% (or 0.05), is called the significance value (often denoted as alpha), and it is normally set before starting an experiment. It defines how certain we want to be when we decide that version B performs better than version A.

Note: We could obviously do the same on the left side, but since the experiment indicated crB − crA > 0, we do not test whether the cr values are different, but whether crB > crA. We now place the value from our observation above in this chart (at 10.9% − 8.4% = 2.5%, or 0.025):

We see that our observation lies to the right of our threshold; therefore this result appears in fewer than 5% of all random experiments.

This means that it is very unlikely that our result above happened by chance. As a last example, we present the same analysis for the first observation (i.e. only about 50 visitors in each group and a smaller cr difference of 1.6%), again using a significance value of 5%:

Note that our observation lies far to the left of our significance threshold and could well have happened by chance.

Rule of thumb:

The smaller the significance value, the harder it is to show that two versions are different. Showing a difference gets easier with more data or a larger effect (i.e. a bigger difference in conversion rates).
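For completeness, here is a hedged sketch of a standard two-proportion z-test applied to the numbers from the tables above. It uses the normal approximation, which is one common way (not necessarily the exact test the analysis above used) to compute how unlikely an observed difference is under the null hypothesis.

```python
from math import erf, sqrt

def one_sided_p_value(conv_a, n_a, conv_b, n_b):
    """One-sided two-proportion z-test for H0: crB <= crA."""
    cr_a, cr_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (cr_b - cr_a) / se
    # Probability of seeing a difference at least this large by chance
    return 0.5 * (1 - erf(z / sqrt(2)))

p_small = one_sided_p_value(4, 49, 5, 51)          # first table
p_large = one_sided_p_value(421, 5023, 549, 5012)  # third table
```

With alpha = 0.05, `p_large` falls well below the threshold while `p_small` stays far above it, matching the discussion: the large sample shows a significant difference, the small one does not.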


Data Science Thesis at Akanoo

I am Tillmann Radmer (26). I study Information Systems at Humboldt University Berlin and previously completed my Bachelor's degree in Industrial Engineering in Hamburg. In my Master's thesis I research and compare novel approaches to optimizing uplift models for interactive visitor targeting.
With more sophisticated models, Akanoo can target visitors more precisely and improve the conversion rate for its clients. Here I would like to give a short introduction to uplift modeling for targeted visitor engagement.

Uplift Modeling

An important question in direct marketing is which customers to target. Some customers will not react to a campaign while still incurring the variable costs of marketing; worse, some customers might react negatively to receiving an ad. Generally, there are four types of customers in direct marketing:

  1. Customers who will respond without treatment,
  2. customers who will only respond after receiving a treatment,
  3. customers who will not respond because of a treatment, and
  4. customers who will not respond regardless.

The goal of uplift modeling is to differentiate customers in group 2 from those in the other groups.

Start with A/B-testing

To build an uplift model one needs the results of an A/B-test experiment. Two groups and a binary outcome give four possible combinations, as shown below.

Figure 1: Four possibilities in an A/B-test experiment. (Shaar2016)

Given the result one can calculate the uplift as

    uplift = P(return | treatment) − P(return | control)
Unfortunately, this term is not defined for a single customer, so we cannot simply build a model that maximizes it directly. Instead there are several approaches to maximize it indirectly.

Building a model

The simplest approach neglects the control group and models the probability that a customer will return, given that she received a treatment and given some customer-specific information X:

    P(return | treatment, X)
Throwing away half the data isn't optimal, though. This model doesn't single out the customers who will only return when given a treatment, as we required above; it will also select all customers who would return regardless of treatment.

A simple extension is to build two models, one for each term in the uplift equation. Formally,

    uplift(X) = P(return | treatment, X) − P(return | control, X)
While this approach directly models the uplift effect, research suggests that it doesn't always perform well in practice. One reason is that the treatment effect is usually much smaller than the main effect: if, for example, the main effect is 1.0% in the control group and the combined effect is 1.1% in the treatment group, then the uplift effect is only 0.1% (= 1.1% − 1.0%). Consequently, each individual model will mostly focus on the main effect within its group. (Radcliffe, 2011)
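The two-model approach can be sketched in a few lines. The data below is hypothetical, and a simple per-segment rate estimate stands in for a real classifier (such as logistic regression); the point is only the structure: one model per group, uplift as the difference of their predictions.

```python
from collections import defaultdict

def fit_rate_model(rows):
    """Estimate P(return | segment) from (segment, returned) pairs."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [returns, total]
    for segment, returned in rows:
        counts[segment][0] += returned
        counts[segment][1] += 1
    return {s: r / n for s, (r, n) in counts.items()}

# Hypothetical A/B-test data: (customer segment, returned?)
treatment = [("new", 1), ("new", 1), ("new", 0), ("loyal", 1), ("loyal", 1)]
control   = [("new", 0), ("new", 0), ("new", 1), ("loyal", 1), ("loyal", 1)]

p_treat = fit_rate_model(treatment)  # one model for the treatment group
p_ctrl  = fit_rate_model(control)    # one model for the control group
uplift  = {s: p_treat[s] - p_ctrl[s] for s in p_treat}
# "new" customers show positive uplift, "loyal" ones none
```
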

An easy way to arrive at a single-model approach is to apply a class variable transformation to the A/B-test results. Both the TR (treated, responded) and CNR (control, not responded) groups are assigned to the positive class, while the TNR (treated, not responded) and CR (control, responded) groups are assigned to the negative class.

Table 1: Example of the class variable transformation (1 = treated / returned / positive class, 0 = not treated / not returned / negative class).

Customer   Treated   Returned   Transformed class Z
1          0         1          0
2          1         0          0
3          0         0          1
4          1         0          0
5          1         1          1
6          0         0          1
The idea is that of the four cells, we would definitely like to treat the customers with the TR outcome. And because we don't know whether the customers in the CNR cell would have responded had they received a treatment, we would like to treat them as well. It can be shown that, under some assumptions, modeling the conditional probability of the transformed class variable is equivalent to modeling the conditional probabilities of the two original variables in the uplift equation above.
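The transformation itself is tiny: TR and CNR are exactly the cells where the treatment flag equals the outcome, so one comparison suffices. A minimal sketch using the rows of Table 1:

```python
def transform_class(treated, returned):
    """Class variable transformation: Z = 1 for the TR and CNR cells,
    Z = 0 for TNR and CR."""
    return 1 if treated == returned else 0

# (customer, treated, returned) rows from Table 1
rows = [(1, 0, 1), (2, 1, 0), (3, 0, 0), (4, 1, 0), (5, 1, 1), (6, 0, 0)]
z = {customer: transform_class(t, r) for customer, t, r in rows}
# Customers 3, 5 and 6 end up in the positive class
```
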

There are more elaborate techniques that incorporate the maximization of the uplift equation into the training algorithm itself. However, the straightforward implementation and applicability to any standard machine-learning algorithm make the two-model approach and the class variable transformation very attractive first choices.

Lift analysis – A data scientist’s secret weapon

Whenever I read articles about data science, I feel like an important aspect is missing: evaluating the performance and quality of a machine learning model.

There is always a neat problem at hand that gets solved, and the process of data acquisition, handling and model creation is discussed, but the evaluation aspect is too often very brief. Yet I truly believe it's the most important step when building a new model. Consequently, the first post on this blog will deal with a pretty useful evaluation technique: lift analysis.

Machine learning covers a wide variety of problems like regression and clustering. Lift analysis, however, is used for classification tasks, so the remainder of this article will concentrate on this kind of model.

The reason behind lift charts

When evaluating machine learning models there is a plethora of possible metrics to assess performance. There are things like accuracy, precision-recall, ROC curve and so on. All of them can be useful, but they can also be misleading or don’t answer the question at hand very well.

Accuracy[1], for example, might be a useful metric for balanced classes (that is, each label has about the same number of occurrences), but it's totally misleading for imbalanced classes. The problem is: data scientists have to deal with imbalanced classes all the time, e.g. when predicting whether a user will buy something in an online shop. If only 2 out of 100 customers buy anyway, it's easy for the model to predict everyone as not buying, and it would still achieve an accuracy of 98%! That's absolutely not useful when trying to assess the model's quality.

Of course, other metrics like precision and recall give you important information about your model as well. But I want to dig a bit deeper into another valuable evaluation technique, generally referred to as lift analysis.

To illustrate the idea, we'll consider a simple churn model: we want to predict whether a customer of an online service will cancel their subscription or not. This is a binary classification problem: the user either cancels the subscription (churn = 1) or keeps it (churn = 0).

The basic idea of lift analysis is as follows:

  1. Group the data based on the predicted churn probability (a value between 0.0 and 1.0). Typically, you look at deciles, so you'd have 10 groups: 0.0–0.1, 0.1–0.2, …, 0.9–1.0.
  2. Calculate the true churn rate per group. That is, count how many people in each group churned and divide this by the total number of customers in the group.
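The two steps above can be sketched in plain Python. This is a minimal version with a hypothetical score/outcome format; in practice the scores would come from your model's predictions. It also computes each bin's lift over the overall rate, which we will come back to later.

```python
def lift_table(scores, outcomes, n_bins=10):
    """Group customers into score deciles and compute the true
    churn rate per bin, plus its lift over the overall rate."""
    bins = [[0, 0] for _ in range(n_bins)]  # [churned, total] per bin
    for score, churned in zip(scores, outcomes):
        idx = min(int(score * n_bins), n_bins - 1)  # score 1.0 -> top bin
        bins[idx][0] += churned
        bins[idx][1] += 1
    overall = sum(c for c, _ in bins) / sum(n for _, n in bins)
    return [
        {"range": (i / n_bins, (i + 1) / n_bins),
         "churn_rate": c / n if n else None,
         "lift": (c / n) / overall if n else None}
        for i, (c, n) in enumerate(bins)
    ]

# Hypothetical predictions and true outcomes for eight customers
scores   = [0.05, 0.08, 0.15, 0.55, 0.85, 0.92, 0.95, 0.97]
outcomes = [0,    0,    0,    1,    1,    1,    1,    1]
table = lift_table(scores, outcomes)
```
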

Why is this useful?

The purpose of our model is to estimate how likely it is that a customer will cancel their subscription. This means our predicted churn probability should be directly proportional to the true churn probability, i.e. a high predicted score should correlate with a high actual churn rate. Vice versa, if the model predicts that a customer won't churn, we want to be sure that it's really unlikely that this customer will churn.

But as always, a picture is worth a thousand words. So let's see what an ideal lift chart looks like:

Lift Chart

Here you can see that the churn rate in the rightmost bucket is highest, just as expected. For scores below 0.5, the actual churn rate in the buckets is almost zero. You can use this lift chart to verify that your model is doing what you expect from it.

Let’s say there would be a spike in the lower scored groups; then you know right away that your model has some flaw, it doesn’t reflect the reality properly. Because if it would, then the true churn rate can only decrease with decreasing score. Of course, lift analysis can help you only that far. It’s up to you to identify the cause of this problem and to fix it, if necessary2. After improving the model, you just can come back to the lift chart and see if the quality improved.

Additionally, I drew a black line for the hypothetical average churn rate (20%). This is useful for defining a targeting threshold: scores below the threshold are set to 0, scores above to 1. In our example, you might want to try to keep customers from cancelling their subscription by giving them a discount. Then you would target all users with a score between 0.8 and 1.0, because this is the range where the churn rates are higher than the average churn rate. You don't want to pour money down the drain for customers who have a below-average churn probability.

But what is lift exactly?

Until now, we only looked at nice charts. But usually you’re interested in the lift score as well. The definition is pretty simple:

    lift = predicted rate / average rate


rate in our situation refers to the churn rate, but it might as well be a conversion rate, response rate, etc.

Looking back at our example chart, the highest group would have a lift of 0.97 / 0.2 = 4.85, and the second-highest group a lift of 1.8. That means that if you only target users with a score higher than 0.9, you can expect to reach nearly five times as many churning users as you would by targeting the same number of people at random.


Just like every other evaluation technique, lift charts aren't a one-size-fits-all solution, but they help you get a better picture of the overall performance of your model. You can quickly spot flaws if the slope of the lift chart is not monotonic. Additionally, lift charts help you set a threshold for which users are worth targeting. Last but not least, you get an estimate of how much better you can target users compared to random targeting.

I hope this first blog post gave you some new insights or you enjoyed it as a refresher. If you have any questions or feedback, just leave a comment or shoot me a tweet.

  [1] Ratio of correctly labeled observations to the total number of observations.
  [2] There might be cases where this does not matter, e.g. when your main goal is to target everyone who churns and it doesn't matter if you also target some people who won't churn.

(First published on

Cart Abandonment Can Cost You More Than 50% Of Your Online Shop Revenue

Shopping Handbags
Online shoppers often add items to the basket without buying them.

Online shoppers generally add items to the shopping cart shortly after arriving on a shop website. Nonetheless, many of them leave the shop without buying. Among some of our clients, more than 50% of visitors start their visit with a full basket; they obviously abandoned a purchase earlier. This is a remarkably high share, and a lot of revenue is lost if they never complete their purchase.

Other online shops face an even worse situation. The Baymard Institute has collected shopping cart abandonment case studies since 2006, with a clear result: the average cart abandonment rate is close to 70%. The interesting questions are: Why do online shoppers abandon carts? And what can I do as a retailer to prevent it?

The 10 Most Common Reasons For Shopping Cart Abandonment

Many reasons for cart abandonment have been identified. A list of the top reasons was recently published as part of the UPS Pulse of the Online Shopper study. In cooperation with the internet analytics company comScore, UPS interviewed 5,000 online shoppers and identified the 10 most common reasons for abandoned carts:

  1. Shipping costs made the total purchase costs more than expected (56%)
  2. My order value wasn’t large enough to qualify for free shipping (45%)
  3. I was not ready to purchase, but wanted to get an idea of the total cost with delivery for comparison (44%)
  4. I was not ready to purchase, but wanted to save the cart for later (43%)
  5. The item was out of stock (42%)
  6. Shipping and handling costs were listed too late during the checkout process (34%)
  7. I needed the product within a certain time frame and the shipping options offered didn’t meet my requirements (28%)
  8. I didn’t want to register/create an account just to make a purchase (27%)
  9. The estimated shipping time was too long for the amount I was willing to pay (26%)
  10. My preferred payment (i.e. bank transfer, debit card, PayPal, Google Checkout) was not offered. (24%)

Solutions For Shopping Cart Abandonment

Let’s have a closer look at the results and what you can do about it. We find four main categories to explain abandonment. The good news is, three categories can be easily dealt with by optimizing your ordering process:  

  • Check-out

The first optimization area is the check-out process. The surveyed customers complained about hidden costs. Create a transparent checkout process and list all accruing charges right at the beginning; online shoppers shouldn't be annoyed by unpleasant surprises at the end of the purchase process.

Another pain point was the obligation to create an account. Offer the possibility to buy your products without an account and make the purchase process as comfortable as possible.

  • Delivery

Online shoppers frequently mentioned the importance of a quick and comfortable shipping process. Guarantee the direct availability of your products and offer the most common shipping options.

Make sure to communicate the guaranteed shipping date and available options prominently on your site. Your potential customers have to find this information at a glance.

  • Pricing

Pricing is an important factor when it comes to the final check-out. High shipping costs in particular keep a lot of online shoppers from buying. Offer free shipping or keep the shipping costs as low as possible.

Cart abandonment in the areas of check-out, delivery and pricing can be tackled relatively easily. But there is still one category left that needs more consideration.

  • Buying Intention

A lot of the interviewed online shoppers mentioned that they were not ready to buy; they just browsed the shop to look for more information about the product or the shipping. These online shoppers are an enormous opportunity to boost your sales. In general, they are interested in your products and services, but they need a very personal interaction to make sure they turn into customers.

Identify And Persuade Relevant Online Shoppers

All you need to do is convince them that you have the products for their needs. Easier said than done. How can you identify this promising customer group? And what do you have to do to persuade them? In general, there are two ways to reach them.

The first one is to contact them after they have already left the online shop. A lot of companies try to address their online shop visitors via retargeting: they mark every visitor with a special cookie and show them online ads on other websites or on social media. Another solution is to address them by e-mail: if your visitors have created an account or signed up for a newsletter, it is possible to reach them with personalized mailings afterwards.

Personalized campaigns can help you to persuade undecided online shoppers.

The second solution is to address online shoppers directly in the online shop while they are looking for relevant information. It is possible to distinguish between buyers and non-buyers by using an algorithm called Random Forest. In this way, you can show personalized campaigns only to users who would not have bought otherwise.

For example, the interviewed people mentioned that they had been looking for the total costs and shipping fees. Why not offer them a voucher for free shipping or a product discount? A lot of people mentioned that they just wanted to learn about the products but would not necessarily purchase them at your particular shop. Why not highlight additional benefits of purchasing at your shop, or offer free giveaways if they order the product directly?

To sum it up: shopping cart abandonment is a huge problem for online shops, but you can do something about it. It's your turn: start optimizing your online shop today and boost your revenue. The following checklist will help you reach your goal.

Checklist: 6 tips to avoid shopping cart abandonment

If you need help running personalized campaigns in your online shop, get in contact now. Akanoo engages online-shop visitors with effective campaigns while they are still surfing the shop site, avoiding unhappy visitors and abandoned carts and increasing revenues.

How To Predict Purchase Probability With Random Forest Models


Will the user abandon my shop without purchasing? This is one of the most common questions every shop manager faces. A lot of companies collect and analyse huge amounts of user data (click behavior, previous purchases, etc.) to answer it. Did you ever wonder how you can predict purchase probability while the visitor is still surfing the shop site? First, you need to collect the right user data. Then, smart algorithms like Random Forests provide you with valid predictions of who will buy and who won't.

Steps For Predicting Purchase Probability

Multi-staged Questions Help To Identify Buyers

It is necessary to understand Decision Trees before turning to Random Forest algorithms. A Decision Tree is a group of questions you ask to reach a conclusion step by step. In the case of our purchase probability example you could start with "Do we know the customer?", followed by further questions like "Has the user viewed more than 3 products?", "Does the visit last longer than 5 minutes?", "Are there already products in the basket?" and many more. In the end you classify the visitor as a buyer or a non-buyer for every single branch of the tree. The graphic below shows the principle of a Decision Tree analysis. Of course, in reality the analysis will depend on a variety of additional factors; for example, Akanoo uses combinations of over 50 independent variables to calculate purchase probabilities.

Simplified Graphic For A Decision Tree Analysis

How Can I Use Random Forest To Increase My Online Shop Revenue?

Although the Decision Tree looks like a great tool for predicting probabilities, there is one essential problem: overfitting. The questions for a Decision Tree are created with the help of a training data set, and for this training data set the predictions are very reliable. But if you try to generalize and apply the tree to new data sets, the predictions aren't that good anymore. The Decision Tree is adapted too closely to the initial training data set: it is overfitting.

To avoid this, you can use a combination of many different trees (a Random Forest) and average their predictions. Usually a Random Forest consists of hundreds of different Decision Trees and delivers more precise results that carry over to new data.
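The idea can be sketched with a toy implementation: one-question "stump" trees fitted on bootstrap samples and averaged. This is an illustrative sketch only, with made-up visitor data; a production system would use a library implementation with full trees and random feature selection.

```python
import random

def fit_stump(data):
    """Fit a one-question 'tree': find the feature/threshold split
    that best separates buyers (1) from non-buyers (0)."""
    best = None
    n_features = len(data[0][0])
    for f in range(n_features):
        for thresh in set(x[f] for x, _ in data):
            left = [y for x, y in data if x[f] <= thresh]
            right = [y for x, y in data if x[f] > thresh]
            if not left or not right:
                continue
            l_vote = round(sum(left) / len(left))    # majority per side
            r_vote = round(sum(right) / len(right))
            errors = (sum(y != l_vote for y in left)
                      + sum(y != r_vote for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, thresh, l_vote, r_vote)
    if best is None:  # degenerate bootstrap sample: constant prediction
        vote = round(sum(y for _, y in data) / len(data))
        return lambda x: vote
    _, f, t, lv, rv = best
    return lambda x: lv if x[f] <= t else rv

def random_forest(data, n_trees=25, seed=7):
    """Average many stumps, each fit on a bootstrap sample."""
    rng = random.Random(seed)
    trees = [fit_stump([rng.choice(data) for _ in data])
             for _ in range(n_trees)]
    return lambda x: sum(t(x) for t in trees) / n_trees  # buy probability

# Hypothetical visitors: (products viewed, minutes on site) -> bought?
visits = [((1, 1), 0), ((2, 1), 0), ((2, 2), 0),
          ((7, 8), 1), ((8, 9), 1), ((9, 6), 1)]
predict = random_forest(visits)
```

Because each stump only sees a bootstrap sample, no single tree dominates; the averaged vote is what makes the ensemble more robust to overfitting than one deep tree.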

Apart from predicting the purchase probability, it is useful for a wide range of E-Commerce questions, e.g. regarding:

  • Sales increase: Will the visitor add a second product to the shopping cart?
  • Sales decrease: Will the visitor remove a product from the shopping cart?

Finding answers to questions like these is very helpful for optimizing your online shop. For example, Akanoo uses the prediction results to show users personalized campaigns, highlighting top sellers, giving individual coupons or guiding the visitor to more relevant product selections, in order to increase the revenues and profits of online shops.

If you have any questions about how your online shop can profit from using Random Forest and personalized incentives, send us a message. We are happy to help.

Business Opportunities in Click Stream Data Mining – from Cart Abandonment Prevention to Upselling

When we started Akanoo 2½ years ago, none of us would have believed that the click stream of online shop visitors reveals so many applications for statistical analysis. Our goal was simple: use JavaScript to display a voucher on the first few pages to people we were certain otherwise wouldn't buy. A logistic regression on the target variable "will buy" and a web service written in Groovy for real-time prediction did the job.

This led to roughly 10–20% of incremental revenue compared to control groups, but only for a small group of people: those we knew for sure wouldn't buy.

Starting with cart abandonment prevention, we have identified different strategies to make additional revenue with statistical models. What will be the next strategy? We’re working on it.

So, we took a deeper look at the behavior of the millions of online shop visitors that we had tracked so far. And we quickly realized: there is a business opportunity way bigger than that. We saw visitors responding negatively to vouchers and some of them becoming less profitable in the long run compared to the control group. So we had to rethink our initial solution.

Let’s draw an analogy here: most online retailers offer the same assistance as a supermarket – none. However, there are millions of visitors out there that rather like the shopping experience of a fashion outlet or shoe retailer (they know from the offline world). There is staff around that guides you through the offering. How can we deliver both at the same time?

The solution is: create algorithms that have a lot more freedom. We started working on algorithms that are able to predict the intention and next steps of visitors and pick the right way of interacting with these visitors from an array of different incentives and pieces of information. With the ultimate goal: converting visitors into happy customers.

Now we’ve expanded the audience of our JavaScript targeting to a wide array of scenarios besides cart abandonment prevention: up-selling, inspiration, return reduction, loyalty. (And we started using Spark, Hadoop, Docker, Redis and Akka to handle the data…)

What’s next? The data we analyze and track on a daily basis is growing steadily. Just for the up-selling models, we look at the data of 11M users every night. Other kinds of transaction-based websites come into play and modeling of more complex decision making processes.

Who makes this possible? The data science team behind Akanoo.
Yours, Fabian

Akanoo bridges theory and practice

Last Thursday, our co-founder Jan-Paul Lüdtke, an alumnus of the Institute of Innovation Marketing at Hamburg University of Technology, gave a talk in a Marketing lecture on our venture Akanoo. After explaining the business model and how it relates to the topics of the lecture, he asked students to give their thoughts on some of the business challenges that Akanoo faces.
The answers were very insightful. Jan-Paul and the whole Akanoo team would like to thank everyone for participating.