Christmas shoppers’ behaviour differs across the online shop industry

The closer Christmas comes, the more people shop online. It’s no secret that Christmas is the most lucrative time of the year. We asked how many additional visitors, and how much additional revenue, online shops really generate during the Christmas season.

Traffic increase for all branches

Our Akanoo Data Insights tool provides relevant benchmarks that let online retailers evaluate the performance of their shops, and it enables us to dive deep into the Christmas season and the performance of different branches. A comparison of the yearly average with the December figures clearly shows that many online shops receive significantly more visitors in December. On average, online shops show a traffic increase of 123.0%.

However, a closer look at different industries such as fashion, DIY and consumer goods shows that the traffic increase differs between portfolios. In the fashion segment, traffic climbs by 97.1%. DIY shops show a rise of 112.2% in visits. Shops selling consumer goods increase their visits by a remarkable 169.4%, well ahead of fashion and DIY.
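The percentages above are relative increases of the December figure over the yearly average. Here is a minimal sketch of that arithmetic, assuming “yearly average” means the average monthly figure (the post does not say exactly how it is computed). The visit counts are hypothetical, not Akanoo data. The same formula applies to the conversion-rate increases below.

```python
def relative_increase(december: float, yearly_average: float) -> float:
    """Percentage change of the December figure over the yearly average."""
    if yearly_average == 0:
        raise ValueError("yearly_average must be non-zero")
    return (december - yearly_average) / yearly_average * 100


# Hypothetical monthly visits of a single shop (illustrative only).
yearly_average_visits = 40_000
december_visits = 89_200

print(f"{relative_increase(december_visits, yearly_average_visits):.1f}%")  # 123.0%
```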

Conversion increase differs massively

While higher traffic often goes hand in hand with weaker conversion rates, the additional Christmas traffic actually converts better. On average, the conversion rate increases by 15.4% during the Christmas season. Comparing fashion, DIY and consumer goods, we see the following: fashion online shops generate 4% more conversions in the Christmas season, and shops offering consumer goods show an increase of 8%. DIY shops are one of the segments that achieve double-digit growth, at 14.3% more conversions.

Our benchmark shows the enormous potential for online shops to generate additional revenue. To make the best use of the Christmas traffic, it is necessary to continuously evaluate shop performance and compare it with industry and product benchmarks.

We support you in identifying the best optimisation areas for your online shop. Contact us today.

 

Meet Our Data Scientist Gundula


Gundula works as a data scientist at Akanoo. She implements new features for our predictive models. Furthermore, she spends a lot of time preparing and evaluating experiments and monitoring the nightly training cycles.

How long have you been working for Akanoo and what have you done before that?

I started working with Akanoo in November 2014. Before that, I did a PhD in computational neuroscience: my research was about how the brains of grasshoppers process the auditory information in calling songs.

What do you enjoy most about your work?

I enjoy the vast amount of data that we analyze.

And what do you usually do after work?

On my way back home, I enjoy the Außenalster!

Which colleague would you take with you to a lonely island and why?

I would take Carole, because she is a great swimmer.

Which attraction or part of Hamburg do you think everyone should definitely visit, and why?

This summer, I discovered Wilhelmsburger Inselpark, though it looks quite artificial. Altes Land is lovely, too, especially during spring when the apple trees bloom.

Which apps could you never do without and why?

Spotify.

These blogs/websites belong to my daily reading:

I read a lot about vegetarian cooking!

On the web you can find me here:

Try your favorite search engine.

Lift analysis – A data scientist’s secret weapon

Whenever I read articles about data science I feel like there is some important aspect missing: evaluating the performance and quality of a machine learning model.

There is always a neat problem at hand that gets solved, and the processes of data acquisition, data handling and model creation are discussed, but the evaluation is too often covered only briefly. Yet I truly believe it’s the most important part of building a new model. Consequently, the first post on this blog will deal with a pretty useful evaluation technique: lift analysis.

Machine learning covers a wide variety of problems, such as regression and clustering. Lift analysis, however, is used for classification tasks, so the remainder of this article concentrates on this kind of model.

The reason behind lift charts

When evaluating machine learning models there is a plethora of possible metrics to assess performance. There are things like accuracy, precision-recall, ROC curve and so on. All of them can be useful, but they can also be misleading or fail to answer the question at hand very well.

Accuracy[1], for example, might be a useful metric for balanced classes (that is, each label has about the same number of occurrences), but it’s totally misleading for imbalanced classes. The problem is: data scientists have to deal with imbalanced classes all the time, e.g. when predicting whether a user will buy something in an online shop. If only 2 out of 100 customers buy anyway, the model can simply predict that nobody buys and still achieve an accuracy of 98%! That’s absolutely useless when trying to assess the model’s quality.
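The accuracy trap is easy to reproduce. Below is a minimal sketch of the 2-out-of-100 example, written in plain Python instead of a metrics library. A “model” that predicts that nobody buys reaches 98% accuracy while catching none of the actual buyers (a recall of 0).

```python
def accuracy(y_true, y_pred):
    """Share of observations whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Share of actual positives (label 1) that the model labels as 1."""
    positives = sum(y_true)
    caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return caught / positives


# 100 customers, only 2 of whom buy (label 1).
y_true = [1] * 2 + [0] * 98
# Trivial "model": predicts "not buying" for everyone.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))  # 0.98
print(recall(y_true, y_pred))    # 0.0
```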

Of course, other metrics like precision and recall give you important information about your model as well. But I want to dig a bit deeper into another valuable evaluation technique, generally referred to as lift analysis.

To illustrate the idea, we’ll consider a simple churn model: we want to predict whether a customer of an online service will cancel their subscription. This is a binary classification problem: the user either cancels the subscription (churn=1) or keeps it (churn=0).

The basic idea of lift analysis is as follows:

  1. group the data based on the predicted churn probability (a value between 0.0 and 1.0). Typically, you look at deciles of the score range, so you’d have 10 groups: 0.0 – 0.1, 0.1 – 0.2, …, 0.9 – 1.0
  2. calculate the true churn rate per group. That is, you count how many people in each group churned and divide this by the total number of customers per group.
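The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation. It assumes equal-width buckets over the score range (0.0 – 0.1, …, 0.9 – 1.0, as listed above) and that predicted scores and actual churn labels (0/1) come as two parallel lists. The function name is our own, and empty buckets are simply left out.

```python
from collections import defaultdict


def churn_rate_per_bucket(scores, churned, n_buckets=10):
    """Return {bucket_index: actual churn rate} for equal-width score buckets.

    Bucket i covers scores in [i / n_buckets, (i + 1) / n_buckets);
    a score of exactly 1.0 is put into the top bucket.
    """
    totals = defaultdict(int)    # customers per bucket
    churners = defaultdict(int)  # customers who actually churned per bucket
    for score, label in zip(scores, churned):
        # Step 1: assign the customer to a bucket by predicted score.
        bucket = min(int(score * n_buckets), n_buckets - 1)
        totals[bucket] += 1
        churners[bucket] += label
    # Step 2: true churn rate = churners / customers in the bucket.
    return {b: churners[b] / totals[b] for b in sorted(totals)}


# Toy data, purely illustrative.
scores = [0.05, 0.12, 0.15, 0.55, 0.93, 0.97]
churned = [0, 0, 1, 0, 1, 1]
print(churn_rate_per_bucket(scores, churned))
# {0: 0.0, 1: 0.5, 5: 0.0, 9: 1.0}
```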

Why is this useful?

The purpose of our model is to estimate how likely it is that a customer will cancel their subscription. This means our predicted (churn) probability should rise with the true churn probability, i.e. a high predicted score should correlate with a high actual churn rate. Conversely, if the model predicts that a customer won’t churn, then we want to be sure that it’s really unlikely that this customer will churn.

But as always, a picture is worth a thousand words. So let’s see what an ideal lift chart would look like:

[Figure: Lift chart]

Here you can see that the churn rate in the rightmost bucket is highest, just as expected. For scores below 0.5, the actual churn rate in the buckets is almost zero. You can use this lift chart to verify that your model is doing what you expect from it.

Say there were a spike in one of the lower-scored groups: then you would know right away that your model has a flaw and doesn’t reflect reality properly, because if it did, the true churn rate could only decrease with decreasing score. Of course, lift analysis can only take you so far. It’s up to you to identify the cause of the problem and to fix it, if necessary[2]. After improving the model, you can simply come back to the lift chart and see whether the quality has improved.
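Spotting such a spike can be automated. Here is a minimal sketch, assuming the per-bucket churn rates are available as a dict that maps bucket index (0 = lowest scores) to actual churn rate. The `tolerance` parameter is our own addition, since buckets with few customers produce noisy rates. A non-empty result means the rates are not monotonic.

```python
def find_spikes(rates_by_bucket, tolerance=0.0):
    """Return the buckets whose churn rate exceeds that of the next-higher bucket.

    For a model that reflects reality, the actual churn rate should not drop
    as the score increases, so the result should be an empty list.
    """
    buckets = sorted(rates_by_bucket)
    spikes = []
    for lower, higher in zip(buckets, buckets[1:]):
        if rates_by_bucket[lower] > rates_by_bucket[higher] + tolerance:
            spikes.append(lower)
    return spikes


# Hypothetical churn rates with a suspicious spike in bucket 2.
rates = {0: 0.01, 1: 0.02, 2: 0.15, 3: 0.03, 4: 0.05, 5: 0.08}
print(find_spikes(rates))  # [2]
```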

Additionally, I drew a black line for the hypothetical average churn rate (20%). This is useful for defining a targeting threshold: scores below the threshold are set to 0, scores above it to 1. In our example, you might want to keep customers from cancelling their subscription by offering them a discount. Then you would target all users with a score between 0.8 and 1.0, because this is the range where the churn rates are higher than the average churn rate. You don’t want to pour money down the drain for customers who have a below-average churn probability.
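Reading the threshold off the chart can also be sketched in code. This minimal version assumes per-bucket churn rates keyed by bucket index and ten equal-width buckets. It walks down from the top bucket and keeps lowering the threshold as long as the rate stays above the average. The function name is hypothetical. The rates are made up, chosen so that only the 0.8 – 1.0 range lies above the 20% average, as in the example chart.

```python
def targeting_threshold(rates_by_bucket, average_rate, n_buckets=10):
    """Lowest score boundary above which every bucket beats the average rate.

    Walks down from the top bucket and stops at the first bucket whose churn
    rate is not above average. Returns None if even the top bucket is not.
    """
    threshold = None
    for bucket in sorted(rates_by_bucket, reverse=True):
        if rates_by_bucket[bucket] <= average_rate:
            break
        threshold = bucket / n_buckets
    return threshold


# Hypothetical per-bucket churn rates; average churn rate of 20%.
rates = {0: 0.0, 1: 0.0, 2: 0.01, 3: 0.01, 4: 0.02,
         5: 0.04, 6: 0.07, 7: 0.12, 8: 0.36, 9: 0.97}
threshold = targeting_threshold(rates, average_rate=0.2)
print(threshold)  # 0.8

# Binarise scores: 1 = target (e.g. offer a discount), 0 = leave alone.
scores = [0.15, 0.82, 0.95, 0.64]
print([1 if s >= threshold else 0 for s in scores])  # [0, 1, 1, 0]
```

Stopping at the first bucket that is not above average keeps the targeted range contiguous, which matches reading a single cut-off off the chart.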

But what is lift exactly?

So far, we have only looked at nice charts. But usually you’re interested in the lift score as well. The definition is pretty simple:

$$\text{lift} = \frac{\text{rate in group}}{\text{average rate}}$$

Here, “rate” refers to the churn rate, but it might just as well be a conversion rate, a response rate, etc.

Looking back at our example chart, the highest group would have a lift of 0.97 / 0.2 = 4.85, and the second-highest group a lift of 1.8. That means that if you only target users with a score higher than 0.9, you can expect to catch nearly five times as many churning users as you would by targeting the same number of people at random.
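The lift formula translates directly into code. This minimal sketch reproduces the two values above. The 0.36 churn rate for the second-highest group is not stated in the text; it is derived from its lift of 1.8 and the 20% average.

```python
def lift(group_rate, average_rate):
    """Lift of a group: its (churn) rate divided by the overall average rate."""
    return group_rate / average_rate


average_churn_rate = 0.2
print(round(lift(0.97, average_churn_rate), 2))  # 4.85

# Lift for several buckets of a lift chart at once.
rates = {8: 0.36, 9: 0.97}
lifts = {b: round(lift(r, average_churn_rate), 2) for b, r in rates.items()}
print(lifts)  # {8: 1.8, 9: 4.85}
```

A lift of 1.0 means a group churns exactly at the average rate, so targeting it is no better than random.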

Conclusion

Just like every other evaluation metric, lift charts aren’t a complete solution on their own. But they help you get a better picture of the overall performance of your model. You can quickly spot flaws if the lift chart is not monotonic. Additionally, it helps you set a threshold for which users are worth targeting. Last but not least, you get an estimate of how much better you can target users compared to random targeting.

I hope this first blog post gave you some new insights or you enjoyed it as a refresher. If you have any questions or feedback, just leave a comment or shoot me a tweet.

  1. Ratio of correctly labeled observations to total number of observations.
  2. There might be cases where this does not matter, e.g. when your main goal is to target everyone who churns and it doesn’t matter if you also target some people who won’t churn.

(First published on datalifebalance.com)

Business Opportunities in Click Stream Data Mining – from Cart Abandonment Prevention to Upselling

When we started Akanoo 2½ years ago, none of us would have believed that the click stream of online shop visitors lends itself to so many applications of statistical analysis. Our goal was simple: use JavaScript to display a voucher on the first pages to people who we were certain otherwise wouldn’t buy. A logistic regression on the target variable “will buy” and a web service written in Groovy for real-time prediction did the job.

This led to roughly 10–20% incremental revenue compared to control groups, but only within a small group of people: those we knew for sure wouldn’t buy.
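The original setup can be sketched as below: a logistic regression on “will buy” whose score decides whether a voucher is shown. This is a hedged illustration only. Akanoo’s real-time service was written in Groovy, and the feature names, coefficients and the 5% cut-off here are all hypothetical, not the actual model.

```python
import math

# Hypothetical coefficients of a fitted logistic regression for "will buy".
INTERCEPT = -4.0
COEFFICIENTS = {
    "page_views": 0.15,
    "seconds_on_site": 0.002,
    "items_in_cart": 0.8,
}


def buy_probability(features):
    """Logistic regression score: sigmoid of the linear predictor."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * value
                        for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def show_voucher(features, max_buy_probability=0.05):
    """Show a voucher only to visitors who are very unlikely to buy otherwise."""
    return buy_probability(features) < max_buy_probability


# Hypothetical visitors on their first pages.
browsing = {"page_views": 1, "seconds_on_site": 5, "items_in_cart": 0}
engaged = {"page_views": 8, "seconds_on_site": 300, "items_in_cart": 2}
print(show_voucher(browsing), show_voucher(engaged))  # True False
```

Keeping the cut-off low restricts vouchers to the small group of near-certain non-buyers described above.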

[Figure: Potential by trigger moment]
Starting with cart abandonment prevention, we have identified different strategies to generate additional revenue with statistical models. What will be the next strategy? We’re working on it.

So we took a deeper look at the behavior of the millions of online shop visitors we had tracked so far, and we quickly realized: there is a far bigger business opportunity. We saw visitors responding negatively to vouchers, and some of them becoming less profitable in the long run than the control group. So we had to rethink our initial solution.

Let’s draw an analogy here: most online retailers offer the same assistance as a supermarket – none. However, there are millions of visitors out there who prefer the shopping experience they know from offline fashion outlets or shoe retailers, where staff guide you through the range. How can we deliver both at the same time?

The solution: create algorithms that have a lot more freedom. We started working on algorithms that can predict the intentions and next steps of visitors and pick the right way of interacting with them from an array of different incentives and pieces of information, with the ultimate goal of converting visitors into happy customers.

Now we’ve expanded our JavaScript targeting to a wide array of scenarios beyond cart abandonment prevention: up-selling, inspiration, return reduction and loyalty. (And we started using Spark, Hadoop, Docker, Redis and Akka to handle the data…)

What’s next? The data we analyze and track on a daily basis is growing steadily. For the up-selling models alone, we look at the data of 11M users every night. Other kinds of transaction-based websites are coming into play, as is the modeling of more complex decision-making processes.

Who makes this possible? The data science team behind Akanoo.
Yours, Fabian