Statistical significance in online marketing

Dr. Torge Schmidt has a PhD in Mathematics and works as a Data Scientist at Akanoo. He is responsible for the development of new prediction models and statistical analysis.
He loves talking about statistics and significance, so feel free to reach out to him if you have questions or simply want to discuss: torge@akanoo.com

What is statistical significance and why do we need it?

If you want to increase the performance of your website by introducing a new measure (like a discount campaign), you will want to confirm that this method is actually effective. The best way to do that is to split your traffic randomly into two parts A and B and compare the performance.

Let us assume that A stands for the old version and B stands for the version with the new measure enabled. After one day of analyzing the traffic we get the following data:

Version | Number of Visitors | Number of Conversions | Conversion Rate (cr)
A       | 49                 | 4                     | 8.2%
B       | 51                 | 5                     | 9.8%

This data seems to indicate that the new version performs better than the old one, because crB – crA is greater than 0, i.e. crB > crA. But is this really enough to prove that claim? Imagine that one more visitor arrives on version A and converts; then we get the following result:

Version | Number of Visitors | Number of Conversions | Conversion Rate (cr)
A       | 50                 | 5                     | 10%
B       | 51                 | 5                     | 9.8%

Now version A seems to be better than version B. Therefore, based on this data we cannot reliably conclude whether version A or version B is better. Naturally, we need more data! So let us assume we compare both versions for a longer time and get the following data:

Version | Number of Visitors | Number of Conversions | Conversion Rate (cr)
A       | 5023               | 421                   | 8.4%
B       | 5012               | 549                   | 10.9%

Now we can say with high probability that version B is better than version A. But why is that? Could we not simply have had bad luck in our choice of who gets to see version A and who gets to see version B? That is of course possible, but very unlikely. How unlikely it actually is can be shown by applying a statistical significance test (hypothesis test).

There are two possible explanations for the observations in the table above:

  • Version B performs better than version A
  • Version B does not perform better than version A; we simply had (bad) luck when splitting the visitors

This leads to the question: How likely is it that the observations happened by chance?

Let us assume, for now, that the observations did in fact happen by chance and that both versions therefore have the same true conversion rate (8.4%) once we collect enough data.

We now repeat the experiment using 5000 visitors in every version, but instead of using real data, we just toss a coin to decide whether a visitor buys or not (well, not a 50/50 coin but a 91.6 / 8.4 coin). In the end we compute the conversion rate for both versions and take a look at the difference crB – crA.
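
A minimal sketch of this simulation in plain JavaScript, purely for illustration (the numbers match the table above):

// simulate one experiment in which both versions share the same true conversion rate
function simulateDifference(visitors, conversionRate) {
    var conversionsA = 0;
    var conversionsB = 0;
    for (var i = 0; i < visitors; i++) {
        // "toss the 91.6 / 8.4 coin" once per visitor and version
        if (Math.random() < conversionRate) { conversionsA++; }
        if (Math.random() < conversionRate) { conversionsB++; }
    }
    return conversionsB / visitors - conversionsA / visitors; // crB - crA
}

// repeating the experiment many times approximates the distribution of crB - crA
var differences = [];
for (var run = 0; run < 10000; run++) {
    differences.push(simulateDifference(5000, 0.084));
}

A histogram of these differences is exactly the kind of chart described next.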

We expect this difference to be 0, as both versions should perform the same (since they have the same conversion rate), but due to chance in the coin-tossing we also get results larger or smaller than 0. If we repeat this many times and plot the results, we arrive at the following chart: the distribution of crB – crA.

The x-axis denotes the difference crB – crA. The higher the curve, the more often crB – crA takes the corresponding value. This chart is the basis for hypothesis testing.

Based on this chart, we can see how probable a certain value of crB – crA is. The further we move away from zero, the less likely it is to observe that result. If we go to the left, version A performs better; if we go to the right, version B performs better.
Now, since values to the far right appear less often than values in the middle, the probability of them appearing randomly is low. But when do we say that something is probable and something is improbable? There is no exact definition, so we have to create one ourselves. Let us say that we define the rightmost 5% of the graph as improbable, while the remaining 95% are probable:

At this point, 95% of all cases lie to the left and 5% to the right.
This implies that 5 in 100 random experiments result in a difference value that lies in the green area.

The corresponding percentage, 5% (or 0.05), is called the significance level (often denoted as alpha) and it is normally set before starting an experiment. It defines how certain we want to be when we decide that version B performs better than version A.

Note: Obviously we could do the same on the left side, but since the experiment indicated crB – crA > 0 we do not test whether the cr values are different, but rather whether crB > crA or not.

We now place the value from our own observation into this chart (crB – crA = 10.9% – 8.4% = 2.5%, or 0.025):

We see that our observation lies to the right of our threshold; such a result therefore appears in less than 5% of all random experiments.
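
For readers who prefer a closed-form check over the simulation, a standard two-proportion z-test on the same numbers points to the same conclusion (values rounded):

pooled cr ≈ (421 + 549) / (5023 + 5012) ≈ 9.7%
standard error ≈ sqrt(0.097 × 0.903 × (1/5023 + 1/5012)) ≈ 0.0059
z ≈ (10.9% – 8.4%) / 0.0059 ≈ 0.025 / 0.0059 ≈ 4.2

For a one-sided test at a 5% significance level the threshold lies at roughly z = 1.65, so the observed difference sits far inside the improbable region on the right.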

This means that it is very unlikely that our results above happened by chance.

As a last example, we present the same analysis for the first observation (i.e. only about 50 visitors in each group and a smaller cr difference of 1.6%), again using a significance level of 5%:

Note that our observation lies well to the left of our significance threshold and could therefore easily have happened by chance.

Rule of thumb:

The smaller the significance level, the harder it is to show that two versions are different. Showing a difference becomes easier with more data or with a larger effect (i.e. a larger difference in conversion rates).

 

How to run JavaScript QUnit Tests upon Deployment using Jenkins & PhantomJS

Check out the first part of the JavaScript Testing Series: Unit Testing Self-Invoking JavaScript Functions

Unit Testing is great. However, the real benefit of unit testing is only achieved when the tests are run before each deployment. In continuous integration (CI) it makes sense to run the tests automatically in the CI tool.

At Akanoo we are using Jenkins, which can be extended by various plugins for several use cases. Unfortunately, there is no plugin for QUnit test results, so we have to make use of the existing JUnit plugin. Two things need to be done:

  • Find a way to run the tests in Jenkins.
  • Find a way to output the QUnit results as JUnit results.

Running QUnit Tests in Jenkins

Jenkins offers no built-in way to open web pages upon deployment, and I didn’t know of any plugin that offers such a thing. So I went out googling. I found a guide in the HTML5 Boilerplate repository on how to set up QUnit with Jenkins that suggested using PhantomJS with a QUnit test runner.

Following that lead, the first thing I did was to download PhantomJS and try to run my test HTML file locally. PhantomJS can only run JavaScript files, so I needed a test runner. I took a look at the one mentioned above but found it a little too elaborate for our needs, so I came up with my own solution.

var system = require('system');
var fs = require('fs');
var page = require('webpage').create();

// argument 0 is always the file which is called (this)
if (system.args.length === 1) {
    console.log('Pass the path/to/testfile.js as argument to run the test.');
    phantom.exit();
} else {
    // path is relative to where phantomjs is started
    var url = system.args[1]; // e.g. 'test/unit/tests.html'
    console.log("Opening " + url);
}

page.open(url, function (status) {
    console.log("Status: " + status);
    if (status === "success") {
        setTimeout(function () {
            var path = 'results.xml';
            var output = page.evaluate(function () {
                return document.output;
            });

            fs.write(path, output, 'w');
            console.log("Wrote JUnit style output of QUnit tests into " + path);

            console.log("Tests finished. Exiting.");
            phantom.exit();
        }, 3000);
    } else {
        console.log("Failure opening " + url + ". Exiting.");
        phantom.exit();
    }
});

What does the runner do? It takes one argument: the path to the HTML QUnit test file that PhantomJS should open. If the argument is missing, a usage hint is printed via console.log() to the console running PhantomJS and the script exits. The main part of the script opens the page. If the given file cannot be opened, an error message is logged and PhantomJS terminates. If the file can be opened, the JavaScript variable document.output of the test page is evaluated and written into a file called results.xml. The evaluation is done after a timeout of three seconds – the time the tests never exceeded on my local machine.

Output QUnit Results in JUnit Format

In the next step we need to make sure the QUnit results can be interpreted by the Jenkins JUnit plugin. Luckily, there is already a plugin for QUnit that produces the results as a JUnit-style XML report. I installed the plugin and configured it to write the results into the document.output variable that we’ve already seen in the PhantomJS runner above.
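
If I remember correctly, the reporter plugin exposes a QUnit.jUnitDone callback for this (check the plugin’s README for the exact hook name). The wiring, added to the QUnit test HTML page, then looks roughly like this:

// callback of the QUnit-to-JUnit reporter plugin (name assumed from memory);
// report.xml contains the JUnit-style XML report
QUnit.jUnitDone(function (report) {
    // the PhantomJS runner above reads document.output and writes it to results.xml
    document.output = report.xml;
});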

The setup now runs fine on my local machine: PhantomJS is installed and can be started via the shell to execute the runner script, which opens the QUnit test HTML file and saves the JUnit-style report into results.xml.

Creating the Jenkins Pipeline

Let’s make sure the job also runs in Jenkins. At Akanoo, Jenkins runs inside a Docker image, so I edited the Dockerfile to download and unpack PhantomJS. Use the correct version (32-bit or 64-bit): I first used the 32-bit build on a 64-bit machine and wondered why it didn’t work. Also make sure to add PhantomJS to your PATH variable.

By default, Jenkins allows you to define multiple build steps for one build. However, we want the whole build to stop as soon as one step fails. Jenkins offers the Pipeline plugin to define multiple stages of a build, so I installed the Pipeline and JUnit plugins and restarted Jenkins.

I have configured several stages in the pipeline:

  1. Check out the latest version of the code from Git.
  2. Run the tests in PhantomJS, archive the test results and report results to the JUnit plugin.
  3. Build, if the previous step didn’t fail.

To make the build fail if an error occurred in the unit tests, we can use a try-catch block. The Groovy script in the pipeline also lets us run shell scripts, which we need in order to start PhantomJS. I came up with the following script:

node {
    stage('Version Control') {
        // checkout the latest version from Git
    }
    stage('Test') {
        try {
            // run PhantomJS
            sh 'cd ${JENKINS_HOME}/path/to/unit/tests && phantomjs phantomjs-runner.js tests.html'
 
            // move result file into workspace
            sh 'mv ${JENKINS_HOME}/path/to/unit/tests/results.xml ${JENKINS_HOME}/workspace/${JOB_NAME}'
 
            // archive test results with relative path from ${JENKINS_HOME}/workspace
            step([$class: 'JUnitResultArchiver', testResults: '**results.xml'])
 
            // report to JUnit with relative path from ${JENKINS_HOME}/workspace
            junit '**results.xml'
        } catch(err) {
            throw err
        }
    }
    stage('Build') {
        // I would build now if the test didn't fail
    }
}

Let’s discuss the script stage by stage. First, we have the Version Control stage. I assume you know how to check out from Git. You may also omit this stage if the script is stored on the same machine as Jenkins.

In the Test stage a shell script executes PhantomJS with two parameters: the phantomjs-runner.js file we discussed above and the QUnit HTML test file. The test results are stored in a file called results.xml in the same folder the tests lie in. In the next line we move it into the Jenkins workspace of the current job. The step command stores the test results using the JUnitResultArchiver so that the results of all runs can be analysed later. We also send the results to the JUnit plugin to check for errors. If errors are found, this step throws an exception, which is caught by the try-catch block and re-thrown to stop the build before the Build stage starts.

In the Build step the actual build would run. This step depends on what you want to achieve. In our case we run a Groovy script.

Conclusion

We managed to configure a Jenkins build pipeline that checks out the current version from Git (or any other version control system), runs the QUnit tests in the PhantomJS headless browser, returns the test results in JUnit-style format, archives the results and only builds if the tests were successful.

It took me a couple of hours to figure out the single steps and bring everything together. I hope you found this useful. If you have any questions or ideas for optimization, please leave a comment below.

Unit Testing Self-Invoking JavaScript Functions

When Akanoo started out, we only tested our components running on the JVM – written in Scala or using Grails and Groovy – with JUnit, ScalaTest and Spock. By the end of 2015 we also wanted unit tests for our JavaScript tracking library that is integrated into our clients’ online shops.

Finding the right test framework

Nobody in our team had experience in testing JavaScript, so I went out looking for JavaScript unit test frameworks. What does one do if one has no clue? Google, of course. The first hit was QUnit, developed and used by the jQuery team. I looked into other libraries but decided to give QUnit a shot. The reasons I chose it were:

  • Support by the community (In this case the jQuery community.)
  • Used by some big players (jQuery certainly is a big player.)
  • Easy to use (Check out the cookbook.)
  • Easy to understand for Java developers (Thanks to its similarity to JUnit.)
  • Plug-in support & availability of plug-ins (We’ll come to this later.)

So, I had chosen my test framework. Let’s write some tests.

Testing self-invoking JavaScript Functions

Well, testing wasn’t that easy at the beginning. Our tracking script is encapsulated in a self-invoking function, also known as an immediately-invoked function expression (IIFE). The reason behind this is to avoid polluting the global scope (everything in the window object) with our functions and variables and to only allow access via our API calls using the exposed function at(). See an example of an immediately-invoked function expression below:
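
A stripped-down sketch of the pattern (the real tracking script is of course much larger):

(function (a, b) {
  // variables stay private inside the function scope
  var foo = "bar";

  // sample function, also unreachable from outside the IIFE
  function bla() {
    console.log(foo);
  }
})(window, document);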

As the name says, the function is immediately invoked after definition and I have no chance to call the function itself or to access the variables and functions inside. What could I do to test the function bla() in the above example?
My first idea was to comment out the two lines that define the function as self-invoking and define the parameterized variables by hand:
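
Sketched on the example above, the unwrapped version looks roughly like this (the full variant, with everything additionally moved into an object, is shown further down):

// the IIFE header and footer are commented out for testing;
// the parameters a and b are defined by hand instead
var a = window, b = document; // (function (a, b) {
  var foo = "bar";

  function bla() {
    console.log(foo);
  }
// })(window, document);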

I am now able to call the function bla() and access the variable foo. So I went over to my test files and wrote some QUnit tests.
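
With the wrapper disabled, even a very simple QUnit test can now reach the formerly private parts. A minimal, hypothetical example:

QUnit.test("foo is reachable once the IIFE is disabled", function (assert) {
  assert.equal(foo, "bar", "the formerly private variable can be asserted on");
});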

Spying with Sinon.js

I started by writing some easy tests checking the output of functions for various input parameters. Unfortunately, some functions are called inside other functions, and these are harder to test. If I want to know whether the inner function has been called, I need to spy on its execution.

QUnit itself doesn’t offer functions for spying, so I googled again to find a solution for JavaScript “mocking” [https://en.wikipedia.org/wiki/Mock_object]. I stumbled upon Sinon.js [http://sinonjs.org/] which offers spies, stubs and some more nifty features and integrates nicely with QUnit.

I started to write some tests, but the function sinon.spy(object, “function”) requires you to specify the encapsulating object of the function you want to spy on. After my changes to the self-invoking function expression the functions reside in the “global” scope, which in JavaScript means they lie under the window object.

// in the real script, the following line is replaced by the IIFE header shown in the comment
var a = window, b = document; //(function(a, b) {
  // our tracking code goes here
  var obj = {
    // variables go here
    foo: "bar"
  };

  // functions go like this
  obj.bla = function() {
    // sample function
    console.log(obj.foo);
  };
//})(window, document);

Unfortunately, I wasn’t able to spy on functions on the window object using Sinon. So I went on to put all functions inside my faked IIFE into an object to make them testable, as in the snippet above. Of course, I had to refactor the existing code to some extent, e.g. extracting functions, avoiding anonymous functions, etc. I strongly recommend using JSHint.
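
A minimal sketch of such a spy-based test, using the obj and bla names from the snippet above:

QUnit.test("bla is called", function (assert) {
  // wrap the object-scoped function in a spy
  var spy = sinon.spy(obj, "bla");

  obj.bla(); // in the real tests this call happens inside another function

  assert.ok(spy.calledOnce, "bla was called exactly once");
  spy.restore(); // unwrap the original function again
});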

Note: When asserting spies with Sinon’s function withArgs(arg0, arg1, …), be aware that the arguments must be given in the same order as in the call. If you want to check only the first two of three arguments, you may omit the third; but if you want to check the latter ones, the first argument needs to be given as well.
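
A small sketch illustrating that note (the arguments are hypothetical):

var spy = sinon.spy();
spy("a", "b", "c");

spy.withArgs("a", "b").calledOnce; // true: the third argument may be omitted
spy.withArgs("b", "c").calledOnce; // false: matching always starts at the first argument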

Stubbing with Sinon.js

Sometimes the result of function A depends on the result of function B, which is called inside function A. To test function A in isolation, I have to manipulate the inner function B to always return a predefined value. This is called stubbing and can be done with Sinon.js by calling sinon.stub(object, “function”), as sketched below.
You should also check out the fake XHR and fake server that Sinon offers. They were very helpful for my unit tests.
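
A minimal sketch of a stub; functionA and functionB are hypothetical names, not taken from our real tracking code:

var calculator = {
  functionB: function () { return Math.random(); },
  functionA: function () { return calculator.functionB() > 0.5 ? "high" : "low"; }
};

QUnit.test("functionA handles a high value from functionB", function (assert) {
  // force the inner function to always return a predefined value
  var stub = sinon.stub(calculator, "functionB").returns(0.9);

  assert.equal(calculator.functionA(), "high", "the stubbed result of functionB is used");
  stub.restore();
});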

Conclusion

If you’ve read this far, you can certainly guess that it was a long road to 95% test coverage (measured with JSCover). The upside of this solution using QUnit and Sinon.js is that we made unit testing possible at all. The obvious downside is the extra object inside the IIFE. That is, however, not too bad, as the Google Closure Compiler with advanced optimizations enabled minifies the code efficiently.

Check out the second part of the JavaScript Testing Series: How to run JavaScript QUnit Tests upon Deployment using Jenkins & PhantomJS

The Pimp my library pattern (PMLP)

Quickly decorate your classes with additional methods and features


When working on a project, the need may suddenly arise to add additional methods and/or features to a pre-existing class. Here are two reasons why I use the pimp my library pattern (PMLP):

  • Testing. You want to try something out without prematurely messing around in a library’s code (especially if the class is part of an external project or library).
  • Context-specificity. If the methods/attributes you are about to implement are context-specific and unlikely to be needed outside your piece of work, the PMLP helps you avoid bloating the contents of the library you want to pimp.

Pimp my library pattern: a toy example

Let’s assume you need a custom print function, which we call highlight, that we want to use as an extension of Scala’s String.

def highlight(s: String): String = s"<< $s >>"

Now you want this function to be available for any string within your code.

Listing 1:

class GreatWork extends RecentGreatWork {
  def doWork(s: String): String = { ... }
  def doWorkAndHighlight(s: String): String = {
    val greatStuff: String = doWork(s)
    greatStuff.highlight
  }
}

To make the highlight function available, you define a new class. We are going to call it StringHelperFunctions.

Listing 2:

class StringHelperFunctions(s: String) { 
  def highlight: String = s"<< $s >>"
}

Furthermore, you add an implicit conversion function to your package object, which looks like this:

Listing 3:

implicit def withStringHelperFunctions(s: String): StringHelperFunctions = 
  new StringHelperFunctions(s)

Now any function you add to the StringHelperFunctions class will be available on any String within the context of your package, which is exactly what happens in line 5 of Listing 1 above.

How we apply the PMLP to data-science-specific tasks

I recently used the PMLP for the experimental implementation of predictors.

We at Akanoo predict user behavior. We track click-stream data from the customers of our clients’ web shops. This data is made available to our models through several classes; one of them is called Visit. This class represents a single visit to one of our clients’ web shops and contains functions that are used all over the place in our components.

Now, if I try to predict a new kind of behavior, e.g. whether a visitor is likely (or unlikely) to visit or re-visit a specific product category, I can think of predictors that relate to that behavior.

Quite often, the implementation of these predictors profits from new functions that could reside in the Visit class. But:

  • Testing. The Visit class is part of an external project. For every change to the helper functions there, I would have to modify, test, build and package that project in order to use it. With the PMLP I can concentrate on the project that actually contains my experimental predictors.
  • Context-specificity. My new functions are used only for the predictors I am coming up with. Adding them to Visit would bloat its contents unnecessarily and cause hassle in the long run.

I hope this gave you a glimpse of how you could use the PMLP, too.

I would be happy to hear from you:

  • Did this article help you, and why (not)?
  • How do you use the Pimp my Library Pattern?
  • What other patterns do you use and think are valuable?

Happy Coding and kind greetings from

Frank
– Senior Data Scientist @ Akanoo