Uplift Modeling
An important question in direct marketing is often which customers to target. Some customers will not react to the campaign but still incur the variable costs of marketing. Worse, some customers might react negatively to receiving an ad. Generally, there are four types of customers in direct marketing:
- customers who will respond even without a treatment,
- customers who will only respond after receiving a treatment,
- customers who will not respond because of a treatment, and
- customers who will not respond regardless.
The goal of uplift modeling is to separate the second group, the customers who respond only because of the treatment, from the other three groups.
Start with A/B-testing
To build an uplift model one needs the results of an A/B test. Two groups and a binary outcome give four possible combinations, shown below: treatment and returned (TR), treatment and not returned (TNR), control and returned (CR), and control and not returned (CNR).
Figure 1: Four possibilities in an A/B-test experiment. (Shaar2016)
Given the result one can calculate the uplift as
P(return|treatment)-P(return|control).
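As a minimal sketch of this calculation, the snippet below estimates both probabilities from raw A/B-test counts. The records and the function names `response_rate` and `average_uplift` are made up for illustration and are not taken from any particular library.

```python
# Hypothetical A/B-test records: (group, returned).
# group: 1 = treatment, 0 = control; returned: 1 = returned, 0 = not returned.
records = [
    (1, 1), (1, 0), (1, 1), (1, 0),  # treatment group
    (0, 1), (0, 0), (0, 0), (0, 0),  # control group
]


def response_rate(records, group):
    """Share of customers in `group` who returned."""
    outcomes = [returned for g, returned in records if g == group]
    return sum(outcomes) / len(outcomes)


def average_uplift(records):
    """Estimate P(return|treatment) - P(return|control) from counts."""
    return response_rate(records, 1) - response_rate(records, 0)


print(f"uplift = {average_uplift(records):.2f}")
```

Note that this is an average over the whole experiment. It says nothing yet about which individual customers drive the difference.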
Unfortunately, this term is not defined for a single customer, because each customer is either treated or not treated and we never observe both outcomes. So we cannot simply build a model that maximizes it. Instead there are several approaches to maximize it indirectly.
Building a model
The simplest approach ignores the control group and models the probability that a customer will return, given that she received the treatment and given some customer-specific information.
P(return|treatment,information)
Throwing away half the data isn’t optimal. In particular, this model cannot tell which customers return only because of the treatment, as we required above. It scores every customer who is likely to return after treatment, including those who would have returned anyway.
A simple extension is to build two models, one for each term in the uplift equation. Formally,
P(return|treatment,information)-P(return|control,information).
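The following sketch illustrates the two-model idea on invented data. To stay dependency-free it uses the observed return rate per customer segment as a stand-in for a real classifier. The `segment` feature, the records and the helper names are all assumptions made for this example. In practice each of the two models would be any standard probabilistic classifier.

```python
from collections import defaultdict

# Hypothetical records: (segment, group, returned); group 1 = treatment.
data = [
    ("new", 1, 1), ("new", 1, 1), ("new", 1, 0), ("new", 1, 0),
    ("new", 0, 0), ("new", 0, 0), ("new", 0, 1), ("new", 0, 0),
    ("loyal", 1, 1), ("loyal", 1, 1), ("loyal", 1, 1), ("loyal", 1, 0),
    ("loyal", 0, 1), ("loyal", 0, 1), ("loyal", 0, 1), ("loyal", 0, 0),
]


def fit_rate_model(rows):
    """Stand-in 'model': the observed return rate for each segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [returns, customers]
    for segment, returned in rows:
        counts[segment][0] += returned
        counts[segment][1] += 1
    return {seg: returns / total for seg, (returns, total) in counts.items()}


# One model per group, each fitted only on that group's customers.
treatment_model = fit_rate_model([(s, r) for s, g, r in data if g == 1])
control_model = fit_rate_model([(s, r) for s, g, r in data if g == 0])


def predicted_uplift(segment):
    """P(return|treatment, segment) - P(return|control, segment)."""
    return treatment_model[segment] - control_model[segment]


for segment in ("new", "loyal"):
    print(segment, round(predicted_uplift(segment), 2))
```

In this toy data the "loyal" segment returns at the same rate with or without treatment, so its predicted uplift is zero even though its return rate is the highest. Only the "new" segment shows a positive uplift.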
While this approach directly models the uplift effect, research suggests that it doesn’t always perform well in practice. One reason is that the treatment effect is usually much smaller than the main effect. For example, if the response rate is 1.0% in the control group and 1.1% in the treatment group, then the uplift effect is only 0.1 percentage points (=1.1%-1.0%). Consequently each individual model will mostly focus on the main effect within its group. (Radcliffe, 2011)
A simple way to arrive at a single-model approach is to apply a class variable transformation to the result from the A/B-test. Both the TR and the CNR groups are assigned to the positive class (treat), and the TNR and CR groups are assigned to the negative class (do not treat).
Table 1: Example of the class variable transformation.
Customer | Group (1=Treatment, 0=Control) | Returned (1=Returned, 0=Not Returned) | Transformed |
1 | 0 | 1 | 0 |
2 | 1 | 0 | 0 |
3 | 0 | 0 | 1 |
4 | 1 | 0 | 0 |
5 | 1 | 1 | 1 |
6 | 0 | 0 | 1 |
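The transformation in Table 1 can be sketched in a few lines. The function name `transform` is chosen for this example only. It relies on the observation that TR and CNR are exactly the rows where the group flag and the returned flag are equal.

```python
# Rows of Table 1: (customer, group, returned).
table = [(1, 0, 1), (2, 1, 0), (3, 0, 0), (4, 1, 0), (5, 1, 1), (6, 0, 0)]


def transform(group, returned):
    """Return 1 for TR and CNR (flags equal), 0 for TNR and CR."""
    return 1 if group == returned else 0


transformed = [transform(group, returned) for _, group, returned in table]
print(transformed)  # [0, 0, 1, 0, 1, 1], the last column of Table 1
```

Any standard binary classifier can then be trained on the customer information with `transformed` as the target.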
The idea is that, of the four outcomes, we would definitely like to treat the customers in the TR group. And because we don’t know whether the customers in the CNR group would have responded if they had received a treatment, we would also like to treat them. It can be shown that under some assumptions modeling the conditional probability of the transformed class variable is equivalent to modeling the difference of the two conditional probabilities in the uplift equation above.
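A sketch of why this equivalence can hold is given below. It assumes that customers are assigned to treatment and control at random, independently of the customer information, and with equal probability 1/2. These are one commonly stated set of assumptions and not necessarily the only ones. The symbols are introduced here for illustration: Y is the return indicator, X the customer information, T and C the treatment and control groups, and Z the transformed class variable.

```latex
\begin{align*}
P(Z=1 \mid X)
  &= P(Y=1 \mid T, X)\,P(T) + P(Y=0 \mid C, X)\,P(C) \\
  &= \tfrac{1}{2}\bigl[\,P(Y=1 \mid T, X) + 1 - P(Y=1 \mid C, X)\,\bigr] \\
\Longrightarrow\quad
P(Y=1 \mid T, X) - P(Y=1 \mid C, X)
  &= 2\,P(Z=1 \mid X) - 1
\end{align*}
```

Under these assumptions the uplift is an increasing function of P(Z=1|X), so ranking customers by the transformed-class probability ranks them by uplift. If the two groups differ in size, the simple factor 1/2 no longer applies and the relation has to be adjusted.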
There are more elaborate techniques that incorporate the maximization of the uplift equation into the training algorithm. However, their straightforward implementation and their applicability to any standard machine-learning algorithm make the two-model approach and the class variable transformation approach very attractive first choices.