I have been trying to build an uplift model that gives the incremental probability of a customer responding to a treatment. I am thinking of using the pylift library for my model, and I have a few questions about it:
1. Does the library provide any function to return the incremental probabilities?
2. I don't have experimental data; I am trying to build the model on call center data. I have therefore defined the test group as the customers who received a call from the call center during a particular time interval, and the control group as the customers who were not called during the same period. Does this approach make sense?
3. Is there any other library for building such a model?
1 Answer
Does the library provide any function to return the incremental probabilities?
If you mean the predicted incremental values, then yes.
The predicted lift values are stored on the class object: `up.transformed_y_test_pred` for the test set, and `up.transformed_y_train_pred` for the training set (this name really should be changed to something more intuitive).
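For context on what these values represent: the class name `TransformedOutcome` suggests the standard transformed-outcome approach, in which a regressor is fit to a rescaled label whose conditional mean equals the lift. A sketch of that identity follows, assuming a binary treatment indicator $W$, a binary outcome $Y$, and a treatment probability $p(X) = P(W = 1 \mid X)$. This notation is introduced here for illustration and is not taken from the library.

$$
\begin{aligned}
Y^* &= Y \, \frac{W - p(X)}{p(X)\,\bigl(1 - p(X)\bigr)} \\
E[Y^* \mid X] &= p(X)\,\frac{1 - p(X)}{p(X)\,(1 - p(X))}\,E[Y \mid W = 1, X] \;+\; \bigl(1 - p(X)\bigr)\,\frac{-p(X)}{p(X)\,(1 - p(X))}\,E[Y \mid W = 0, X] \\
&= E[Y \mid W = 1, X] - E[Y \mid W = 0, X]
\end{aligned}
$$

If that is indeed what the predictions estimate, they can be read as incremental probabilities, although individual predicted values from a regressor on $Y^*$ are not guaranteed to lie in $[-1, 1]$.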
I don't have experimental data; I am trying to build the model on call center data. I have therefore defined the test group as the customers who received a call from the call center during a particular time interval, and the control group as the customers who were not called during the same period. Does this approach make sense?
This depends, but it's generally pretty dangerous to use observational data. You could be introducing heavy bias into your data. For example, imagine that happy people are more likely to pick up when you call than sad people. You compare the conversion rates of the customers you reached and the customers you didn't, and you find that the difference is very large. But then, when you run a randomized experiment, the lift disappears. In this contrived example, happy people are simply more likely to convert than sad people, and this is the effect that you've discovered. In the jargon, the problem here is that unconfoundedness is not satisfied.
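The happy/sad example can be made concrete with a small simulation. This is only a sketch: the pick-up and conversion rates below are made-up numbers chosen for illustration, and the call is given exactly zero true effect.

```python
import random


def simulate_naive_lift(n=200_000, seed=0):
    """Compare conversion rates of called vs. not-called customers
    when the call itself has no effect (all rates are assumed values)."""
    rng = random.Random(seed)
    called_stats = [0, 0]      # [customers, conversions]
    not_called_stats = [0, 0]  # [customers, conversions]
    for _ in range(n):
        happy = rng.random() < 0.5
        # Happy customers pick up far more often than sad customers.
        called = rng.random() < (0.8 if happy else 0.2)
        # Conversion depends on mood only, never on the call.
        converted = rng.random() < (0.30 if happy else 0.10)
        stats = called_stats if called else not_called_stats
        stats[0] += 1
        stats[1] += converted
    rate_called = called_stats[1] / called_stats[0]
    rate_not_called = not_called_stats[1] / not_called_stats[0]
    return rate_called - rate_not_called


naive_lift = simulate_naive_lift()
print(f"naive lift estimate: {naive_lift:.3f} (true lift is 0)")
```

With these assumed rates the naive comparison reports a sizeable positive lift even though the call does nothing, because the called group simply contains more happy customers.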
However, if you could somehow remove this bias by, say, predicting $P(\text{treatment} \mid X)$ perfectly, then you could use this propensity score to reweight all your observations. Once you have built this model, you can add its predicted scores to your dataframe as a column, then pass the `col_policy` kwarg when instantiating `TransformedOutcome` in `pylift`. Everything should subsequently do this reweighting for you automatically and correctly.
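To illustrate what propensity reweighting buys you, here is a minimal sketch of generic inverse-propensity weighting on simulated data. It is not pylift's internal implementation; the rates are made-up numbers, and the propensity is assumed to be known exactly, which is the optimistic case.

```python
import random


def ipw_lift(n=200_000, seed=0):
    """Inverse-propensity-weighted lift estimate on simulated data
    where the call has no true effect (all rates are assumed values)."""
    rng = random.Random(seed)
    treated_sum = 0.0
    control_sum = 0.0
    for _ in range(n):
        happy = rng.random() < 0.5
        # True propensity P(treatment | X); assumed known perfectly here.
        p = 0.8 if happy else 0.2
        called = rng.random() < p
        # Conversion depends on mood only, never on the call.
        converted = rng.random() < (0.30 if happy else 0.10)
        if called:
            treated_sum += converted / p          # weight treated rows by 1/p
        else:
            control_sum += converted / (1.0 - p)  # weight control rows by 1/(1-p)
    return treated_sum / n - control_sum / n


estimate = ipw_lift()
print(f"IPW lift estimate: {estimate:.3f} (true lift is 0)")
```

In this idealized setup the reweighted estimate lands close to zero. With an estimated propensity model the correction is only as good as that model, and any unobserved confounder is left uncorrected.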
But I'd still say that this is dangerous...
Is there any other library for building such a model?
Absolutely. There are a couple of other notable ones, but they are both in R.
- Susan Athey's causalTree package
- Leo Guelman's uplift R package (`library(uplift)`)
- How does this answer the question? Commented May 9, 2019 at 3:21
- Hey Robert, I have not used the pylift library. I have built a model with an intervention flag in my data, which helps segregate the test and control populations, and then calculated the incremental probability. However, I am facing the exact same issue that you mentioned with using observational data. How can I identify and remove the biases caused by using observational data? Any input will be helpful. Commented May 9, 2019 at 9:34
- Update: Athey, Tibshirani and Wager have a new R package {grf} that implements `causal_forest` as a more efficient (and theoretically sound) version of Guelman's uplift forest. – Johannes, Commented Oct 3, 2019 at 19:48