Propensity to pay modelling

Question

I am trying to build a propensity to pay model given an intervention to a customer.

Context:

The population I am dealing with consists of customers who were supposed to pay some amount on a certain date but have not paid.
Such customers are contacted via call centres to remind them of the payment to be made.
Some customers pay, some don't.

Problem statement: build propensity-to-pay scores for these customers.

My current approach:

Data: calls made via the call centre in a certain month.
If a customer has made a payment within 6 days of the intervention, tag them as 1, else 0.
Consider a few demographic features, as well as a few operational metrics that may be correlated with a customer making a payment.
Build a classification model (maybe logistic regression) to get the propensity scores.
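The labelling rule in the approach can be sketched as follows. This is a minimal sketch, and a few parts of it are assumptions. The record layout (customer id, call date, payment date or `None`) and the function name `label_paid_within` are hypothetical. The 6-day window is the one from the question.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical record layout: (customer_id, call_date, payment_date or None if unpaid)
Record = tuple[str, date, Optional[date]]


def label_paid_within(records: list[Record], window_days: int = 6) -> dict[str, int]:
    """Tag each customer 1 if they paid within `window_days` of the call, else 0."""
    labels = {}
    for customer_id, call_date, paid_on in records:
        paid_in_window = (
            paid_on is not None
            and call_date <= paid_on <= call_date + timedelta(days=window_days)
        )
        labels[customer_id] = int(paid_in_window)
    return labels


records = [
    ("c1", date(2018, 11, 1), date(2018, 11, 4)),   # paid 3 days after the call
    ("c2", date(2018, 11, 1), date(2018, 11, 20)),  # paid 19 days after the call
    ("c3", date(2018, 11, 2), None),                # has not paid
]
print(label_paid_within(records))  # {'c1': 1, 'c2': 0, 'c3': 0}
```

Customer c2 did pay but is still labelled 0, because the payment arrived after the window. That is the information a fixed cut-off throws away.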

Questions:

Does the approach mentioned make sense?
What is the need for propensity score matching?
The data is not experimental; it's observational. Can I use the target variable, with tag 1 (mentioned earlier) as the test group and tag 0 as the control group?

Any input on this will be very helpful. Thanks in advance!

Logistic regression is not a classification method. It is a direct probability estimation method. See http://fharrell.com/post/classification. And since you know the day of payment, use it in the analysis, not some arbitrary dichotomization of time. A Cox proportional hazards model, censoring times on those not yet paid as of the last day known not to have paid, would work. – Frank Harrell, Commented Dec 10, 2018 at 13:12
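Harrell's suggestion keeps the time to payment instead of collapsing it to a 0/1 tag. This minimal sketch shows the data layout it implies. Each customer becomes a (duration, event) pair, and customers who have not paid are censored at a hypothetical `last_observed` date, the last day their non-payment is known.

To stay dependency-free, the sketch does not fit the Cox model Harrell names. Instead it estimates a covariate-free Kaplan-Meier curve. A Cox model (for example `CoxPHFitter` in the lifelines package) takes the same duration and event columns and adds the customer covariates.

```python
from datetime import date
from typing import Optional


def to_survival(call_date: date, paid_on: Optional[date], last_observed: date) -> tuple[int, int]:
    """Return (duration_days, event); event is 1 if paid, 0 if censored (not yet paid)."""
    if paid_on is not None:
        return (paid_on - call_date).days, 1
    # Censored: we only know the customer had not paid as of last_observed.
    return (last_observed - call_date).days, 0


def kaplan_meier(durations: list[int], events: list[int]) -> list[tuple[int, float]]:
    """Kaplan-Meier estimate of S(t) = P(still unpaid after t days), at each payment time."""
    survival = 1.0
    curve = []
    for t in sorted({d for d, e in zip(durations, events) if e == 1}):
        at_risk = sum(1 for d in durations if d >= t)  # still unpaid and under observation at t
        paid_at_t = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        survival *= 1 - paid_at_t / at_risk
        curve.append((t, survival))
    return curve


last_observed = date(2018, 12, 10)  # hypothetical last day with known payment status
customers = [
    (date(2018, 11, 1), date(2018, 11, 4)),
    (date(2018, 11, 1), date(2018, 11, 20)),
    (date(2018, 11, 2), None),  # not paid yet: censored at 38 days
    (date(2018, 11, 5), date(2018, 11, 8)),
]
pairs = [to_survival(call, paid, last_observed) for call, paid in customers]
durations = [d for d, _ in pairs]
events = [e for _, e in pairs]
for t, s in kaplan_meier(durations, events):
    print(f"P(paid within {t} days) = {1 - s:.2f}")
```

With these four made-up customers, the estimated probability of having paid is 0.50 by day 3 and 0.75 by day 19. The censored customer still contributes to the risk set up to day 38 instead of being counted as a non-payer.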

usgroup · Accepted Answer · 2018-12-10 14:54:23Z


"Customer paid within X days" is presumably your binary response variable; call it R.

However, what you want is a "propensity to pay score"; call it S. It corresponds to the probability that R = 1.

So one formulation is: for every customer C, S = P(R = 1 | C), i.e. your propensity score is just the probability that the customer pays within X days. The problem then becomes how to estimate P(R = 1 | C).
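One way to estimate P(R = 1 | C) is logistic regression, which outputs probabilities directly. This is a minimal numpy sketch fitted by plain gradient descent on simulated data. The single standardised feature and its true coefficients (-1, 2) are made up for illustration. In practice a library such as statsmodels or scikit-learn would do the fitting.

```python
import numpy as np


def add_intercept(X: np.ndarray) -> np.ndarray:
    return np.column_stack([np.ones(len(X)), X])


def fit_logistic(X: np.ndarray, y: np.ndarray, lr: float = 0.1, n_iter: int = 5000) -> np.ndarray:
    """Maximum-likelihood logistic regression via gradient descent on the mean log-loss."""
    Xb = add_intercept(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))  # current estimate of P(R=1 | C)
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient step on the mean log-loss
    return w


def predict_proba(w: np.ndarray, X: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-add_intercept(X) @ w))


# Simulated data: one hypothetical standardised customer feature, true coefficients (-1, 2).
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 1))
y = (rng.random(5000) < 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * X[:, 0])))).astype(float)

w = fit_logistic(X, y)
S = predict_proba(w, X)  # propensity-to-pay scores, one per customer
print(w, S.mean(), y.mean())
```

The vector `S` holds the scores. Because the fit includes an intercept, the average score at the maximum-likelihood solution equals the observed payment rate.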

There are lots of confounding factors. Are the relationships between your factors of interest and your response variable linear? Is willingness to pay correlated between customers? Etc.

Since you have a binary response variable and probably mixed numeric/categorical data, I'd suggest starting out with XGBoost. It's easy to use, well documented and state of the art. You could also try a GLM (e.g. logit or probit regression), but it'll almost always perform worse.
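One way to compare the two suggestions on your own data is to score each model's held-out probabilities with a proper scoring rule. Log loss and the Brier score both work, and lower is better for each. This is a minimal sketch, and the held-out outcomes and the scores for "model A" and "model B" are hypothetical.

```python
import math


def log_loss(y_true: list[int], p_pred: list[float], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood of binary outcomes under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def brier_score(y_true: list[int], p_pred: list[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


# Hypothetical held-out outcomes and scores from two candidate models (e.g. a GLM and XGBoost).
y_holdout = [1, 0, 0, 1, 0]
scores = {
    "model A": [0.8, 0.3, 0.2, 0.6, 0.1],
    "model B": [0.6, 0.4, 0.4, 0.5, 0.3],
}
for name, p in scores.items():
    print(name, round(log_loss(y_holdout, p), 3), round(brier_score(y_holdout, p), 3))
```

Here model A assigns a higher probability to every observed outcome, so it wins on both scores (Brier 0.068 versus 0.164).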


Thanks for the input, this is very helpful. Apart from that, could you also help me choose the value of X? There are customers who have paid 45 days after the intervention.
– Arindam Bose
Commented Dec 11, 2018 at 3:15
I suggest picking X such that 90% of customers who pay within 1 year do so within X days.
– usgroup
Commented Dec 11, 2018 at 7:47
As I have taken cross-sectional data to build the model, I checked the distribution of days to payment after the call for those who made a payment. As it was a right-skewed distribution, I used bootstrapping to calculate a 97% confidence interval and took 6 days, as the bootstrapped mean was 6. Does that make sense?
– Arindam Bose
Commented Dec 11, 2018 at 8:22
Not really. The mean won't help you. You want the number of days by which most people would have paid, i.e. X such that 90% of those who pay do so in X days or less.
– usgroup
Commented Dec 11, 2018 at 11:46
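The rule in these comments (choose X so that 90% of payers pay within X days) is a percentile of days to payment among those who paid, not a mean. This is a minimal sketch on made-up, right-skewed values. The nearest-rank percentile definition and the name `days_covering` are choices made here for illustration.

```python
import math


def days_covering(days_to_payment: list[int], share: float = 0.9) -> int:
    """Smallest observed day count X such that at least `share` of payers paid within X days."""
    ordered = sorted(days_to_payment)
    rank = math.ceil(share * len(ordered))  # 1-based nearest-rank position
    return ordered[rank - 1]


# Hypothetical, right-skewed days from call to payment, payers only.
days = [1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 12, 15, 20, 30, 45]
print(round(sum(days) / len(days), 1))  # mean: 11.1 days
print(days_covering(days))              # X = 30: 14 of 15 payers (93%) paid within 30 days
```

With a long right tail, the mean says little about when most payers have paid. Setting the window at the mean (11.1 days here) would tag as 0 every payer who paid after day 11, and in this sample 5 of the 15 payers paid after day 11.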
Got it. Another thing: I was doing a little research on propensity modelling, and I found that propensity scores are generally generated on randomized experimental data. In the case of observational data, which generally is not randomized, we get propensity scores by fitting models and then match data to make a statement about the causal effect. I didn't completely understand the concept of matching in my context. It would be really helpful if you could shed some light on this.
– Arindam Bose
Commented Dec 11, 2018 at 13:33
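In this setting, matching would compare customers who received the intervention (the call) with similar customers who did not. The payment tag (1/0) is the outcome being compared, not what defines the two groups. The propensity score here would therefore model the probability of being called given the covariates, not the probability of paying.

This minimal sketch shows greedy 1:1 nearest-neighbour matching on such a score. It assumes the data contain some customers who were not called. The caliper of 0.05, the tuple layout and all values are hypothetical.

```python
# Each unit: (customer_id, propensity score of being called, paid within window: 1/0).
Unit = tuple[str, float, int]


def match_nearest(treated: list[Unit], control: list[Unit], caliper: float = 0.05) -> list[tuple[Unit, Unit]]:
    """Greedy 1:1 nearest-neighbour matching on the propensity score, without replacement."""
    available = list(control)
    pairs = []
    # Match high-score treated units first (a common heuristic: they usually have the fewest close controls).
    for t in sorted(treated, key=lambda u: u[1], reverse=True):
        if not available:
            break
        best = min(available, key=lambda c: abs(c[1] - t[1]))
        if abs(best[1] - t[1]) <= caliper:  # leave treated units with no close control unmatched
            pairs.append((t, best))
            available.remove(best)
    return pairs


def effect_on_treated(pairs: list[tuple[Unit, Unit]]) -> float:
    """Mean outcome difference (treated minus matched control) across matched pairs."""
    return sum(t[2] - c[2] for t, c in pairs) / len(pairs)


called = [("t1", 0.70, 1), ("t2", 0.40, 1), ("t3", 0.20, 0), ("t4", 0.95, 1)]
not_called = [("c1", 0.68, 0), ("c2", 0.42, 1), ("c3", 0.18, 0), ("c4", 0.50, 0)]
pairs = match_nearest(called, not_called)
print([(t[0], c[0]) for t, c in pairs])  # [('t1', 'c1'), ('t2', 'c2'), ('t3', 'c3')]
print(effect_on_treated(pairs))
```

Customer t4 has no uncalled customer within the caliper, so it stays unmatched. Across the three matched pairs, only one shows the called customer paying while the control did not. That gives an estimated effect of the call on the payment rate of 1/3 among matched called customers.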

Tags: logistic, python, computational-statistics, propensity-scores