Many real-life transactional/CRM applications are not careful about creating snapshots of their data, so there is little historical data for a model to learn from.
Imagine a typical example: predicting the ACV (annual contract value) growth of a new account. A potential variable of interest is the account's employee count; we would like to test whether a company's employee growth is predictive of its ACV growth.
The tricky part is that such variables are often available only in real time and have never been snapshotted (sometimes the records simply get overwritten and the historical values are lost).
Although this variable is available at prediction time, we cannot use it when constructing a training dataset from historical data. In such circumstances, we might be tempted to overlook the look-ahead bias and use the current value in training as well.
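To make the asymmetry concrete, here is a toy sketch (the record layout and field names are hypothetical):

```python
# Hypothetical CRM record: 'employee_count' is live-only; the value as of
# each historical deal date has been overwritten.
record = {"account_id": 1, "employee_count": 520}

# Scoring a new account today is fine -- the live value is what we want:
def features_for_scoring(record):
    return {"employee_count": record["employee_count"]}

# Rebuilding a training row for a deal closed in 2022 is not: the only
# value we can read is today's, which postdates the label. Reading
# record["employee_count"] here would leak future information.
def features_for_training(record, as_of="2022-01-01"):
    return {"employee_count": None}  # the as-of value no longer exists

print(features_for_scoring(record))   # {'employee_count': 520}
print(features_for_training(record))  # {'employee_count': None}
```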
In the past, I have omitted such variables and managed to build models without them. This can cause some confusion for the clients/stakeholders involved, as they assume that such a simple variable must be available for the model to use.
My questions are:
What approaches could one experiment with to still use this variable in training when it has not been snapshotted (apart from omitting it)? Proxy variables could be one method, but they too would need to be snapshotted, and in most cases either all such variables are snapshotted or none are. I was not able to find much research on this; maybe I was not searching for the right topics.
What are the potential side effects of including this variable during training (i.e., ignoring the bias)? Specifically, is there any way to isolate and estimate the error due to this bias alone? Personally I would never want to include it, but if it is more of a trade-off, then I would want to estimate the expected error.
Are there any unsupervised methods that do not require labeled historical training data and could be useful here (clustering, perhaps)? Is there any research on this?
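To make the second question concrete, here is a toy simulation (all numbers invented) of the kind of inflation I am worried about: if headcount grows partly *because* ACV grew, then the current value of the feature encodes the label, and its apparent correlation with the label is much higher than the true historical signal's.

```python
import random
import statistics

random.seed(0)

def corr(xs, ys):
    # Pearson correlation, population version
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) * sx * sy)

n = 5000
# Employee count at prediction time (standardized); true effect on ACV
# growth is modest:
hist_emp = [random.gauss(0, 1) for _ in range(n)]
acv_growth = [0.3 * h + random.gauss(0, 1) for h in hist_emp]

# After the outcome, headcount moves partly with ACV growth, so today's
# (overwritten) value leaks the label:
curr_emp = [h + 0.8 * y + random.gauss(0, 0.3)
            for h, y in zip(hist_emp, acv_growth)]

print(f"corr(historical feature, label): {corr(hist_emp, acv_growth):.2f}")
print(f"corr(current feature,   label): {corr(curr_emp, acv_growth):.2f}")
```

The gap between the two correlations is one crude way to see how much "signal" the current value owes to the outcome itself, but in practice the feedback strength is unknown, which is exactly why I am asking whether the error can be estimated.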