All Questions
Tagged with data-leakage standardization
5 questions
2 votes, 3 answers, 322 views
What is "information leak from test to train"? Is stratification by target a leak?
It's common practice to perform procedures such as standardization and even missing-value imputation (commonly based on means) after the train/test split; otherwise it is treated as an information leak from ...
1 vote, 0 answers, 585 views
Normalization and RidgeCV in Sklearn Pipeline - possible data leakage?
To avoid data leakage between the train and test set, I'm using sklearn's Pipeline as follows:
...
2 votes, 1 answer, 1k views
What is the difference between standardizing time series data and non-time series data?
From reading some answers on this site (1, 2, 3 and 4) I found that, on time series data, standardization must be applied separately on the train and test sets to avoid data leakage.
So the train data ...
1 vote, 1 answer, 347 views
Normalization of training and test set with data leakage
I have a time series data set of the actual number of airport passengers. Over 15 years (2004 to 2019), the number of passengers shows an upward trend, increasing over time as the country is ...
1 vote, 1 answer, 236 views
Standardize data before plotting learning curve?
I have implemented a cross-validation function with hyperparameter tuning. Basically, it does the following:
Split the data into 80% training, 20% testing
Apply cross-validation with hyperparameter ...