All Questions

2 votes
3 answers
322 views

What is an "information leak from test to train"? Is stratification by target a leak?

It's common practice to perform steps such as standardization and even missing-value imputation (commonly based on means) after the train/test split; otherwise it is treated as an information leak from ...
Ars ML
  • 31
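
For readers skimming this list, here is a minimal sketch of the practice the excerpt above describes: the scaling statistics are learned from the training portion only and then reused on the test portion. The toy data and the helper names `fit_standardizer` and `apply_standardizer` are hypothetical, chosen for illustration; nothing here is taken from the question itself.

```python
from statistics import mean, pstdev


def fit_standardizer(train_values):
    """Learn the mean and standard deviation from the training values only."""
    mu = mean(train_values)
    sigma = pstdev(train_values)
    # Guard against a constant feature, which would give a zero divisor.
    if sigma == 0:
        sigma = 1.0
    return mu, sigma


def apply_standardizer(values, mu, sigma):
    """Scale any values using statistics that were fitted elsewhere."""
    return [(v - mu) / sigma for v in values]


# Hypothetical toy data: the last point plays the role of the test set.
values = [2.0, 4.0, 6.0, 8.0, 100.0]
train, test = values[:4], values[4:]

# Fit on train only, then reuse the same statistics for test.
mu, sigma = fit_standardizer(train)
train_scaled = apply_standardizer(train, mu, sigma)
test_scaled = apply_standardizer(test, mu, sigma)

# For contrast: a mean computed before splitting also depends on the test point.
leaky_mu = mean(values)
```

In this toy case the training mean and the pre-split mean differ only because the test point enters the second one, which is the kind of dependence usually meant by a leak from test to train.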
1 vote
0 answers
585 views

Normalization and RidgeCV in Sklearn Pipeline - possible data leakage?

To avoid data leakage between the train and test set, I'm using sklearn's Pipeline as follows: ...
flanders
2 votes
1 answer
1k views

What is the difference between standardizing time series data and non-time series data?

From reading some answers on this site (1, 2, 3 and 4) I found that, on time series data, standardization must be applied separately on the train and test sets to avoid data leakage. So the train data ...
Marcus
  • 265
1 vote
1 answer
347 views

Normalization of training and test set with data leakage

I have a time series data set of the actual number of airport passengers. Over 15 years (2004 to 2019), the number of passengers shows a trend, increasing over time as the country is ...
S. Jay
  • 35
1 vote
1 answer
236 views

Standardize data before plotting learning curve?

I have implemented a cross-validation function with hyperparameter tuning. Basically, it does the following: split the data into 80% training and 20% testing; apply cross-validation with hyper parameter ...
Perl Del Rey