$\begingroup$

I've divided my data into 3 sets: train, validate, and test.

I trained the model once on the train set and tested it on the test set. After tuning the random forest on the validate set, I got the following numbers:

I'm using accuracy as the evaluation metric, since the classes are balanced.

Train Acc: 96%

Test Acc: 93%

RF oob_score_: 91%

Since there's a 2% gap between my test accuracy and the out-of-bag score, is the model overfitting or not? And what is the minimum gap at which I can say the model is starting to overfit?
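For context, numbers like the three above can be computed along these lines in scikit-learn; the synthetic dataset, split ratios, and hyperparameters here are illustrative assumptions, not the actual setup:

```python
# Sketch: train/test accuracy and OOB score for a random forest.
# Dataset and hyperparameters are illustrative, not the OP's real ones.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# 60/20/20 split into train / validate / test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X_train, y_train)

train_acc = rf.score(X_train, y_train)  # accuracy on data the model has seen
test_acc = rf.score(X_test, y_test)     # accuracy on held-out data
oob_acc = rf.oob_score_                 # out-of-bag estimate from the train set

print(f"Train: {train_acc:.2%}  Test: {test_acc:.2%}  OOB: {oob_acc:.2%}")
```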

$\endgroup$
  • $\begingroup$ The difference is not much. Assuming you have done everything correctly, it's not a sign of overfitting. $\endgroup$
    – SmallChess
    Commented Apr 28, 2018 at 7:03
  • 1
    $\begingroup$ @SmallChess what according to you would be a sign of overfitting? $\endgroup$ Commented Apr 28, 2018 at 7:37

1 Answer

$\begingroup$

A sign of overfitting is when your model performs very well on the training set, achieving high accuracy, but performs badly on an independent test set. Assuming you've properly split your training, validation, and test sets, and you have enough samples to justify not using cross-validation, it is unlikely that your model is overfitting, since it still achieves relatively high accuracy on the test set. Note also that the out-of-bag score is itself just an estimate of generalization accuracy, computed for each training sample from the trees that did not see that sample during bootstrapping, so a gap of a couple of percentage points between the OOB score and the test accuracy is ordinary sampling variation rather than evidence of overfitting.
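To illustrate the diagnostic above on synthetic data (the dataset, label noise, and depth settings are assumptions for demonstration, not from the question): a model that has overfit shows a large train/test gap, while a regularized one does not.

```python
# Compare the train/test accuracy gap of an unconstrained forest vs. a
# depth-limited one on noisy data. Settings here are purely illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# flip_y=0.2 injects 20% label noise, which rewards memorization on train
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

gaps = {}
for depth in (None, 3):  # None = fully grown trees, 3 = strongly regularized
    rf = RandomForestClassifier(n_estimators=100, max_depth=depth, random_state=1)
    rf.fit(X_train, y_train)
    gaps[depth] = rf.score(X_train, y_train) - rf.score(X_test, y_test)
    print(f"max_depth={depth}: train/test gap = {gaps[depth]:.2%}")
```

On data this noisy, the fully grown forest nearly memorizes the training labels, so its train/test gap is much larger than the depth-limited forest's; that gap, not a small test-vs-OOB difference, is the classic overfitting signal.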

For more details about overfitting, see https://en.wikipedia.org/wiki/Overfitting

$\endgroup$
