$\begingroup$

I've divided my data into 3 sets: train, validate, and test.

I trained the model once on the train set and tested it on the test set. After tuning the random forest on the validate set, I got the following numbers:

I'm using accuracy as the evaluation metric, since the classes are balanced.

Train Acc: 96%

Test Acc: 93%

RF oob_score_: 91%

Since there's a 2% gap between my test accuracy and the out-of-bag score, is the model overfitting or not? And what is the minimum gap at which I can say the model is starting to overfit?
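For context, numbers like the three above can be computed along these lines in scikit-learn; the synthetic dataset, split ratios, and hyperparameters here are illustrative assumptions, not the actual setup:

```python
# Sketch: train/test accuracy and OOB score for a random forest.
# Dataset and hyperparameters are illustrative, not the OP's real ones.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# 60/20/20 split into train / validate / test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X_train, y_train)

train_acc = rf.score(X_train, y_train)  # accuracy on data the model has seen
test_acc = rf.score(X_test, y_test)     # accuracy on held-out data
oob_acc = rf.oob_score_                 # out-of-bag estimate from the train set

print(f"Train: {train_acc:.2%}  Test: {test_acc:.2%}  OOB: {oob_acc:.2%}")
```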

$\endgroup$
  • $\begingroup$ The difference is not much. Assuming you have done everything correctly, it's not a sign of overfitting. $\endgroup$
    – SmallChess
    Commented Apr 28, 2018 at 7:03
  • 1
    $\begingroup$ @SmallChess what according to you would be a sign of overfitting? $\endgroup$ Commented Apr 28, 2018 at 7:37

1 Answer

$\begingroup$

A sign of overfitting is when your model performs very well on the training set, achieving high accuracy, but performs badly on an independent test set. Assuming you've properly split your training, validation, and test sets, and you have enough samples to justify not using cross-validation, it is unlikely that your model is overfitting, since it still achieves relatively high accuracy on the test set. Note also that the out-of-bag score is itself just an estimate of generalization accuracy, computed for each training sample from the trees that did not see that sample during bootstrapping, so a gap of a couple of percentage points between the OOB score and the test accuracy is ordinary sampling variation rather than evidence of overfitting.
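To illustrate the diagnostic above on synthetic data (the dataset, label noise, and depth settings are assumptions for demonstration, not from the question): a model that has overfit shows a large train/test gap, while a regularized one does not.

```python
# Compare the train/test accuracy gap of an unconstrained forest vs. a
# depth-limited one on noisy data. Settings here are purely illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# flip_y=0.2 injects 20% label noise, which rewards memorization on train
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

gaps = {}
for depth in (None, 3):  # None = fully grown trees, 3 = strongly regularized
    rf = RandomForestClassifier(n_estimators=100, max_depth=depth, random_state=1)
    rf.fit(X_train, y_train)
    gaps[depth] = rf.score(X_train, y_train) - rf.score(X_test, y_test)
    print(f"max_depth={depth}: train/test gap = {gaps[depth]:.2%}")
```

On data this noisy, the fully grown forest nearly memorizes the training labels, so its train/test gap is much larger than the depth-limited forest's; that gap, not a small test-vs-OOB difference, is the classic overfitting signal.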

For more details about overfitting, see https://en.wikipedia.org/wiki/Overfitting

$\endgroup$
