All Questions
Tagged with precision-recall and random-forest (13 questions)
1 vote · 2 answers · 1k views
High Precision and High Recall issue - Random Forest Classification
I am building a classification model with the Random Forest technique, tuned via GridSearchCV. The target variable is binary, where class 1 makes up 7.5% of the total population. I have used several values of GridSearch ...
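A minimal sketch of one common setup for this kind of problem, on synthetic data standing in for the real set (the parameter values are illustrative, not the asker's grid): scoring the grid search with average precision rather than accuracy, so the rare class drives model selection.

```python
# Sketch only: make_classification mimics the ~7.5% positive rate.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, weights=[0.925], random_state=0)

param_grid = {
    "n_estimators": [25, 50],            # illustrative values only
    "class_weight": [None, "balanced"],  # optionally re-weight the rare class
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="average_precision",  # PR-based metric, sensitive to the minority class
    cv=3,
)
search.fit(X, y)
```

With accuracy as the scorer, a model that always predicts class 0 already reaches 92.5%, which is why a PR-based scorer is the usual choice here.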
0 votes · 0 answers · 65 views
Low model performance on imbalanced data - is there any hope of improving the metrics?
I am working with imbalanced data: 70k samples of class 0 and 1k of class 1, with 12 features. I would like to perform classification to choose the important features. So far, I have done under-sampling, over-sampling, ...
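A hedged sketch of the under-sampling step with plain NumPy, on made-up data whose shape mimics the 70:1 ratio (scaled down): draw a random majority-class subset the same size as the minority class.

```python
# Toy stand-in for the real 12-feature data; ratio mirrors 70:1.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(710, 12))
y = np.array([0] * 700 + [1] * 10)

pos_idx = np.flatnonzero(y == 1)
neg_idx = np.flatnonzero(y == 0)
# keep all positives, sample an equal number of negatives without replacement
keep_neg = rng.choice(neg_idx, size=len(pos_idx), replace=False)
idx = np.concatenate([pos_idx, keep_neg])

X_bal, y_bal = X[idx], y[idx]
```

Fitting on `X_bal` and then ranking feature importances is one common follow-up, though the resampling should happen inside the cross-validation loop, not before it.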
1 vote · 1 answer · 861 views
k-fold cross-validation score much better than on unseen data
This is my first "real" project, and I don't understand a certain behaviour.
My dataset spans from 2017 up to today. What I did was clean the data, get rid of missing values, etc.
There are mixed ...
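One frequent cause when the data spans 2017 to today: a shuffled k-fold lets the model train on the future of each validation fold, inflating CV scores relative to truly unseen (later) data. A sketch with sklearn's TimeSeriesSplit, assuming rows are sorted by date:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)  # stand-in rows, assumed sorted by date
splits = list(TimeSeriesSplit(n_splits=3).split(X))
# by construction, every validation fold lies strictly after its training fold
```

If the CV/test gap shrinks under a chronological split, the original gap was leakage rather than overfitting.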
3 votes · 1 answer · 1k views
Why does a class-weight fraction improve precision compared to an under-sampling approach, where precision drops?
I have imbalanced data where the ratio of positive to negative samples is 1:3 (positive samples are 3 times the negative ones). For my case it is important to have a higher precision (...
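For intuition, this is roughly what sklearn's class_weight="balanced" computes: w_c = n_samples / (n_classes * count_c). Re-weighting keeps every row, whereas under-sampling discards rows from the larger class (and the precision signal they carry), which is one reason precision can drop after resampling. The toy labels below take the parenthetical reading (positives three times the negatives):

```python
import numpy as np

y = np.array([1] * 75 + [0] * 25)  # toy labels: positives 3x the negatives
classes, counts = np.unique(y, return_counts=True)  # classes -> [0, 1]
weights = len(y) / (len(classes) * counts)  # the "balanced" heuristic
# the rarer class (0 here) receives the larger weight
```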
1 vote · 1 answer · 1k views
Is it possible to have recall and precision of 0 while the area under the PR curve is ~0.5?
As the title suggests, I am running a Random Forest classifier using Scala. To evaluate this classifier (and since I am handling highly imbalanced classes), I used the ...
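This situation is possible: precision and recall are point metrics at one cut-off (typically 0.5), while area under the PR curve scores the whole ranking. A sketch in Python/sklearn (the question uses Scala, but the arithmetic is identical): every score sits below 0.5, so nothing is predicted positive, yet the ranking is perfect.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_score, recall_score

y_true = np.array([0, 0, 0, 0, 1, 1])
scores = np.array([0.01, 0.02, 0.03, 0.04, 0.30, 0.40])  # all below 0.5

y_pred = (scores >= 0.5).astype(int)  # nothing predicted positive
p = precision_score(y_true, y_pred, zero_division=0)  # 0.0
r = recall_score(y_true, y_pred)                      # 0.0
ap = average_precision_score(y_true, scores)          # perfect ranking -> 1.0
```

So a decent PR AUC alongside zero point precision/recall usually just means the default threshold is wrong for the score distribution.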
1 vote · 0 answers · 104 views
Poor PR curve for a binary classifier trained on balanced data, with imbalanced test data
I have a very imbalanced dataset (9:1), which I have under-sampled to obtain a balanced training set (~130k samples total after balancing).
I am performing classification using ...
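One likely explanation, sketched as plain arithmetic rather than the asker's pipeline: precision depends on test-set prevalence, so a classifier that looks fine on balanced data can show a poor PR curve on a 9:1 test set even with unchanged recall and false-positive rate. The rates below are made up for illustration.

```python
# Precision = TP / (TP + FP), written in terms of prevalence and the
# classifier's fixed operating point (recall = TPR, false-positive rate = FPR).
def precision(prevalence, tpr=0.9, fpr=0.1):
    tp = prevalence * tpr
    fp = (1 - prevalence) * fpr
    return tp / (tp + fp)

p_balanced = precision(0.5)  # balanced evaluation: 0.9
p_skewed = precision(0.1)    # 9:1 evaluation: drops to 0.5
```

Balancing the training set changes what the model learns, but it cannot change the prevalence the test-set precision is computed under.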
0 votes · 0 answers · 98 views
Average Precision or F-beta & Decision-Threshold Tuning for a Binary Classifier [duplicate]
I'm working with an imbalanced binary classification data set (3% positive) in sklearn. The cost of a false negative is extremely high, so recall is much more important than precision.
To baseline my ...
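A hedged sketch of the threshold-tuning half on toy scores: sweep candidate cut-offs and keep the one maximising F2 (beta = 2 up-weights recall, matching a costly-false-negative setting). Real use would sweep scores from a held-out set, not the toy arrays here.

```python
import numpy as np
from sklearn.metrics import fbeta_score

y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1])
scores = np.array([0.05, 0.1, 0.15, 0.2, 0.3, 0.35, 0.25, 0.6, 0.7])

thresholds = np.arange(1, 15) / 20  # candidate cut-offs 0.05 .. 0.70
f2 = [fbeta_score(y_true, (scores >= t).astype(int), beta=2) for t in thresholds]
best_t = float(thresholds[int(np.argmax(f2))])  # cut-off with the best F2
```

On these toy scores the sweep lands on 0.25, well below the 0.5 default, which is the typical outcome when false negatives dominate the cost.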
1 vote · 2 answers · 294 views
Classification: Random Forest vs. Decision tree
Suppose you are given a dataset with 4 attributes (F1, F2, F3, and F4). The class label is contained in attribute F4.
Now you build a random forest classification model and you test its performance ...
0 votes · 0 answers · 106 views
Why does precision go down as the testing sample grows?
I trained a random forest model on 100k records (out of 700k overall), with 297 features and 100 trees
...
1 vote · 0 answers · 2k views
ROC curves from cross-validation are identical/overlaid and AUC is the same for each fold
UPDATE: Confidence Intervals
I have an imbalanced dataset with around 200k instances and 50 predictors. The imbalance is a 4:1 ratio in favour of the negative class (i.e. class 0). In other words, the negative ...
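Identical curves across folds often mean the same predictions were reused for every fold. A sketch of the intended pattern on synthetic data: refit inside the loop so each fold's AUC comes from that fold's own held-out rows.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=500, weights=[0.8], random_state=0)

aucs = []
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, y):
    # refit per fold -- skipping this step is the usual cause of overlaid curves
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    proba = clf.predict_proba(X[test_idx])[:, 1]
    aucs.append(roc_auc_score(y[test_idx], proba))
```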
2 votes · 0 answers · 4k views
What is the difference between OOB (out-of-bag) error and (1 - accuracy) in RandomForest?
In a Random Forest, I know that the out-of-bag error is the fraction of incorrect classifications over the number of out-of-bag samples.
Accuracy is defined as the number of correct ...
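A small sketch on synthetic data of how the two quantities relate in sklearn: the OOB error is 1 - oob_score_, computed on the training rows each tree did not see, so it differs from 1 - accuracy measured on the same training data (which is optimistic) and only approximates 1 - accuracy on a separate test set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
rf.fit(X, y)

oob_error = 1 - rf.oob_score_   # error on rows each tree did NOT train on
train_error = 1 - rf.score(X, y)  # near 0: the forest memorises training rows
```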
14 votes · 1 answer · 21k views
How to reduce the number of false positives?
I'm trying to solve a task called pedestrian detection, and I train a binary classifier on two categories: positives (people) and negatives (background).
I have a dataset:
number of positives = 3752
number of ...
10 votes · 1 answer · 13k views
Classification threshold in RandomForest (sklearn)
1) How can I change the classification threshold (I think it is 0.5 by default) in RandomForest in sklearn?
2) How can I under-sample in sklearn?
3) I have the following result from RandomForest ...
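For question 1, a hedged sketch on synthetic data: sklearn's RandomForestClassifier exposes no threshold parameter, and predict() simply takes the argmax of predict_proba(), so a custom cut-off is applied to the positive-class probabilities directly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, weights=[0.9], random_state=0)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

proba = rf.predict_proba(X)[:, 1]     # P(class == 1) for each row
y_at_03 = (proba >= 0.3).astype(int)  # lower cut-off -> more positives, more recall
y_default = rf.predict(X)             # equivalent to the 0.5 default here
```

Lowering the cut-off can only add positive predictions, trading precision for recall; the right value is usually chosen by sweeping thresholds on a validation set.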