
All Questions

1 vote · 2 answers · 1k views

High Precision and High Recall issue - Random Forest Classification

I am building a classification model with the Random Forest technique, using GridSearchCV. The target variable is binary, where class 1 makes up 7.5% of the total population. I have used several values in the GridSearch ...
asked by totalsurfer_v1
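For a target this imbalanced (7.5% positives), plain accuracy is a poor grid-search criterion. A minimal sketch of one common approach, on synthetic data from `make_classification` (not the asker's dataset), scoring the search on average precision instead:

```python
# Hypothetical sketch: grid search for a Random Forest on an imbalanced
# binary target (~7.5% positives), scored on average precision, since
# accuracy is dominated by the majority class at this ratio.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the asker's data: ~7.5% positives.
X, y = make_classification(n_samples=2000, weights=[0.925, 0.075], random_state=0)

grid = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [5, None]},
    scoring="average_precision",  # PR-based metric instead of accuracy
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

`class_weight="balanced"` and the parameter grid here are illustrative choices, not the asker's settings.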
0 votes · 0 answers · 65 views

Low model performance on imbalanced data - is there any hope of improving the metrics?

I am working with imbalanced data: 70k samples of class 0 and 1k of class 1, with 12 features. I would like to perform classification to choose the important features. So far, I have done under-sampling, over-sampling, ...
asked by ricecooker
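One of the techniques mentioned, under-sampling the majority class, can be done with plain sklearn before ranking features by importance. A sketch under assumed synthetic data at roughly the same 70:1 ratio (the real dataset, features, and model settings are unknown):

```python
# Hypothetical sketch: random under-sampling of the majority class via
# sklearn.utils.resample, then Random Forest feature importances as a
# simple feature-ranking signal.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import resample

# Synthetic stand-in: ~70:1 class ratio, 12 features.
X, y = make_classification(n_samples=7100, n_features=12, weights=[70 / 71],
                           random_state=0)
maj, mino = X[y == 0], X[y == 1]
maj_down = resample(maj, n_samples=len(mino), replace=False, random_state=0)
X_bal = np.vstack([maj_down, mino])              # balanced training set
y_bal = np.array([0] * len(maj_down) + [1] * len(mino))

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_bal, y_bal)
ranked = np.argsort(rf.feature_importances_)[::-1]
print("features ranked by importance:", ranked)
```

Under-sampling discards majority-class information, so this is usually compared against `class_weight="balanced"` or over-sampling before committing to it.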
1 vote · 1 answer · 861 views

k-fold cross-validation results much better than on unseen data

This is my first "real" project and I am not understanding a certain behaviour. My dataset spans from 2017 up to today. What I did was clean the data, get rid of missing values, etc. There are mixed ...
asked by tuxmania
3 votes · 1 answer · 1k views

Why does a class weight fraction improve precision compared to undersampling approach where precision drops?

I have imbalanced data where the ratio of negative to positive samples is 1:3 (positive samples are 3 times more numerous than negative). For my case it is important to have a higher precision (...
asked by David
1 vote · 1 answer · 1k views

Is it possible to have recall and precision of 0 while having an area under PR ~0.5?

As the title suggests, I am running a Random Forest classifier using Scala. To evaluate this classifier (and since I am handling highly imbalanced classes), I used the ...
asked by Toutsos
1 vote · 0 answers · 104 views

Poor P-R curve for binary classifier trained on balanced data, with imbalanced test data

I have a very imbalanced dataset (9:1), for which I have performed under-sampling and achieved a balanced training set (~130k samples total post balancing). I am performing classification using ...
asked by Anakimi
0 votes · 0 answers · 98 views

Average Precision or FBeta & Decision Threshold Tuning for Binary Classifier [duplicate]

I'm working with an imbalanced binary classifier data set (3% positive) in sklearn. The cost of a false negative is extremely high so recall is much more important than precision. To baseline my ...
asked by Nahyyz
1 vote · 2 answers · 294 views

Classification: Random Forest vs. Decision tree

Suppose you are given a dataset with 4 attributes (F1, F2, F3, and F4). The class label is contained in attribute F4. Now you build a random forest classification model and you test its performance ...
asked by kalyani Bethi
0 votes · 0 answers · 106 views

Why does precision go down as the testing sample increases?

I trained a random forest machine learning model on 100k records (out of 700k overall), with 297 features and 100 trees ...
asked by vinaykva
1 vote · 0 answers · 2k views

ROC curves from cross-validation are identical/overlaid and AUC is the same for each fold

UPDATE (confidence intervals): I have an imbalanced dataset with around 200k instances and 50 predictors. The imbalance is a 4:1 ratio in favour of the negative class (i.e. class 0). In other words, the negative ...
asked by RMS
2 votes · 0 answers · 4k views

What is the difference between oob (out of bag) error and (1 - accuracy) in RandomForest?

In a Random Forest, I know that the out-of-bag error is described as the fraction of incorrect classifications over the number of out-of-bag samples. Accuracy is defined as the number of correct ...
asked by makansij
14 votes · 1 answer · 21k views

How to reduce number of false positives?

I'm trying to solve a task called pedestrian detection, and I train a binary classifier on two categories: positives - people, negatives - background. I have a dataset: number of positives = 3752, number of ...
asked by mrgloom
10 votes · 1 answer · 13k views

Classification threshold in RandomForest (sklearn)

1) How can I change the classification threshold (I think it is 0.5 by default) in RandomForest in sklearn? 2) How can I under-sample in sklearn? 3) I have the following result from RandomForest ...
asked by Big Data Lover
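On point 1: sklearn's `RandomForestClassifier` has no threshold parameter; for a binary problem, `predict()` amounts to a 0.5 cut-off on `predict_proba`. A custom threshold is applied manually to the probabilities. A sketch on synthetic data (training and predicting on the same array purely for illustration):

```python
# Hypothetical sketch: applying a custom decision threshold to a Random
# Forest by thresholding predict_proba instead of calling predict().
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)
rf = RandomForestClassifier(random_state=0).fit(X, y)

proba = rf.predict_proba(X)[:, 1]           # probability of the positive class
y_pred_030 = (proba >= 0.30).astype(int)    # lower threshold -> more positives flagged
y_pred_050 = (proba >= 0.50).astype(int)    # roughly what predict() does for binary
print(y_pred_030.sum(), ">=", y_pred_050.sum())
```

Lowering the threshold trades precision for recall, which is why it is often paired with the under-sampling asked about in point 2.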