All Questions
Tagged with precision-recall, classification · 152 questions
1 vote · 0 answers · 38 views
How to estimate precision and recall without taking a huge random sample when the positive class is relatively rare
I have a binary text classification model, and I would like to test how well it works, in terms of precision and recall, on a new dataset of 2 million text documents that have not been annotated yet. ...
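A minimal sketch of one standard approach, assuming the model outputs scores and a decision threshold is fixed: precision can be estimated by hand-annotating only a small random sample of the predicted positives, since the sample proportion of true positives estimates precision. All arrays below are hypothetical stand-ins for real scores and annotations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the model's scores on the 2M unlabeled documents.
scores = rng.random(2_000_000)
threshold = 0.99
predicted_pos = np.flatnonzero(scores >= threshold)

# Annotate only a small random sample of predicted positives.
sample_idx = rng.choice(predicted_pos, size=200, replace=False)
labels = rng.integers(0, 2, size=sample_idx.size)  # replace with human annotations

p_hat = labels.mean()                                # estimated precision
se = np.sqrt(p_hat * (1 - p_hat) / labels.size)      # normal-approx std. error
print(f"precision ~ {p_hat:.2f} +/- {1.96 * se:.2f}")
```

Recall is harder, because its denominator includes positives the model missed; the usual remedy is a sample stratified by model score, reweighted by stratum size, rather than one huge uniform sample.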
2 votes · 1 answer · 130 views
We have sensitivity-specificity space (ROC curves) and precision-recall space (PR curves, $F_1$ score). What work has been done with PPV-NPV space?
Receiver operating characteristic (ROC) curves display the balance between sensitivity and specificity: how good you are at detecting category $1$ (sensitivity) while not falsely identifying category $...
2 votes · 0 answers · 29 views
Re-calculate accuracy, precision and recall after treatment effect in a model
Working on a churn-prediction model where the goal is to detect the players that have a high chance of churning from the site and send those players an offer to keep them on the site.
In the initial training ...
1 vote · 1 answer · 127 views
Choosing the correct evaluation metric between F1-score and Area under the Precision-Recall Curve (AUPRC)
We're currently working on detecting specific objects (e.g. poultry farms, hospitals) from satellite images. We've modeled the problem as a binary image classification task (i.e. classifying images ...
2 votes · 1 answer · 225 views
How to define Precision when we have multiple predictions for each ground truth instance?
In my problem, it is possible to have multiple predictions for a ground truth instance. How do we define precision in such scenarios?
For further clarification consider the following example. We have 1 ...
0 votes · 0 answers · 28 views
Why would area under the PR curve include points off of the Pareto front?
(Let's set aside thoughts about if we should be calculating PR curves or areas under them at all.)
A precision-recall curve for a "classification" model can contain points that should not be ...
0 votes · 0 answers · 29 views
How to calculate AUC for a P-R curve with unusual starting point
I am working with a binary classifier that is outputting scores between 0 and 1, indicating probabilities of class membership, according to the model. I produced a P-R curve and the first point (i.e., ...
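For context, scikit-learn offers two common summaries that treat the curve's endpoints differently, which matters when the first point is unusual: a trapezoidal auc over the plotted points (linear interpolation, known to be optimistic for PR curves) and average_precision_score (a step-wise sum with no interpolation). A minimal sketch with made-up labels and scores:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score, auc

y_true  = np.array([0, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.05, 0.6, 0.9, 0.3])

precision, recall, _ = precision_recall_curve(y_true, y_score)

# Trapezoidal area over the curve as plotted (linearly interpolates).
print("trapezoidal AUC:  ", auc(recall, precision))

# Step-wise average precision, the usual PR summary.
print("average precision:", average_precision_score(y_true, y_score))
```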
1 vote · 1 answer · 152 views
Why does my PR Curve look like this?
These are my recall and precision stats for the model I built. The curve does not look good where recall is 0. I am not sure why there are so many points there. Can anyone help and explain why the curve ...
5 votes · 2 answers · 713 views
Understanding Precision Recall in business context
So, I know: yet another precision/recall question, asked for the umpteenth time now.
I wanted to ask some specific business related questions.
Imagine if you are building a classifier to predict ...
3 votes · 1 answer · 47 views
Precision and recall reported in classification model
I have one question about the evaluation metrics of classification models. I see many people report the precision and recall value for their classification models. Do they choose a threshold to ...
3 votes · 1 answer · 696 views
ROC AUC has $0.5$ as random performance. Does PR AUC have a similar notion?
In considering ROC AUC, there is a sense in which $0.5$ is the performance of a random model. Conveniently, this is true, no matter the data or the prior probability of class membership; the ROC AUC ...
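There is such a notion, though it is data-dependent: an uninformative scorer has expected precision equal to the positive prevalence $\pi$ at every recall, so the PR-AUC baseline is roughly $\pi$ rather than a universal $0.5$. A quick simulation sketch (synthetic data, assumed prevalence 10%):

```python
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)

prevalence = 0.1
y_true  = rng.random(100_000) < prevalence  # ~10% positives
y_score = rng.random(100_000)               # scores carry no information

# Average precision of a random scorer sits near the prevalence, not 0.5.
print(average_precision_score(y_true, y_score))  # ~0.1
```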
11 votes · 8 answers · 9k views
My machine learning model has precision of 30%. Can this model be useful?
I've encountered an interesting discussion at work on interpretation of precision (confusion matrix) within a machine learning model. The interpretation of precision is where there is a difference of ...
0 votes · 0 answers · 19 views
Flipping inputs in multilabel classification
I have framed a classification problem as follows:
I have $N$ items, and wish to predict a set of relevant tags for each out of $M$ tags. An item can have anywhere from 0 to $M$ applicable tags.
To ...
1 vote · 0 answers · 80 views
A better linear model has lower precision (relative to the worse model) at a larger threshold
I trained two models using the same algorithm, logistic regression (LogisticRegression(max_iter=180, C=1.05)), on ~27 features and ~330K observations. I used the ...
1 vote · 0 answers · 107 views
Accounting for overrepresentation of positives in binary classification test set for calculation of precision and recall
I have a binary classification task with highly imbalanced data, since the class to be detected (in the following referred to as the positives) is very rare.
For data limitation reasons my test set ...
3 votes · 1 answer · 225 views
Singular beta in the F-beta vs. threshold score?
Consider this plot of the $F_\beta$ score for different values of $\beta$. I have a hard time getting an intuition as to why they all intersect at the same point. (Cf. this blog post.) In other words, why ...
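One algebraic step with the standard definition explains the shared intersection: at any threshold where precision equals recall, $F_\beta$ collapses to that common value for every $\beta$, so all the curves pass through the point where the precision and recall curves cross.

```latex
F_\beta \;=\; \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R},
\qquad
P = R = t \;\Longrightarrow\;
F_\beta = \frac{(1+\beta^2)\,t^2}{(\beta^2+1)\,t} = t .
```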
2 votes · 2 answers · 1k views
How to get the threshold from PrecisionRecallDisplay?
My goal is to tune the classifier using the probability threshold (predict_proba() < threshold). Therefore, I need to get the threshold.
The problem is ...
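PrecisionRecallDisplay does not expose thresholds, but precision_recall_curve (which the display wraps) returns them; a minimal sketch with made-up labels and scores:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true  = np.array([0, 1, 1, 0, 1, 0, 1, 0])
y_score = np.array([0.2, 0.8, 0.6, 0.3, 0.9, 0.4, 0.55, 0.1])

precision, recall, thresholds = precision_recall_curve(y_true, y_score)

# thresholds has one entry fewer than precision/recall: the final
# (precision=1, recall=0) point has no associated threshold.
for p, r, t in zip(precision, recall, thresholds):
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")

# e.g. the lowest threshold whose operating point reaches a target precision:
target = 0.75
ok = np.flatnonzero(precision[:-1] >= target)
if ok.size:
    print("threshold for precision >=", target, "is", thresholds[ok[0]])
```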
3 votes · 3 answers · 194 views
Isn't partial AUC a better metric than AUC for cost-sensitive classification problems?
In many classification problems, the cost of an FP is different from the cost of an FN. In spam detection, an FP (a regular email classified as spam) should have a high cost. In cancer prediction, an FN (...
0 votes · 1 answer · 421 views
How can area under ROC (AUC) be bad when precision, recall, and accuracy are all good?
I have a model with the following scores:
Precision: 0.703588
Recall: 0.976526
Accuracy: 0.694936
I thought this was fairly decent, especially considering that my (binary) response class is 1/3 of the ...
1 vote · 1 answer · 132 views
Reporting performance measures for classification in percentage or fraction?
I have seen classification metrics like f1-score, precision and recall being reported both as fractions and percentages. These measures are between 0 and ...
4 votes · 1 answer · 562 views
Measures to compare classification partitions
What are the most used measures (coefficients) to compare two partitions of objects into classes? I am speaking of validating the results of classification, not of clustering; the measures known as ...
0 votes · 1 answer · 667 views
F2 score or the Area under the Precision-Recall-Curve as a scoring metric
I have a dataset with which I want to perform binary classification.
The distribution of the target class is imbalanced: 20% positive labels, 80% negative labels.
The positive class is more important ...
2 votes · 1 answer · 62 views
A metric for a big/medium/small ML classification
I am working on an ML classification task which is similar to the following:
Apples have to be classified to three classes: Big, Medium and Small.
I need a metric which I can use to assess the system. ...
0 votes · 0 answers · 88 views
Interpret precision and recall curve
I am evaluating a classification model, and I am using the PR curve because of my highly imbalanced dataset (the negative class is 5%).
In the end I'm comparing the training PR curve and the test PR ...
2 votes · 1 answer · 215 views
Match number of positives in unbalanced data set
I am dealing with a very unbalanced binary classification problem: 1% positives, 99% negatives. Training set is around 10 million rows, 40 columns. I choose the decision threshold (cutoff) on the ...
3 votes · 1 answer · 178 views
When comparing classifiers on different datasets with different prevalences, is it valid to calculate prevalence-adjusted PPV?
Scenario: comparison of 2 different binary classifiers
Both classifiers report sensitivity and specificity and number actually positive (P), but classifier 1 is tested on a dataset with prevalence 20%,...
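Since sensitivity and specificity do not depend on prevalence, each classifier's PPV can be re-expressed at any reference prevalence $\pi$ via Bayes' theorem; these are the standard identities underlying such an adjustment (the notation $\mathrm{se}$, $\mathrm{sp}$, $\pi$ is mine, not the asker's):

```latex
\mathrm{PPV}(\pi) \;=\; \frac{\mathrm{se}\,\pi}{\mathrm{se}\,\pi + (1-\mathrm{sp})(1-\pi)},
\qquad
\mathrm{NPV}(\pi) \;=\; \frac{\mathrm{sp}\,(1-\pi)}{\mathrm{sp}\,(1-\pi) + (1-\mathrm{se})\,\pi}.
```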
2 votes · 0 answers · 467 views
Why does precision_recall_curve() return similar but not equal values to the confusion matrix?
INTRO: I wrote a very simple machine learning project which classifies numbers based on the MNIST dataset:
...
0 votes · 1 answer · 474 views
How to practically calculate the accuracy of each class in a multiclass classification problem?
I have the following confusion matrix:
...
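A sketch of the two quantities people usually mean by "per-class accuracy", computed from a hypothetical confusion matrix (rows = true, columns = predicted):

```python
import numpy as np

# Hypothetical 3-class confusion matrix.
cm = np.array([[50,  2,  3],
               [ 4, 40,  6],
               [ 5,  1, 39]])

total = cm.sum()

# Most often "per-class accuracy" means per-class recall:
print("per-class recall:", cm.diagonal() / cm.sum(axis=1))

# A one-vs-rest accuracy per class additionally credits true negatives:
for k in range(cm.shape[0]):
    tp = cm[k, k]
    fn = cm[k].sum() - tp
    fp = cm[:, k].sum() - tp
    tn = total - tp - fn - fp
    print(f"class {k}: one-vs-rest accuracy = {(tp + tn) / total:.3f}")
```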
1 vote · 1 answer · 918 views
Interpreting NaN values for precision in Confusion Matrix
Please refer to the confusion matrix here: https://i.sstatic.net/Yxh5V.jpg
Would I get precision values of NaN because of 0/0 in the rightmost columns? Is that even possible? How should I interpret ...
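For reference, the arithmetic behind the NaN: precision for a class is undefined exactly when that class is never predicted, because numerator and denominator are both zero.

```latex
\mathrm{precision} \;=\; \frac{TP}{TP+FP} \;=\; \frac{0}{0}
\quad\text{when the class receives no predicted positives.}
```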
2 votes · 1 answer · 1k views
How to calculate F1, Precision, and Recall for Multi-Label Multi-Classification
I have a predictive model as follows
[Flattened table: rows Sample1–Sample4 with numeric values and binary (0/1) indicators for the labels Red, Yellow, Blue, Green, White, Black, Orange]
...
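With multi-label data in binary-indicator form, scikit-learn computes these metrics directly; a minimal sketch with hypothetical indicator matrices standing in for the table above (micro pools all label decisions, macro averages per-label scores):

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical indicator matrices: rows = samples, columns = labels
# (Red, Yellow, Blue, Green, White, Black, Orange).
y_true = np.array([[0, 0, 1, 0, 1, 0, 0],
                   [0, 0, 0, 0, 0, 0, 0],
                   [1, 0, 0, 1, 0, 0, 1]])
y_pred = np.array([[0, 0, 1, 0, 0, 0, 0],
                   [0, 1, 0, 0, 0, 0, 0],
                   [1, 0, 0, 1, 0, 0, 1]])

for avg in ("micro", "macro"):
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average=avg, zero_division=0)
    print(f"{avg}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```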
0 votes · 0 answers · 23 views
Custom metric, or does a metric for this problem already exist? Accuracy by ID
At my company we are working with the "recall" metric on a specific problem; however, this metric does not reflect the results that we would like to achieve.
Take a look at the table below.
We ...
1 vote · 0 answers · 2k views
Precision-Recall curves with multiclass classifier
I would like to plot the PR curves for a multiclass classifier (e.g. 3 classes). In the documentation it states that multiclass is not supported, and instead a series of one vs all classifiers are ...
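A minimal one-vs-all sketch, assuming a fitted model's predict_proba() output (random stand-ins below): binarize the labels and draw one PR curve per class.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import label_binarize
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=300)             # stand-in for true labels
proba = rng.dirichlet(np.ones(3), size=300)  # stand-in for predict_proba()

Y = label_binarize(y, classes=[0, 1, 2])     # one indicator column per class

for k in range(3):
    precision, recall, _ = precision_recall_curve(Y[:, k], proba[:, k])
    plt.plot(recall, precision, label=f"class {k} vs rest")

plt.xlabel("recall")
plt.ylabel("precision")
plt.legend()
plt.show()
```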
1 vote · 2 answers · 1k views
High Precision and High Recall issue- Random Forest Classification
I am building a classification model using Random Forest technique using GridSearchCV. The target variable is binary where 1 is 7.5% of total population. I have used several values of GridSearch ...
1 vote · 1 answer · 364 views
Why is it called Sensitivity/Recall and Specificity?
Where do the terms: Sensitivity, Recall and Specificity come from historically? I've been looking for an answer for quite some time but to no avail.
I understand the formulae and what they mean but I ...
0 votes · 1 answer · 202 views
Can reversing a very inaccurate binary classifier give more accurate predictions?
I was wondering: if somehow I get a model for binary classification with, let us say, 0.30 accuracy, does this mean that if I reverse the outcome of the model (i.e. swapping 0s and 1s in the ...
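For a binary problem on a fixed test distribution, the arithmetic is simple: flipping every prediction complements the accuracy, so a 0.30-accurate model becomes 0.70-accurate when reversed (precision and recall do not transform this neatly, though).

```latex
\mathrm{acc}(1-\hat{y}) \;=\; 1 - \mathrm{acc}(\hat{y}),
\qquad\text{e.g. } 0.30 \;\mapsto\; 0.70 .
```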
4 votes · 1 answer · 439 views
Measuring Precision/Recall on a biased sample
I am working with ML models that predict e.g. whether an email violates some corporate policy or not. In this case, the "positives" are emails that violate the policy, and the number of ...
1 vote · 0 answers · 38 views
Help with understanding metrics for imbalanced classification
I am trying to train a neural network to classify chest X-ray scans as my final MSc project. I have a dataset of 13808 images: 3616 labelled COVID, 10192 labelled normal, so the ratio of COVID to ...
4 votes · 1 answer · 317 views
Why use harmonic mean for precision and recall (f1 score) instead of just the product of precision and recall?
General question here, I understand the purpose of using the harmonic mean to generate the f1 score for model evaluation. I'm not exactly sure though why we don't just take the product of precision ...
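The short answer in equations: the harmonic mean stays on the same $[0,1]$ scale as precision and recall and equals them when they agree, while the product shrinks even for a perfectly balanced pair.

```latex
F_1 = \frac{2PR}{P+R};
\qquad
P = R = 0.5 \;\Rightarrow\; F_1 = 0.5
\;\text{ but }\; P \cdot R = 0.25 .
```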
3 votes · 1 answer · 5k views
On which set (train/val/test) do people calculate F1 score, precision and recall?
This may be a stupid question, but when I was looking at the definition of precision/recall etc. it was not mentioned anywhere which set (training/validation/test) this metric should be calculated ...
0 votes · 1 answer · 703 views
Multiclassification: precision-recall from scratch vs sklearn
I would like to know if there's any issue between using sklearn's precision/recall metric functions and coding them up from scratch in ...
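A minimal from-scratch check against sklearn (made-up labels): per-class precision divides the confusion-matrix diagonal by column sums, per-class recall by row sums, and both should agree with the average=None results.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1, 0, 2])
y_pred = np.array([0, 2, 2, 2, 1, 0, 1, 1, 2, 2])

cm = confusion_matrix(y_true, y_pred)  # rows = true, cols = predicted

precision_manual = cm.diagonal() / cm.sum(axis=0)  # TP / (TP + FP)
recall_manual    = cm.diagonal() / cm.sum(axis=1)  # TP / (TP + FN)

assert np.allclose(precision_manual, precision_score(y_true, y_pred, average=None))
assert np.allclose(recall_manual, recall_score(y_true, y_pred, average=None))
```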
4 votes · 1 answer · 2k views
Accuracy always equal to recall
I am fitting 3 different models on a 5-class imbalanced dataset. The results show model accuracy always equal to recall. How is this possible?
...
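One common cause: in single-label multiclass problems, pooled (micro-averaged) recall equals accuracy, because every error is simultaneously one false negative and one false positive when counts are pooled; weighted-average recall coincides with accuracy for the same reason. A sketch with made-up labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = np.array([0, 1, 2, 2, 1, 0, 3, 4, 2, 1])
y_pred = np.array([0, 2, 2, 2, 1, 0, 4, 4, 1, 1])

print(accuracy_score(y_true, y_pred))                    # 0.7
print(recall_score(y_true, y_pred, average="micro"))     # 0.7
print(precision_score(y_true, y_pred, average="micro"))  # 0.7
```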
4 votes · 1 answer · 3k views
Dip in Precision / Recall Curve
I have the following precision / recall curve. I am not sure why there would be a dip and then growth of precision; I would expect it to step down as the classifier is loosened.
If anyone could shed ...
1 vote · 0 answers · 39 views
Incorporating per-class-accuracy penalties into Deep Learning
Typically when training Neural Networks for image classification, many people use SGD with weight-decay as a penalty term. The loss that is minimized corresponds to the misclassification and state of ...
0 votes · 0 answers · 65 views
Low model performance for an imbalanced data, is there any hope to improve the metrics?
I am working with imbalanced data: 70k labelled 0 and 1k labelled 1, with 12 features. I would like to perform classification to choose the important features. So far, I have done under-sampling, over-sampling, ...
2 votes · 1 answer · 2k views
Is it possible to get low AUC score but high Precision and Recall?
I am doing classification on a fairly imbalanced dataset (about 1:2 ratio). I have so far tried lasso and logistic regression. I didn't downsample the dataset because the sample size is low (...
1 vote · 0 answers · 60 views
How to interpret the Precision and Recall curve in-sample vs out-of-sample
I have an imbalanced binary classification problem.
After all the preprocessing (scaling, feature selection), I am going through a hyperparameter optimisation using GridSearchCV to find the best ...
0 votes · 3 answers · 1k views
Is a threshold of 0 OK?
I am dealing with a classification problem with a dataset containing 70k rows: 69k are the negative class, and 1k the positive.
I trained my models and I obtained the confusion matrices with a threshold of ...
2 votes · 2 answers · 2k views
Weighting common performance metrics by classification outcomes?
Cost-sensitive classification metrics are somewhat common (whereby correctly predicted items are weighted to 0 and misclassified outcomes are weighted according to their specific cost). Some examples ...
1 vote · 0 answers · 280 views
Precision-Recall Curve and Area under Precision-Recall Curve (AUC)
I created a model (logistic regression) and am now trying to create a Precision-Recall plot and calculate the area under it. I'd like to note that this model is defective:
...
7 votes · 4 answers · 1k views
Why do we use precision/recall in binary classification but sensitivity (= recall)/specificity in medicine?
Sensitivity (= recall) is used in both fields, but the second metric is different. Why? Both tasks (classification and medicine) look the same: data has two classes, we make some predictions on it, and we want ...
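The confusion-matrix definitions make the contrast explicit: both pairs share recall, but specificity conditions on the actual negatives while precision conditions on the predicted positives, which is why the medical pair is prevalence-free and the retrieval pair is not.

```latex
\text{sensitivity} = \text{recall} = \frac{TP}{TP+FN},
\qquad
\text{specificity} = \frac{TN}{TN+FP},
\qquad
\text{precision} = \mathrm{PPV} = \frac{TP}{TP+FP}.
```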