Questions tagged [precision-recall]
Precision and recall (P&R) are a way to measure the relevance of a set of retrieved instances. Precision is the percentage of retrieved instances that are correct; recall is the percentage of true instances that are retrieved. The harmonic mean of P&R is the F1-score. P&R are used in data mining to evaluate classifiers.
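In terms of confusion-matrix counts (true positives $TP$, false positives $FP$, false negatives $FN$), the standard definitions are

$$\text{Precision}=\frac{TP}{TP+FP},\qquad \text{Recall}=\frac{TP}{TP+FN},\qquad F_1=\frac{2\cdot\text{Precision}\cdot\text{Recall}}{\text{Precision}+\text{Recall}}.$$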
498 questions
0 votes · 0 answers · 7 views
Drawbacks of stratified test set bootstrapping for metric UQ
I am using test-set (percentile) bootstrapping to quantify the uncertainty of various model performance metrics, such as AUROC, AUPR, etc.
To avoid any confusion, the approach is simply:
bootstrap ...
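For readers new to the technique, a minimal sketch of percentile bootstrapping of a test-set metric such as AUROC might look like the following (the function name, variable names, resample count, and confidence level are illustrative, not taken from the question):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    def percentile_bootstrap_auroc(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
        """Percentile-bootstrap confidence interval for a test-set AUROC (sketch)."""
        y_true, y_score = np.asarray(y_true), np.asarray(y_score)
        rng = np.random.default_rng(seed)
        stats = []
        for _ in range(n_boot):
            idx = rng.integers(0, len(y_true), len(y_true))  # resample test rows with replacement
            if y_true[idx].min() == y_true[idx].max():       # skip resamples containing one class only
                continue
            stats.append(roc_auc_score(y_true[idx], y_score[idx]))
        return np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])

A stratified variant instead resamples within each class separately, so that every bootstrap replicate keeps the observed class ratio; weighing that against the plain scheme above is what the question is about.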
0 votes · 1 answer · 35 views
Time-dependent area under the precision-recall curve
How can I compare time-dependent precision-recall (PR) values, analogous to time-dependent receiver operating characteristic (ROC) values, for two Cox regression models at multiple time points?
To compare two time-dependent AUC values, I would ...
0 votes · 0 answers · 19 views
Recursive Random Search and Categorical Cost Functions
I'm currently working on a project that involves optimizing the default Spark-submit configurations to minimize execution time. I've developed two models to aid in this process:
Binary Classification ...
0 votes · 0 answers · 56 views
Average precision vs Average recall in object detection
There are two popular metrics for object detection: average precision and average recall. Can you explain, with examples, in which cases to use AP and in which cases to use AR?
I agree that ...
5 votes · 3 answers · 566 views
Judging a model through the TP, TN, FP, and FN values
I am evaluating a model that predicts the existence or non-existence of a "characteristic" (for example, "there is a dog in this image") using several datasets. The system outputs ...
1 vote · 0 answers · 38 views
How to estimate precision and recall without taking a huge random sample when the positive class is relatively rare
I have a binary text classification model, and I would like to test how well it works, in terms of precision and recall, on a new dataset of 2 million text documents that have not been annotated yet. ...
2 votes · 1 answer · 130 views
We have sensitivity-specificity space (ROC curves) and precision-recall space (PR curves, $F_1$ score). What work has been done with PPV-NPV space?
Receiver operating characteristic (ROC) curves display the balance between sensitivity and specificity: how good you are at detecting category $1$ (sensitivity) while not falsely identifying category $...
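For reference, the four rates involved are

$$\text{Sensitivity}=\frac{TP}{TP+FN},\quad \text{Specificity}=\frac{TN}{TN+FP},\quad \text{PPV}=\frac{TP}{TP+FP},\quad \text{NPV}=\frac{TN}{TN+FN}.$$

PPV is precision under another name; sensitivity and specificity condition on the true class, while PPV and NPV condition on the predicted class.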
2 votes · 0 answers · 29 views
Re-calculate accuracy, precision and recall after treatment effect in a model
I am working on a churn-prediction model where the goal is to detect the players that have a high chance of churning from the site and to send those players an offer to keep them on the site.
In the initial training ...
2 votes · 3 answers · 234 views
Is relying on just the confusion matrix for highly imbalanced test sets to evaluate model performance a bad idea?
I have a binary classification model with a test set that is highly skewed: the majority class (0) is 22 times larger than the minority class (1).
This causes my Precision to be low and Recall to be high,...
1 vote · 1 answer · 42 views
Is it possible to estimate the number of positives from precision and recall values?
Let's say I have a binary predictor whose precision and recall are known from a previous study. Now we apply the predictor to a new (unknown) dataset with 1000 samples, and got ...
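A back-of-the-envelope estimate is possible if one assumes the previously reported precision and recall carry over unchanged to the new data: with $\hat{P}$ of the 1000 samples predicted positive,

$$\text{expected true positives} \approx \text{precision}\times\hat{P}, \qquad \text{expected total positives} \approx \frac{\text{precision}\times\hat{P}}{\text{recall}},$$

since recall is the fraction of all actual positives that the predictor flags.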
1 vote · 1 answer · 127 views
Choosing the correct evaluation metric between F1-score and Area under the Precision-Recall Curve (AUPRC)
We're currently working on detecting specific objects (e.g. poultry farms, hospitals) from satellite images. We've modeled the problem as a binary image classification task (i.e. classifying images ...
3 votes · 2 answers · 467 views
Is F-score the same as accuracy when there are only two classes of equal size?
The title says it all: Is F-score the same as accuracy when there are only two classes of equal sizes?
For my specific case, I have measurements of a group of people under two different situations and ...
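A quick numeric check with two classes of 10 instances each (hypothetical counts) shows the two metrics need not coincide even for balanced classes: take $TP=8$, $FN=2$, $FP=4$, $TN=6$; then

$$\text{Accuracy}=\frac{TP+TN}{20}=\frac{14}{20}=0.70,\qquad F_1=\frac{2\,TP}{2\,TP+FP+FN}=\frac{16}{22}\approx 0.727,$$

while the two values do coincide whenever $FP=FN$.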
2 votes · 1 answer · 68 views
What is the best method to calculate confidence intervals on precision and recall which are not independent?
I have a classification model which classifies user's bank transactions into two categories. From this model I produce precision and recall metrics.
I would like to understand the confidence around ...
2 votes · 1 answer · 225 views
How to define Precision when we have multiple predictions for each ground truth instance?
In my problem, it is possible to have multiple predictions for a ground truth instance. How do we define precision in such scenarios?
For further clarification consider the following example. We have 1 ...
1 vote · 0 answers · 23 views
Precision and Recall of a combined classifier
I have two classifiers, trained on the same dataset, each predicting a different variable. Let's call these X1 and X2. They have their respective precision and recall measures X1-p,r and X2-p,r.
I ...
0 votes · 0 answers · 28 views
Why would area under the PR curve include points off of the Pareto front?
(Let's set aside thoughts about if we should be calculating PR curves or areas under them at all.)
A precision-recall curve for a "classification" model can contain points that should not be ...
2 votes · 1 answer · 222 views
Comparing probability threshold graphs for F1 score for different models
Below are two plots, side-by-side, for an imbalanced dataset.
We have a very large imbalanced dataset that we are processing/transforming in different manner. After each transformation, we run an ...
0 votes · 0 answers · 29 views
How to calculate AUC for a P-R curve with unusual starting point
I am working with a binary classifier that is outputting scores between 0 and 1, indicating probabilities of class membership, according to the model. I produced a P-R curve and the first point (i.e., ...
8 votes · 1 answer · 603 views
Lack of rigor when describing prediction metrics
I constantly see metrics that measure the quality of a classifier's predictions, such as TPR, FPR, Precision, etc., being described as probabilities (see Wikipedia, for example).
As far as I know, ...
0 votes · 0 answers · 22 views
Can a log plot of the PR curve be useful?
I was doing some tests regarding my PR curves for 2 different models (first image), and I got the idea of plotting the log of those curves (second image) to see if there were any insights that I could ...
1 vote · 1 answer · 152 views
Why does my PR Curve look like this?
These are the recall and precision stats for the model I built. The curve does not look good where recall is 0; I am not sure why there are so many points there. Can anyone help and explain why the curve ...
1 vote · 1 answer · 295 views
Should we use train, validation, or test data when creating PR/AUC curves to optimize the decision threshold?
It makes sense to me that we can use the ROC-AUC and PR-AP scores of the validation sets during CV to tune our model hyperparameter selection. And when reporting the model's final performance, it makes ...
3 votes · 2 answers · 198 views
Calculate area under precision-recall curve from area under ROC curve and the prevalence
I am reading material that reports the area under a ROC curve. I am curious to know what the performance would be in precision-recall space. From the sensitivity and specificity values in the ROC ...
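The scalar AUROC alone does not determine the PR curve, but the (FPR, TPR) points of the ROC curve plus the positive-class prevalence $\pi$ do, via $\text{precision} = \pi\,TPR / (\pi\,TPR + (1-\pi)\,FPR)$. A sketch of the pointwise conversion (function and array names are illustrative):

    import numpy as np
    from sklearn.metrics import auc

    def roc_points_to_pr(fpr, tpr, prevalence):
        """Map each ROC operating point to a PR operating point for a given prevalence."""
        fpr, tpr = np.asarray(fpr, float), np.asarray(tpr, float)
        denom = prevalence * tpr + (1.0 - prevalence) * fpr
        precision = np.where(denom > 0, prevalence * tpr / np.maximum(denom, 1e-12), 1.0)
        return precision, tpr  # recall equals the true-positive rate

    # precision, recall = roc_points_to_pr(fpr, tpr, prevalence=0.1)
    # pr_auc = auc(recall, precision)  # trapezoidal approximation of the PR area

Note that the trapezoidal area is only an approximation, since precision does not interpolate linearly between operating points.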
0 votes · 0 answers · 208 views
Is Hyperparameter Tuning for Maximized Recall a Bad Thing?
I have a somewhat theoretical question: I work in an area that requires a number of anomaly detection solutions. When we approach these problems, we cross-validate and for each fold, we oversample ...
2 votes · 1 answer · 340 views
Poor balanced accuracy and minority recall but perfect calibration of probabilities? Imbalanced dataset
I have a dataset with a class imbalance in favour of the positive class (85% occurrence).
I'm getting a fantastically calibrated probability profile, but balanced accuracy is 0.65 and minority recall ...
1 vote · 0 answers · 33 views
How to choose k for MAP@K?
Scenario: We want to evaluate our recommender system, which recommends items to potential customers when visiting a product detail page.
Here are actual relevant items:
...
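One common definition of AP@K can be sketched as follows (conventions differ, notably in the normalizing constant, so treat this as one illustrative variant rather than the definition):

    def average_precision_at_k(recommended, relevant, k):
        """AP@K: average of precision@i over the ranks i <= k that hit a relevant item."""
        relevant = set(relevant)
        hits, score = 0, 0.0
        for i, item in enumerate(recommended[:k], start=1):
            if item in relevant:
                hits += 1
                score += hits / i            # precision at this cut-off
        return score / min(len(relevant), k) if relevant else 0.0

    # MAP@K is then the mean of AP@K over all users/queries.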
2 votes · 1 answer · 668 views
Sudden drop to zero for precision recall curve
I am training a neural network classifier with 250k training samples and 54k validation samples. The output activation is sigmoid. I noticed a sudden drop in the precision for the very top probability ...
1 vote · 0 answers · 56 views
Confidence intervals for Object Detection metrics
I would like to come back to this "When do we require to calculate the confidence Interval?" since recently a reviewer asked me to provide confidence intervals for metrics regarding my work ...
1 vote · 1 answer · 522 views
Why use average_precision_score from sklearn? [duplicate]
I have precision and recall values and want to measure an estimator performance:
...
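For background, scikit-learn's average_precision_score and a trapezoidal area under the PR curve are not the same quantity, which is usually the point of the duplicate target. A toy comparison (labels and scores made up for illustration):

    import numpy as np
    from sklearn.metrics import average_precision_score, precision_recall_curve, auc

    y_true = np.array([0, 0, 1, 1, 0, 1, 0, 0, 1, 0])
    y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.05, 0.3, 0.6, 0.15])

    ap = average_precision_score(y_true, y_score)        # step-wise sum: sum((R_n - R_{n-1}) * P_n)
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    trapezoid = auc(recall, precision)                   # linear interpolation between PR points

    print(f"average precision = {ap:.3f}, trapezoidal PR AUC = {trapezoid:.3f}")

The step-wise average precision is generally preferred because linear interpolation of precision between points can be overly optimistic.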
5 votes · 2 answers · 713 views
Understanding Precision Recall in business context
So, I know: yet another precision-recall question of the kind that has been asked umpteen times now.
I wanted to ask some specific business related questions.
Imagine if you are building a classifier to predict ...
2 votes · 1 answer · 118 views
Binary classification metrics - Combining sensitivity and specificity?
The harmonic mean of precision and recall (the F1 score) is a common metric for evaluating binary classification. It is useful because it strikes a balance between precision (which penalizes false positives) and recall (which penalizes false negatives).
For ...
1 vote · 0 answers · 29 views
How to start GNN optimization to get higher precision?
I'm developing a GNN for missing-link prediction following this blog post for the PyG library. I'm using almost the same GNN with a different dataset. Although my dataset is similar to the MovieLens ...
1 vote · 2 answers · 298 views
Is there any difference between using scores or probabilities for the roc_auc_score and precision_recall_curve functions?
I'm working with a GNN model for link prediction and using precision_recall_curve and roc_auc_score from the ...
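Both metrics depend only on how the scores rank the examples, so any strictly monotone transform (for example, a sigmoid mapping raw scores to probabilities) leaves them unchanged; only the reported thresholds differ. A quick check on toy data (illustrative only):

    import numpy as np
    from sklearn.metrics import roc_auc_score, precision_recall_curve

    y = np.array([0, 1, 0, 1, 1, 0, 0, 1])
    raw = np.array([-1.2, 0.8, -0.3, 2.1, 0.1, -2.0, 0.4, 1.5])  # e.g. raw link scores / logits
    prob = 1.0 / (1.0 + np.exp(-raw))                            # strictly monotone transform

    print(np.isclose(roc_auc_score(y, raw), roc_auc_score(y, prob)))  # True: AUC is rank-based
    p1, r1, t1 = precision_recall_curve(y, raw)
    p2, r2, t2 = precision_recall_curve(y, prob)
    print(np.allclose(p1, p2), np.allclose(r1, r2))                   # same curve; only t1/t2 differ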
3 votes · 1 answer · 47 views
Precision and recall reported in classification model
I have one question about the evaluation metrics of classification models. I see many people report the precision and recall value for their classification models. Do they choose a threshold to ...
1 vote · 0 answers · 389 views
Interpretation of area under the precision-recall curve
The area under the receiver operating characteristic curve has an interpretation of how well the predictions of the two categories are separated.
This post gives the area under the precision-recall curve as ...
2 votes · 0 answers · 40 views
Is there a way to affect the shape of the precision-recall curve?
As far as I know, for both ROC and PR curves, the classifier performance is usually measured by the AUC. This might indicate that classifiers with equivalent performance might have different ROC/PR ...
1 vote · 1 answer · 49 views
combine specificity and
I am performing classification on an imbalanced dataset (70% negatives).
If a prediction is negative I take a specific action, otherwise the opposite one. As in both cases some costs are implied, I want ...
1 vote · 4 answers · 239 views
Why don't we use the harmonic mean of sensitivity and specificity?
There is this question on the F1 score, asking why we compute the harmonic mean of precision and recall rather than their arithmetic mean. There were good arguments in the answers in favor of the ...
4 votes · 1 answer · 518 views
When to use a ROC Curve vs. a Precision Recall Curve?
I am looking for the circumstances under which we should use a ROC curve vs. a precision-recall curve.
Examples of the answers I am looking for:
Use a ROC Curve when:
you have a balanced or imbalanced dataset (...
3 votes · 1 answer · 696 views
ROC AUC has $0.5$ as random performance. Does PR AUC have a similar notion?
In considering ROC AUC, there is a sense in which $0.5$ is the performance of a random model. Conveniently, this is true, no matter the data or the prior probability of class membership; the ROC AUC ...
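For comparison, the expected average precision of a completely uninformative scorer is approximately the positive-class prevalence, not a fixed constant like $0.5$. A quick simulation (illustrative only):

    import numpy as np
    from sklearn.metrics import average_precision_score

    rng = np.random.default_rng(0)
    y = (rng.random(100_000) < 0.10).astype(int)   # ~10% positives
    noise = rng.random(100_000)                    # scores carrying no information

    print(y.mean(), average_precision_score(y, noise))  # both come out close to 0.10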
11 votes · 8 answers · 9k views
My machine learning model has precision of 30%. Can this model be useful?
I've encountered an interesting discussion at work on interpretation of precision (confusion matrix) within a machine learning model. The interpretation of precision is where there is a difference of ...
0 votes · 0 answers · 19 views
Flipping inputs in multilabel classification
I have framed a classification problem as follows:
I have $N$ items, and wish to predict a set of relevant tags for each out of $M$ tags. An item can have anywhere from 0 to $M$ applicable tags.
To ...
1 vote · 2 answers · 392 views
Confidence interval for Accuracy, Precision and Recall
Classification accuracy or classification error is a proportion or a ratio. It describes the proportion of correct or incorrect predictions made by the model. Each prediction is a binary decision that ...
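For a single proportion such as accuracy, a common normal-approximation (Wald) interval is

$$\hat{p}\ \pm\ z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},$$

where $\hat{p}$ is the observed accuracy and $n$ is the number of test predictions; precision and recall can be treated the same way, with $n$ replaced by the number of predicted positives and the number of actual positives, respectively.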
1 vote · 0 answers · 80 views
A better linear model has less precision (relative to the worse model) at a larger threshold
I trained two models using the same algorithm, logistic regression (LogisticRegression(max_iter=180, C=1.05)), for ~27 features and ~330K observations. I used the ...
0 votes · 0 answers · 100 views
Is weighting still needed when using undersampling?
I have a model that I want to test with my existing data to calculate precision, recall, etc.
The dataset is actually unbalanced: Class A 70% / Class B 30%.
I created a data set by undersampling ...
1 vote · 0 answers · 107 views
Accounting for overrepresentation of positives in binary classification test set for calculation of precision and recall
I have a binary classification task with highly imbalanced data, since the class to be detected (in the following referred to as the positives) is very rare.
For data limitation reasons my test set ...
0 votes · 0 answers · 28 views
Weighted precision & recall - With class weights vs oversampling
If weighted precision is calculated as
...
0 votes · 0 answers · 416 views
Comparing AUC-PR between groups with different baselines
So I know that the area under the precision-recall curve is often a more useful metric than AUROC when dealing with highly imbalanced datasets. However, while AUROC can easily be used to compare ...
0 votes · 0 answers · 43 views
Does the Precision-Recall AUC approach the ROC AUC as the data becomes balanced?
I am working on a Machine Learning classifier. It is a binary response and most predictor variables are categorical. I have several years of data and for some years, the response is imbalanced (more ...
1 vote · 0 answers · 61 views
F1 Score vs PR Curve
If I understood correctly, the PR curve is just the mean of the F1 score computed multiple times with different thresholds.
In the task of outlier detection those are two suggested metrics given the fact ...