
Questions tagged [precision-recall]

Precision and recall (P&R) are a way to measure the relevance of a set of retrieved instances. Precision is the % of retrieved instances that are correct; recall is the % of true instances that are retrieved. The harmonic mean of P&R is the F1-score. P&R are used in data mining to evaluate classifiers.
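
As a concrete illustration of the definitions above, here is a minimal Python sketch that computes precision, recall, and the F1-score from raw confusion-matrix counts (the counts are invented for illustration):

```python
# Invented confusion-matrix counts, purely for illustration.
tp, fp, fn = 40, 10, 20

precision = tp / (tp + fp)   # share of retrieved instances that are correct
recall = tp / (tp + fn)      # share of true instances that were retrieved
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of P & R

print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```
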

0 votes
0 answers
7 views

Drawbacks of stratified test set bootstrapping for metric UQ

I am using test-set (percentile) bootstrapping to quantify the uncertainty of various model performance metrics, such as AUROC, AUPR, etc. To avoid any confusion, the approach is simply: bootstrap ...
Eike P. • 3,098
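
For the bootstrap question above, a minimal percentile-bootstrap sketch (not necessarily the asker's exact setup; y_true and y_score are placeholder arrays, and the plain, non-stratified variant is shown):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)   # placeholder labels
y_score = rng.random(500)               # placeholder model scores

def percentile_bootstrap(metric, y, s, n_boot=2000, alpha=0.05):
    """Plain (non-stratified) test-set bootstrap of a metric."""
    stats = []
    n = len(y)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample the test set with replacement
        if len(np.unique(y[idx])) < 2:          # skip degenerate resamples with one class
            continue
        stats.append(metric(y[idx], s[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

print(percentile_bootstrap(roc_auc_score, y_true, y_score))           # CI for AUROC
print(percentile_bootstrap(average_precision_score, y_true, y_score)) # CI for AUPR
```
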
0 votes
1 answer
35 views

Time-dependent area under the precision-recall curve

How do I compare time-dependent precision-recall (PR) curve values, analogous to time-dependent receiver operating characteristic (ROC) values, for two Cox regression models at multiple time points? To compare two time-dependent AUC values, I would ...
obruzzi • 101
0 votes
0 answers
19 views

Recursive Random Search and Categorical Cost Functions

I'm currently working on a project that involves optimizing the default Spark-submit configurations to minimize execution time. I've developed two models to aid in this process: Binary Classification ...
Hijaw • 175
0 votes
0 answers
56 views

Average precision vs Average recall in object detection

There are two popular metrics for object detection: average precision (AP) and average recall (AR). Can you explain, with examples, in which cases to use AP and in which cases to use AR? I agree that ...
Ars ML • 31
5 votes
3 answers
566 views

Judging a model through the TP, TN, FP, and FN values

I am evaluating a model that predicts the existence or non-existence of a "characteristic" (for example, "there is a dog in this image") using several datasets. The system outputs ...
KansaiRobot
1 vote
0 answers
38 views

How to estimate precision and recall without taking a huge random sample when the positive class is relatively rare

I have a binary text classification model, and I would like to test how well it works, in terms of precision and recall, on a new dataset of 2 million text documents that have not been annotated yet. ...
Alex • 467
2 votes
1 answer
130 views

We have sensitivity-specificity space (ROC curves) and precision-recall space (PR curves, $F_1$ score). What work has been done with PPV-NPV space?

Receiver-operator characteristic (ROC) curves display the balance between sensitivity and specificity: how good you are at detecting category $1$ (sensitivity) while not falsely identifying category $...
Dave • 67.2k
2 votes
0 answers
29 views

Re-calculate accuracy, precision and recall after treatment effect in a model

I am working on a churn-prediction model where the goal is to detect players who have a high chance of churning from the site and to send those players an offer to keep them on the site. In the initial training ...
ELTono • 21
2 votes
3 answers
234 views

Is relying on just the confusion matrix for highly imbalanced test sets to evaluate model performance a bad idea?

I have a binary classification model with a test set that is highly skewed: the majority class 0 is 22 times larger than the minority class 1. This causes my precision to be low and recall to be high,...
statsnoob
1 vote
1 answer
42 views

Is it possible to estimate the number of positives from precision and recall values?

Let's say I have a binary predictor, and its performance in precision and recall is known from a previous study. Now, we apply the predictor to a new (unknown) dataset with 1000 samples, and got ...
ysakamoto • 113
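
For the question above: if precision and recall are assumed to carry over unchanged to the new data, a back-of-the-envelope estimate follows from the number of samples the predictor flags as positive (the values below are made up):

```python
precision, recall = 0.80, 0.60   # assumed to transfer from the previous study
k_pred = 150                     # hypothetical count of predicted positives in the 1000 new samples

tp_est = precision * k_pred      # expected true positives among the flagged samples
pos_est = tp_est / recall        # recall = TP / (all positives)  =>  positives ~ TP / recall
print(round(pos_est))            # estimated number of positives in the new dataset
```
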
1 vote
1 answer
127 views

Choosing the correct evaluation metric between F1-score and Area under the Precision-Recall Curve (AUPRC)

We're currently working on detecting specific objects (e.g. poultry farms, hospitals) from satellite images. We've modeled the problem as a binary image classification task (i.e. classifying images ...
meraxes • 739
3 votes
2 answers
467 views

Is F-score the same as accuracy when there are only two classes of equal size?

The title says it all: Is F-score the same as accuracy when there are only two classes of equal sizes? For my specific case, I have measurements of a group of people under two different situations and ...
user1596274
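
A quick numeric counterexample for the equal-class-size question above (the confusion-matrix counts are invented): even with 50 positives and 50 negatives, accuracy and F1 differ, because accuracy rewards true negatives while F1 ignores them.

```python
# Balanced classes: 50 positives (tp + fn), 50 negatives (tn + fp).
tp, fn = 40, 10
tn, fp = 30, 20

accuracy = (tp + tn) / (tp + tn + fp + fn)            # 0.70
precision = tp / (tp + fp)                            # ~0.667
recall = tp / (tp + fn)                               # 0.80
f1 = 2 * precision * recall / (precision + recall)    # ~0.727

print(accuracy, f1)   # not equal, despite equal class sizes
```
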
2 votes
1 answer
68 views

What is the best method to calculate confidence intervals on precision and recall which are not independent?

I have a classification model which classifies user's bank transactions into two categories. From this model I produce precision and recall metrics. I would like to understand the confidence around ...
Barnaby Cooper
2 votes
1 answer
225 views

How to define Precision when we have multiple predictions for each ground truth instance?

In my problem, it is possible to have multiple predictions for a ground-truth instance. How do we define precision in such scenarios? For further clarification, consider the following example. We have 1 ...
Meysam Sadeghi
1 vote
0 answers
23 views

Precision and Recall of a combined classifier

I have two classifiers, trained on the same dataset, each predicting a different variable; let's call these X1 and X2. They have their respective precision and recall measures, X1-p,r and X2-p,r. I ...
Bi Act • 123
0 votes
0 answers
28 views

Why would area under the PR curve include points off of the Pareto front?

(Let's set aside thoughts about whether we should be calculating PR curves or areas under them at all.) A precision-recall curve for a "classification" model can contain points that should not be ...
Dave • 67.2k
2 votes
1 answer
222 views

Comparing probability threshold graphs for F1 score for different models

Below are two plots, side by side, for an imbalanced dataset. We have a very large imbalanced dataset that we are processing/transforming in different ways. After each transformation, we run an ...
Ashok K Harnal
0 votes
0 answers
29 views

How to calculate AUC for a P-R curve with unusual starting point

I am working with a binary classifier that is outputting scores between 0 and 1, indicating probabilities of class membership, according to the model. I produced a P-R curve and the first point (i.e., ...
CopyOfA • 187
8 votes
1 answer
603 views

Lack of rigor when describing prediction metrics

I constantly see metrics that measure the quality of a classifier's predictions, such as TPR, FPR, Precision, etc., being described as probabilities (see Wikipedia, for example). As far as I know, ...
synack • 371
0 votes
0 answers
22 views

Can a log plot of the PR curve be of any use?

I was doing some tests regarding my PR curves for 2 different models (first image), and I got the idea of plotting the log of those curves (second image) to see if there were any insights that I could ...
GabrielPast
1 vote
1 answer
152 views

Why does my PR Curve look like this?

These are my recall and precision stats for the model I built. The curve does not look good where recall is 0; I am not sure why there are so many points there. Can anyone help and explain why the curve ...
ibarbo • 65
1 vote
1 answer
295 views

Should we use train, validation, or test data when creating PR/AUC curves to optimize the decision threshold?

It makes sense to me that we can use the ROC-AUC and PR-AP scores of the validation sets during CV to tune our model hyperparameter selection. And when reporting the model's final performance, it makes ...
another_student
3 votes
2 answers
198 views

Calculate area under precision-recall curve from area under ROC curve and the prevalence

I am reading material that reports the area under a ROC curve. I am curious to know what the performance would be in precision-recall space. From the sensitivity and specificity values in the ROC ...
Dave • 67.2k
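
On the conversion question above: the AUROC alone does not determine the PR curve, but given the full set of (FPR, TPR) points and the prevalence $\pi$, each ROC point maps to a PR point via recall $=$ TPR and precision $= \mathrm{TPR}\,\pi / (\mathrm{TPR}\,\pi + \mathrm{FPR}\,(1-\pi))$. A sketch with placeholder values:

```python
import numpy as np

# Placeholder ROC points and prevalence; substitute the real curve and class prior.
fpr = np.array([0.0, 0.1, 0.3, 0.6, 1.0])
tpr = np.array([0.0, 0.5, 0.8, 0.95, 1.0])
prevalence = 0.2

recall = tpr
with np.errstate(divide="ignore", invalid="ignore"):
    precision = tpr * prevalence / (tpr * prevalence + fpr * (1 - prevalence))
precision[0] = 1.0   # the TPR = FPR = 0 corner is 0/0; set by the usual convention

aupr = np.trapz(precision, recall)   # trapezoidal area under the mapped PR points
print(aupr)
```
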
0 votes
0 answers
208 views

Is Hyperparameter Tuning for Maximized Recall a Bad Thing?

I have a somewhat theoretical question: I work in an area that requires a number of anomaly detection solutions. When we approach these problems, we cross-validate and for each fold, we oversample ...
Branden Keck
2 votes
1 answer
340 views

Poor balanced accuracy and minority recall but perfect calibration of probabilities? Imbalanced dataset

I have a dataset with a class imbalance in favour of the positive class (85% occurrence). I'm getting a fantastically calibrated probability profile, but balanced accuracy is 0.65 and minority recall ...
Kat • 21
1 vote
0 answers
33 views

How to choose k for MAP@K?

Scenario: We want to evaluate our recommender system, which recommends items to potential customers when visiting a product detail page. Here are actual relevant items: ...
etang • 1,027
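
For the MAP@K question above, one common convention (implementations differ, so this is a sketch rather than the definitive formula): for each user, sum precision@i over the ranks i ≤ K that hit a relevant item, divide by min(K, number of relevant items), and then average over users.

```python
def average_precision_at_k(recommended, relevant, k):
    """AP@K for one user under one common convention; `recommended` is ranked."""
    relevant = set(relevant)
    hits, score = 0, 0.0
    for i, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i          # precision@i at each hit
    denom = min(k, len(relevant))
    return score / denom if denom else 0.0

def map_at_k(all_recommended, all_relevant, k):
    return sum(average_precision_at_k(r, t, k)
               for r, t in zip(all_recommended, all_relevant)) / len(all_relevant)

# Hypothetical example: 2 users, K = 3.
print(map_at_k([["a", "b", "c"], ["x", "y", "z"]],
               [["a", "c"], ["y"]], k=3))
```
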
2 votes
1 answer
668 views

Sudden drop to zero for precision recall curve

I am training a neural network classifier with 250k training samples and 54k validation samples. The output activation is sigmoid. I noticed a sudden drop in the precision for the very top probability ...
Florian • 21
1 vote
0 answers
56 views

Confidence intervals for Object Detection metrics

I would like to come back to the question "When do we require to calculate the confidence Interval?", since a reviewer recently asked me to provide confidence intervals for metrics regarding my work ...
rok • 111
1 vote
1 answer
522 views

Why use average_precision_score from sklearn? [duplicate]

I have precision and recall values and want to measure an estimator's performance: ...
Ars ML • 31
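
On the average_precision_score question above: sklearn's average precision is the step-wise sum $\sum_n (R_n - R_{n-1}) P_n$ over threshold changes, which is generally not the same number as the trapezoidal area under the PR points. A small comparison on placeholder data:

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve, auc

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)   # placeholder labels
y_score = rng.random(200)               # placeholder scores

ap = average_precision_score(y_true, y_score)            # step-wise summation
precision, recall, _ = precision_recall_curve(y_true, y_score)
trapezoid = auc(recall, precision)                       # linear interpolation between points

print(ap, trapezoid)   # close, but generally not identical; AP avoids optimistic interpolation
```
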
5 votes
2 answers
713 views

Understanding Precision Recall in business context

So, I know this is yet another precision/recall question of the kind that has been asked umpteen times now. I wanted to ask some specific business-related questions. Imagine you are building a classifier to predict ...
Baktaawar • 1,115
2 votes
1 answer
118 views

Binary classification metrics - Combining sensitivity and specificity?

The harmonic mean of precision and recall (the F1 score) is a common metric for evaluating binary classification. It is useful because it strikes a balance between precision (penalizing FP) and recall (penalizing FN). For ...
usual me • 1,257
1 vote
0 answers
29 views

How to start GNN optimization to get higher precision?

I'm developing a GNN for missing-link prediction following this blog post for the PyG library. I'm using almost the same GNN with a different dataset. Although my dataset is similar to the MovieLens ...
James • 135
1 vote
2 answers
298 views

Are there any difference using scores or probabilities for roc_auc_score and precision_recall_curve functions?

I'm working with a GNN model for link prediction and using precision_recall_curve and roc_auc_score from the ...
James • 135
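
On the scores-vs-probabilities question above: both roc_auc_score and precision_recall_curve depend only on the ranking of the scores, so any strictly monotone transform (e.g. a sigmoid mapping raw scores to probabilities) leaves the results unchanged. A minimal check on placeholder data:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, precision_recall_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)        # placeholder labels
raw_scores = rng.normal(size=200)            # placeholder raw link-prediction scores
probs = 1.0 / (1.0 + np.exp(-raw_scores))    # strictly monotone transform (sigmoid)

print(roc_auc_score(y_true, raw_scores), roc_auc_score(y_true, probs))   # identical

p1, r1, _ = precision_recall_curve(y_true, raw_scores)
p2, r2, _ = precision_recall_curve(y_true, probs)
print(np.allclose(p1, p2) and np.allclose(r1, r2))                       # True: same curve
```
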
3 votes
1 answer
47 views

Precision and recall reported in classification model

I have one question about the evaluation metrics of classification models. I see many people report precision and recall values for their classification models. Do they choose a threshold to ...
Salty Gold Fish
1 vote
0 answers
389 views

Interpretation of area under the precision-recall curve

The area under the receiver-operator characteristic curve has an interpretation of how well the predictions of two categories are separated. This post gives the area under the precision-recall curve as ...
Dave • 67.2k
2 votes
0 answers
40 views

Is there a way to affect the shape of the precision-recall curve?

As far as I know, for both ROC and PR curves, the classifier performance is usually measured by the AUC. This might indicate that classifiers with equivalent performance might have different ROC/PR ...
Gideon Kogan
1 vote
1 answer
49 views

combine specificity and

I am performing classification on an imbalanced dataset (70% negatives). If a prediction is negative I take a specific action, otherwise the opposite one. As both cases imply some costs, I want ...
shamalaia • 295
1 vote
4 answers
239 views

Why don't we use the harmonic mean of sensitivity and specificity?

There is this question on the F1 score, asking why we compute the harmonic mean of precision and recall rather than their arithmetic mean. There were good arguments in the answers in favor of the ...
user209974
4 votes
1 answer
518 views

When to use a ROC Curve vs. a Precision Recall Curve?

Looking for the circumstances in which we should use a ROC curve vs. a precision-recall curve. Example of answers I am looking for: Use a ROC curve when: you have a balanced or imbalanced dataset (...
Katsu • 1,021
3 votes
1 answer
696 views

ROC AUC has $0.5$ as random performance. Does PR AUC have a similar notion?

In considering ROC AUC, there is a sense in which $0.5$ is the performance of a random model. Conveniently, this is true, no matter the data or the prior probability of class membership; the ROC AUC ...
Dave • 67.2k
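
On the baseline question above: the PR-space analogue of ROC AUC's 0.5 is the positive-class prevalence, since a classifier that scores at random has expected precision equal to the prevalence at every recall level; unlike 0.5, this baseline therefore changes from dataset to dataset. A quick empirical check on synthetic data:

```python
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
y_true = (rng.random(100_000) < 0.1).astype(int)   # ~10% positives
random_scores = rng.random(100_000)                # uninformative scores

print(y_true.mean())                                   # prevalence, about 0.10
print(average_precision_score(y_true, random_scores))  # also about 0.10, in expectation
```
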
11 votes
8 answers
9k views

My machine learning model has precision of 30%. Can this model be useful?

I've encountered an interesting discussion at work on the interpretation of precision (from the confusion matrix) within a machine learning model. The interpretation of precision is where there is a difference of ...
wmmwmm • 121
0 votes
0 answers
19 views

Flipping inputs in multilabel classification

I have framed a classification problem as follows: I have $N$ items, and wish to predict a set of relevant tags for each out of $M$ tags. An item can have anywhere from 0 to $M$ applicable tags. To ...
John • 1
1 vote
2 answers
392 views

Confidence interval for Accuracy, Precision and Recall

Classification accuracy or classification error is a proportion or a ratio. It describes the proportion of correct or incorrect predictions made by the model. Each prediction is a binary decision that ...
dokondr • 287
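
For the confidence-interval question above, a common first pass treats each metric as a binomial proportion over its own denominator (all predictions for accuracy, predicted positives for precision, actual positives for recall), e.g. with Wilson intervals; note that precision's denominator is itself random, so a bootstrap over the test set is the more careful alternative. A sketch with invented counts:

```python
from statsmodels.stats.proportion import proportion_confint

# Invented confusion-matrix counts, purely for illustration.
tp, fp, fn, tn = 80, 20, 40, 860

acc_ci = proportion_confint(tp + tn, tp + tn + fp + fn, alpha=0.05, method="wilson")
prec_ci = proportion_confint(tp, tp + fp, alpha=0.05, method="wilson")
rec_ci = proportion_confint(tp, tp + fn, alpha=0.05, method="wilson")

print(acc_ci, prec_ci, rec_ci)
```
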
1 vote
0 answers
80 views

A better linear model has less precision (relative to the worse model) at a larger threshold

I trained two models using the same algorithm - logistic regression (LogisticRegression(max_iter=180, C=1.05) for ~27 features and ~330K observations). I used the ...
konstantin_doncov
0 votes
0 answers
100 views

Is weighting still needed when using undersampling?

I have a model that I want to test with my existing data to calculate precision, recall, etc. The data is actually an unbalanced dataset: Class A 70% / Class B 30%. I created a dataset by undersampling ...
Kar781Lopsdsds
1 vote
0 answers
107 views

Accounting for overrepresentation of positives in binary classification test set for calculation of precision and recall

I have a binary classification task with highly imbalanced data, since the class to be detected (in the following referred to as the positives) is very rare. For data limitation reasons my test set ...
user15774062
0 votes
0 answers
28 views

Weighted precision & recall - With class weights vs oversampling

If weighted precision is calculated as ...
Kaushik J • 101
0 votes
0 answers
416 views

Comparing AUC-PR between groups with different baselines

So I know that the area under the precision-recall curve is often a more useful metric than AUROC when dealing with highly imbalanced datasets. However, while AUROC can easily be used to compare ...
Eike P. • 3,098
0 votes
0 answers
43 views

Does the Precision-Recall AUC approach the ROC AUC as the data becomes balanced?

I am working on a Machine Learning classifier. It is a binary response and most predictor variables are categorical. I have several years of data and for some years, the response is imbalanced (more ...
wisamb • 161
1 vote
0 answers
61 views

F1 Score vs PR Curve

If I understood correctly, the PR curve is just the mean of the F1 score computed multiple times with different thresholds. In the task of outlier detection, these are two suggested metrics, given the fact ...
Loris • 23
