
All Questions

1 vote
0 answers
38 views

How to estimate precision and recall without taking a huge random sample when the positive class is relatively rare

I have a binary text classification model, and I would like to test how well it works, in terms of precision and recall, on a new dataset of 2 million text documents that have not been annotated yet. ...
asked by Alex (467 rep)
2 votes
1 answer
130 views

We have sensitivity-specificity space (ROC curves) and precision-recall space (PR curves, $F_1$ score). What work has been done with PPV-NPV space?

Receiver operating characteristic (ROC) curves display the balance between sensitivity and specificity: how good you are at detecting category $1$ (sensitivity) while not falsely identifying category $...
asked by Dave (67.2k rep)
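For reference, a sketch of the definitions involved (written in the usual TP/FP/TN/FN notation, which is my assumption about the question's conventions):

$$\text{PPV} = \frac{TP}{TP+FP}, \qquad \text{NPV} = \frac{TN}{TN+FN},$$

whereas sensitivity $= TP/(TP+FN)$ and specificity $= TN/(TN+FP)$; PPV-NPV space therefore conditions on the predicted label rather than on the true one.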
2 votes
0 answers
29 views

Re-calculate accuracy, precision and recall after treatment effect in a model

I am working on a churn-prediction model where the goal is to detect players who have a high chance of churning from the site and to send those players an offer to keep them on the site. In the initial training ...
asked by ELTono (21 rep)
1 vote
1 answer
127 views

Choosing the correct evaluation metric between F1-score and Area under the Precision-Recall Curve (AUPRC)

We're currently working on detecting specific objects (e.g. poultry farms, hospitals) from satellite images. We've modeled the problem as a binary image classification task (i.e. classifying images ...
asked by meraxes (739 rep)
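For reference, a minimal sketch of computing both candidate metrics with scikit-learn on a synthetic stand-in for this task (the data and model below are illustrative, not from the question); the key practical difference is that $F_1$ requires committing to a decision threshold, while AUPRC is computed from the raw scores:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, average_precision_score

# Synthetic imbalanced binary problem standing in for the satellite-image task
X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)[:, 1]
f1 = f1_score(y_te, (proba >= 0.5).astype(int))   # needs a hard threshold (0.5 here)
auprc = average_precision_score(y_te, proba)      # threshold-free, uses the scores
print(f"F1 at threshold 0.5: {f1:.3f}   AUPRC (average precision): {auprc:.3f}")
```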
2 votes
1 answer
225 views

How to define Precision when we have multiple predictions for each ground truth instance?

In my problem, it is possible to have multiple predictions for a ground truth instance. How do we define precision in such scenarios? For further clarification, consider the following example. We have 1 ...
asked by Meysam Sadeghi
0 votes
0 answers
28 views

Why would area under the PR curve include points off of the Pareto front?

(Let's set aside thoughts about whether we should be calculating PR curves or areas under them at all.) A precision-recall curve for a "classification" model can contain points that should not be ...
asked by Dave (67.2k rep)
0 votes
0 answers
29 views

How to calculate AUC for a P-R curve with an unusual starting point

I am working with a binary classifier that outputs scores between 0 and 1, indicating probabilities of class membership according to the model. I produced a P-R curve, and the first point (i.e., ...
asked by CopyOfA (187 rep)
1 vote
1 answer
152 views

Why does my PR Curve look like this?

These are my recall and precision stats for the model I built. The curve does not look good where recall is 0; I am not sure why there are so many points there. Can anyone help and explain why the curve ...
asked by ibarbo (65 rep)
5 votes
2 answers
713 views

Understanding Precision Recall in business context

I know, yet another precision/recall question, asked for the umpteenth time. I wanted to ask some specific business-related questions. Imagine you are building a classifier to predict ...
asked by Baktaawar (1,115 rep)
3 votes
1 answer
47 views

Precision and recall reported in classification model

I have one question about the evaluation metrics of classification models. I see many people report precision and recall values for their classification models. Do they choose a threshold to ...
asked by Salty Gold Fish
3 votes
1 answer
696 views

ROC AUC has $0.5$ as random performance. Does PR AUC have a similar notion?

In considering ROC AUC, there is a sense in which $0.5$ is the performance of a random model. Conveniently, this is true, no matter the data or the prior probability of class membership; the ROC AUC ...
asked by Dave (67.2k rep)
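A sketch of the usual baseline argument, assuming the positive-class prevalence is $\pi$: a no-skill classifier that scores examples at random has expected precision $\pi$ at every recall, so its PR curve is the horizontal line $p(r)=\pi$ and

$$\text{PR AUC}_{\text{random}} = \int_0^1 \pi \, dr = \pi,$$

i.e. the natural reference value depends on prevalence rather than being a fixed $0.5$.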
11 votes
8 answers
9k views

My machine learning model has precision of 30%. Can this model be useful?

I've encountered an interesting discussion at work on the interpretation of precision (from the confusion matrix) within a machine learning model. The interpretation of precision is where there is a difference of ...
asked by wmmwmm (121 rep)
0 votes
0 answers
19 views

Flipping inputs in multilabel classification

I have framed a classification problem as follows: I have $N$ items, and wish to predict a set of relevant tags for each out of $M$ tags. An item can have anywhere from 0 to $M$ applicable tags. To ...
asked by John (1 rep)
1 vote
0 answers
80 views

A better linear model has lower precision (relative to the worse model) at a larger threshold

I trained two models using the same algorithm - logistic regression (LogisticRegression(max_iter=180, C=1.05) for ~27 features and ~330K observations). I used the ...
asked by konstantin_doncov
1 vote
0 answers
107 views

Accounting for overrepresentation of positives in binary classification test set for calculation of precision and recall

I have a binary classification task with highly imbalanced data, since the class to be detected (in the following referred to as the positives) is very rare. For data limitation reasons my test set ...
asked by user15774062
3 votes
1 answer
225 views

Singular beta in the F-beta vs. threshold score?

Consider this plot of the $F_\beta$ score for different values of $\beta$. I have a hard time getting an intuition as to why they all intersect at the same point. (Cf. this blog post.) In other words, why ...
asked by Tfovid (805 rep)
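A short sketch of where the common intersection comes from, using the standard definition

$$F_\beta = \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R}:$$

at any threshold where precision equals recall ($P=R$), this reduces to $F_\beta = \frac{(1+\beta^2)P^2}{(\beta^2+1)P} = P$ for every $\beta$, so all of the $F_\beta$-versus-threshold curves pass through that point.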
2 votes
2 answers
1k views

How to get the threshold from PrecisionRecallDisplay?

My goal is to tune the classifier with probability predict_proba() < threshold. Therefore, I need to get the threshold. The problem is ...
asked by Jason Rich Darmawan
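For reference, a minimal sketch showing where the thresholds come from: precision_recall_curve returns them directly, so one can pick a threshold that meets a precision (or recall) target without going through PrecisionRecallDisplay (the data and target value below are illustrative, not from the question):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_curve

# Synthetic imbalanced binary problem, purely for illustration
X, y = make_classification(n_samples=1000, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]

precision, recall, thresholds = precision_recall_curve(y_te, proba)
# thresholds has one fewer element than precision/recall; precision[i] and
# recall[i] correspond to predicting positive when proba >= thresholds[i].
target_precision = 0.8
meets_target = precision[:-1] >= target_precision
chosen = thresholds[meets_target][0] if meets_target.any() else None
print("lowest threshold reaching the target precision:", chosen)
```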
3 votes
3 answers
194 views

Isn't partial AUC a better metric than AUC for cost-sensitive classification problems?

In many classification problems, the cost of a FP is different from the cost of a FN. In spam detection, a FP (a regular email classified as spam) should have a high cost. In cancer prediction, a FN (...
asked by usual me (1,257 rep)
0 votes
1 answer
421 views

How can area under ROC (AUC) be bad when precision, recall, and accuracy are all good?

I have a model with the following scores: Precision: 0.703588 Recall: 0.976526 Accuracy: 0.694936 I thought this was fairly decent, especially considering that my (binary) response class is 1/3 of the ...
asked by NaiveBae (257 rep)
1 vote
1 answer
132 views

Reporting performance measures for classification as percentages or fractions?

I have seen classification metrics like F1-score, precision and recall reported both as fractions and as percentages. These measures are between 0 and ...
asked by Jed Noise
4 votes
1 answer
562 views

Measures to compare classification partitions

What are the most used measures (coefficients) to compare two partitions of objects into classes? I am speaking of validating the results of classification, not of clustering; the measures known as ...
asked by ttnphns (58.8k rep)
0 votes
1 answer
667 views

F2 score or the Area under the Precision-Recall-Curve as a scoring metric

I have a dataset with which I want to perform binary classification. The distribution of the target class is imbalanced: 20% positive labels, 80% negative labels. The positive class is more important ...
asked by Daniel (145 rep)
2 votes
1 answer
62 views

A metric for a big/medium/small ML classification

I am working on an ML classification task which is similar to the following: Apples have to be classified to three classes: Big, Medium and Small. I need a metric which I can use to assess the system. ...
asked by Alexey (123 rep)
0 votes
0 answers
88 views

Interpreting a precision-recall curve

I am evaluating a classification model, and I am using the PR curve because of my highly imbalanced dataset (the negative class is 5%). In the end I'm comparing the training PR curve and the test PR ...
asked by alesechi
2 votes
1 answer
215 views

Matching the number of positives in an unbalanced data set

I am dealing with a very unbalanced binary classification problem: 1% positives, 99% negatives. The training set is around 10 million rows, 40 columns. I choose the decision threshold (cutoff) on the ...
asked by user623949
3 votes
1 answer
178 views

When comparing classifiers on different datasets with different prevalences, is it valid to calculate prevalence-adjusted PPV?

Scenario: comparison of 2 different binary classifiers Both classifiers report sensitivity and specificity and number actually positive (P), but classifier 1 is tested on a dataset with prevalence 20%,...
asked by sideburns28
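For context, a sketch of the standard Bayes-rule adjustment (with $\pi$ denoting the prevalence you adjust to):

$$\text{PPV}(\pi) = \frac{\text{sensitivity}\cdot \pi}{\text{sensitivity}\cdot \pi + (1-\text{specificity})\,(1-\pi)},$$

so two classifiers reporting sensitivity and specificity can have their PPVs compared at a common, explicitly chosen $\pi$.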
2 votes
0 answers
467 views

Why does precision_recall_curve() return similar but not equal values to the confusion matrix?

INTRO: I wrote a very simple machine learning project which classifies numbers based on the MNIST dataset: ...
asked by Federico Gentile
0 votes
1 answer
474 views

How to practically calculate the accuracy of each class in a multiclass classification problem?

I have the following confusion matrix: ...
asked by Federico Gentile
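A minimal sketch of the usual computation from a confusion matrix (the matrix below is made up for illustration): per-class "accuracy" is most often reported as per-class recall, i.e. the diagonal divided by the row totals.

```python
import numpy as np

# Hypothetical 3x3 confusion matrix: rows = true class, columns = predicted class
cm = np.array([[50,  3,  2],
               [ 5, 40,  5],
               [ 4,  6, 35]])

per_class_recall = np.diag(cm) / cm.sum(axis=1)      # correct / actual count per class
per_class_precision = np.diag(cm) / cm.sum(axis=0)   # correct / predicted count per class
overall_accuracy = np.trace(cm) / cm.sum()
print(per_class_recall, per_class_precision, overall_accuracy)
```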
1 vote
1 answer
918 views

Interpreting NaN values for precision in Confusion Matrix

Please refer to the confusion matrix here: https://i.sstatic.net/Yxh5V.jpg Would I get precision values of NaN because of 0/0 in the rightmost columns? Is that even possible? How should I interpret ...
asked by User_13 (49 rep)
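For reference, a sketch of where the NaN comes from: for a class $c$,

$$\text{precision}_c = \frac{TP_c}{TP_c + FP_c},$$

which is $0/0$, and hence undefined, whenever class $c$ is never predicted; implementations either return NaN or let you choose a convention (scikit-learn, for example, exposes a zero_division argument in its precision functions).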
2 votes
1 answer
1k views

How to calculate F1, Precision, and Recall for Multi-Label Classification

I have a predictive model as follows:

Sample1  Sample2  Sample3  Sample4  Red  Yellow  Blue  Green  White  Black  Orange
65       21       55       40       0    0       1     0      1      0      0
31       40       44       30       0    0       0     0      0      0      0
33       44       56       66       1    0       0     1      0      0      1
63       77       ...
asked by asmgx (311 rep)
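For reference, a minimal sketch of computing these metrics for multi-label indicator data with scikit-learn (the arrays below are made-up stand-ins for the question's table, with one column per colour label); the main decision is the averaging scheme:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# Made-up multi-label ground truth and predictions: rows = samples, columns = labels
y_true = np.array([[0, 0, 1, 0, 1, 0, 0],
                   [0, 0, 0, 0, 0, 0, 0],
                   [1, 0, 0, 1, 0, 0, 1]])
y_pred = np.array([[0, 1, 1, 0, 1, 0, 0],
                   [0, 0, 0, 0, 0, 0, 0],
                   [1, 0, 0, 0, 0, 0, 1]])

for avg in ("micro", "macro", "samples"):
    p = precision_score(y_true, y_pred, average=avg, zero_division=0)
    r = recall_score(y_true, y_pred, average=avg, zero_division=0)
    f = f1_score(y_true, y_pred, average=avg, zero_division=0)
    print(f"{avg:>7}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```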
0 votes
0 answers
23 views

Custom metric, or does a metric already exist for this problem? Accuracy by ID

At my company we are working with the "recall" metric on a specific problem; however, this metric does not reflect the results that we would like to achieve. Take a look at the table below. We ...
asked by L. Costa
1 vote
0 answers
2k views

Precision-Recall curves with multiclass classifier

I would like to plot the PR curves for a multiclass classifier (e.g. 3 classes). The documentation states that multiclass is not supported, and that instead a series of one-vs-all classifiers are ...
asked by John S (11 rep)
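A minimal sketch of the one-vs-all approach the documentation suggests, on a synthetic 3-class problem (all data and model choices below are illustrative): binarize the labels and draw one PR curve per class.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.metrics import precision_recall_curve

# Synthetic 3-class problem, purely for illustration
X, y = make_classification(n_samples=2000, n_classes=3, n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)               # shape (n_samples, 3)
Y_te = label_binarize(y_te, classes=[0, 1, 2]) # one indicator column per class

for k in range(3):
    prec, rec, _ = precision_recall_curve(Y_te[:, k], scores[:, k])
    plt.plot(rec, prec, label=f"class {k} vs rest")
plt.xlabel("Recall"); plt.ylabel("Precision"); plt.legend(); plt.show()
```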
1 vote
2 answers
1k views

High Precision and High Recall issue - Random Forest Classification

I am building a classification model using the Random Forest technique with GridSearchCV. The target variable is binary, where 1 is 7.5% of the total population. I have used several values of GridSearch ...
asked by totalsurfer_v1
1 vote
1 answer
364 views

Why is it called Sensitivity/Recall and Specificity?

Where do the terms: Sensitivity, Recall and Specificity come from historically? I've been looking for an answer for quite some time but to no avail. I understand the formulae and what they mean but I ...
asked by Metrician (279 rep)
0 votes
1 answer
202 views

Can reversing a very inaccurate binary classifier give more accurate predictions?

I was wondering: if I somehow get a binary classification model with, let us say, 0.30 accuracy, does this mean that if I reverse the outcome of the model (i.e. swapping 0s and 1s in the ...
asked by abdelgha4 (123 rep)
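For the binary case the arithmetic is immediate (a sketch, assuming the same data and that every prediction is either right or wrong): flipping all predicted labels turns each correct prediction into an error and vice versa, so

$$\text{accuracy}_{\text{flipped}} = 1 - \text{accuracy} = 1 - 0.30 = 0.70 .$$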
4 votes
1 answer
439 views

Measuring Precision/Recall on a biased sample

I am working with ML models that predict e.g. whether an email violates some corporate policy or not. In this case, the "positives" are emails that violate the policy, and the number of ...
asked by Frank (1,706 rep)
1 vote
0 answers
38 views

Help with understanding metrics for imbalanced classification

I am trying to train a neural network to classify chest X-ray scans as my final MSc project. I have a dataset of 13,808 images, 3,616 labelled COVID and 10,192 labelled normal, so the ratio of COVID to ...
asked by ParkTheMonkey
4 votes
1 answer
317 views

Why use harmonic mean for precision and recall (f1 score) instead of just the product of precision and recall?

General question here: I understand the purpose of using the harmonic mean to generate the F1 score for model evaluation. I'm not exactly sure, though, why we don't just take the product of precision ...
asked by Talysin (243 rep)
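A small worked comparison (my own numbers, not from the question): with the usual definition

$$F_1 = \frac{2PR}{P+R},$$

the score stays on the same $[0,1]$ scale as precision and recall, e.g. $P=R=0.5$ gives $F_1=0.5$, whereas the raw product gives $0.25$; note also that $\frac{2PR}{P+R}$ is exactly the product divided by the arithmetic mean of $P$ and $R$.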
3 votes
1 answer
5k views

On which set (train/val/test) do people calculate F1 score, precision and recall?

This may be a stupid question, but when I was looking at the definition of precision/recall etc. it was not mentioned anywhere which set (training/validation/test) this metric should be calculated ...
asked by Curaçao Hajek
0 votes
1 answer
703 views

Multiclass classification: precision-recall from scratch vs sklearn

I would like to know if there's any issue with using sklearn's precision/recall metric functions versus coding them up from scratch in ...
asked by super_ask (225 rep)
4 votes
1 answer
2k views

Accuracy always equal to recall

Fitting 3 different models on a 5-class imbalanced dataset. The results show model accuracy always being equal to the recall. How can this be possible? ...
asked by super_ask (225 rep)
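One standard explanation, sketched under the assumption that the reported "recall" is the weighted (or micro) average used by libraries such as scikit-learn: with $n_c$ true instances of class $c$, of which $TP_c$ are correctly classified, and $N = \sum_c n_c$,

$$\text{recall}_{\text{weighted}} = \sum_c \frac{n_c}{N}\cdot\frac{TP_c}{n_c} = \frac{\sum_c TP_c}{N} = \text{accuracy},$$

so the two coinciding is expected rather than a bug.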
4 votes
1 answer
3k views

Dip in Precision / Recall Curve

I have the following precision/recall curve. I am not sure why there would be a dip and then growth of precision; I would expect it to step down as the classifier was loosened. If anyone could shed ...
asked by dendog (212 rep)
1 vote
0 answers
39 views

Incorporating per-class-accuracy penalties into Deep Learning

Typically, when training neural networks for image classification, many people use SGD with weight decay as a penalty term. The loss that is minimized corresponds to the misclassification and state of ...
asked by Doc (145 rep)
0 votes
0 answers
65 views

Low model performance on imbalanced data: is there any hope of improving the metrics?

I am working with imbalanced data: 70k observations of class 0 and 1k of class 1, with 12 features. I would like to perform classification to choose the important features. So far, I have done under-sampling, over-sampling, ...
asked by ricecooker
2 votes
1 answer
2k views

Is it possible to get a low AUC score but high precision and recall?

I am doing classification on a fairly imbalanced dataset (about 1:2 ratio). I have so far tried lasso and logistic regression. I didn't downsample the dataset because the sample size is low (...
asked by Mohamad Sahil
1 vote
0 answers
60 views

How to interpret the Precision and Recall curve in-sample vs out-of-sample

I have an imbalanced binary classification problem. After all the preprocessing (scaling, feature selection), I am going through a hyperparameter optimisation using GridSearchCV to find the best ...
asked by Luigi87 (213 rep)
0 votes
3 answers
1k views

Is a threshold of 0 OK?

I am dealing with a classification problem with a dataset containing 60k rows: 69k are in the negative class, and 1k are in the positive class. I trained my models and I obtained the confusion matrices with a threshold of ...
asked by CasellaJr (123 rep)
2 votes
2 answers
2k views

Weighting common performance metrics by classification outcomes?

Cost-sensitive classification metrics are somewhat common (whereby correctly predicted items are weighted to 0 and misclassified outcomes are weighted according to their specific cost). Some examples ...
asked by Bryan Shalloway
1 vote
0 answers
280 views

Precision-Recall Curve and Area under Precision-Recall Curve (AUC)

I created a model (logistic regression) and am now trying to create a precision-recall plot and calculate the area under the precision-recall curve. I'd like to note that this model is defective: ...
asked by Helios (105 rep)
7 votes
4 answers
1k views

Why do we use precision/recall in binary classification but sensitivity (= recall)/specificity in medicine?

Sensitivity (= recall) is used in both fields, but the second metric is different. Why? Both tasks (classification and medicine) look the same: the data has two classes, we make some predictions on it, and we want ...
asked by sitems (3,979 rep)