
I constantly see metrics that measure the quality of a classifier's predictions, such as TPR, FPR, Precision, etc., being described as probabilities (see Wikipedia, for example).

As far as I know, these are not probabilities but observations of a classifier's predictive behavior, estimated during testing: they measure, for instance, the observed rates of true positives, false positives, and so on.

Wouldn't it be more correct to distinguish these measurements from the actual probabilities, and to say that, by the Law of Large Numbers, the measurements converge to those probabilities? For instance, precision (the measurement) converges to the Bayesian detection rate (the actual probability).
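To make the convergence point concrete, here is a minimal simulation sketch (not from the original question; the assumed true precision of 0.8 and the sample sizes are hypothetical). It only illustrates that the measured precision approaches the underlying probability as the number of scored cases grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth: among the cases the classifier flags as positive,
# the true (population) probability of actually being positive is 0.8.
true_precision = 0.8

for n_flagged in [100, 10_000, 1_000_000]:
    # Simulate the true labels of n_flagged predicted-positive cases.
    truly_positive = rng.random(n_flagged) < true_precision
    observed_precision = truly_positive.mean()  # TP / (TP + FP)
    print(f"n = {n_flagged:>9}: observed precision = {observed_precision:.4f}")
```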

  • There are numerous such statistics based on confusion matrices. Some of them are probabilities, and some are not.
    – Galen
    Commented Jan 8 at 14:22

1 Answer


These strike me as perfectly reasonable estimates of the parameter of a Bernoulli distribution, which we typically interpret as a probability.

  • Probability that a classification will be correct (accuracy)
  • Probability that a positive case will be detected (sensitivity)
  • Probability that a case asserted to be positive really will turn out to be positive (precision)

If you want to say that these are just estimates of true probability values (the true Bernoulli parameters), that's fine but also not controversial. We should expect the performance in production to differ at least slightly from what we observe in development.
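As a concrete illustration of the "estimates of a Bernoulli parameter" framing, here is a small sketch. The confusion-matrix counts are made up, and the normal-approximation interval is just one simple way to show that each metric is an estimate with sampling uncertainty:

```python
import numpy as np

# Hypothetical confusion-matrix counts from a test set (made up for illustration).
tp, fp, fn, tn = 400, 50, 100, 450
n = tp + fp + fn + tn

accuracy    = (tp + tn) / n      # estimate of P(classification is correct)
sensitivity = tp / (tp + fn)     # estimate of P(positive case is detected)
precision   = tp / (tp + fp)     # estimate of P(predicted positive is truly positive)

def binomial_se(p_hat, m):
    """Normal-approximation standard error of an estimated Bernoulli parameter."""
    return np.sqrt(p_hat * (1 - p_hat) / m)

for name, p_hat, m in [("accuracy", accuracy, n),
                       ("sensitivity", sensitivity, tp + fn),
                       ("precision", precision, tp + fp)]:
    se = binomial_se(p_hat, m)
    print(f"{name:>11}: {p_hat:.3f} ± {1.96 * se:.3f}  (approx. 95% interval)")
```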

  • 🎯 Exactly (+1), some of the statistics computed from a confusion matrix are conditional probabilities.
    – Galen
    Commented Jan 8 at 14:24
  • Yes, I agree they are estimates of probabilities, but they are not the true probabilities. My point is precisely that the distinction is important but not sufficiently acknowledged. They say "TPR is the probability that..." when they should say "TPR is an estimate of the probability that..."
    – synack
    Commented Jan 9 at 11:38
  • @synack: they are true probabilities: the probabilities in the validation set. There is some laziness in not calling them estimates of the value of interest (the probability in the full population) or not always giving confidence intervals, but this is not the biggest statistical crime of the machine learning community :). Also, in many cases the validation set has millions of rows, so the standard error is unlikely to actually be meaningful.
    – Cliff AB
    Commented Jan 9 at 17:28
  • I've referred to such practices as "slang" a number of times and stand by doing so.
    – Dave
    Commented Jan 9 at 17:31
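To illustrate Cliff AB's point that the standard error becomes negligible once the validation set has millions of rows, here is a quick back-of-the-envelope sketch (the estimated probability of 0.9 and the sample sizes are hypothetical):

```python
import numpy as np

# Hypothetical estimated probability and validation-set sizes.
p_hat = 0.9
for n in [1_000, 100_000, 10_000_000]:
    se = np.sqrt(p_hat * (1 - p_hat) / n)  # normal-approximation standard error
    print(f"n = {n:>10}: standard error of the estimate ≈ {se:.5f}")
```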
