
I constantly see metrics that measure the quality of a classifier's predictions, such as TPR, FPR, Precision, etc., being described as probabilities (see Wikipedia, for example).

As far as I know, these are not probabilities but observations of a classifier's predictive behavior, estimated during testing: they measure, for instance, the observed rates of true positives, false positives, and so on.

Wouldn't it be more correct to distinguish these measurements from the actual probabilities, and to say that, by the Law of Large Numbers, the measurements converge to those probabilities? For instance, precision (the measurement) converges to the Bayesian detection rate (the actual probability).
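To make the convergence point concrete, here is a minimal simulation sketch (not from the original question; the assumed true precision of 0.8 and the sample sizes are hypothetical). It only illustrates that the measured precision approaches the underlying probability as the number of scored cases grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth: among the cases the classifier flags as positive,
# the true (population) probability of actually being positive is 0.8.
true_precision = 0.8

for n_flagged in [100, 10_000, 1_000_000]:
    # Simulate the true labels of n_flagged predicted-positive cases.
    truly_positive = rng.random(n_flagged) < true_precision
    observed_precision = truly_positive.mean()  # TP / (TP + FP)
    print(f"n = {n_flagged:>9}: observed precision = {observed_precision:.4f}")
```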

  • There are numerous such statistics based on confusion matrices. Some of them are probabilities, and some are not.
    – Galen
    Commented Jan 8 at 14:22

1 Answer


These strike me as perfectly reasonable estimates of the parameter of a Bernoulli distribution, which we typically interpret as a probability.

  • Probability that a classification will be correct (accuracy)
  • Probability that a positive case will be detected (sensitivity)
  • Probability that a case asserted to be positive really will turn out to be positive (precision)

If you want to say that these are just estimates of true probability values (the true Bernoulli parameters), that's fine but also not controversial. We should expect the performance in production to differ at least slightly from what we observe in development.
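As a concrete illustration of the "estimates of a Bernoulli parameter" framing, here is a small sketch. The confusion-matrix counts are made up, and the normal-approximation interval is just one simple way to show that each metric is an estimate with sampling uncertainty:

```python
import numpy as np

# Hypothetical confusion-matrix counts from a test set (made up for illustration).
tp, fp, fn, tn = 400, 50, 100, 450
n = tp + fp + fn + tn

accuracy    = (tp + tn) / n      # estimate of P(classification is correct)
sensitivity = tp / (tp + fn)     # estimate of P(positive case is detected)
precision   = tp / (tp + fp)     # estimate of P(predicted positive is truly positive)

def binomial_se(p_hat, m):
    """Normal-approximation standard error of an estimated Bernoulli parameter."""
    return np.sqrt(p_hat * (1 - p_hat) / m)

for name, p_hat, m in [("accuracy", accuracy, n),
                       ("sensitivity", sensitivity, tp + fn),
                       ("precision", precision, tp + fp)]:
    se = binomial_se(p_hat, m)
    print(f"{name:>11}: {p_hat:.3f} ± {1.96 * se:.3f}  (approx. 95% interval)")
```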

  • 🎯 Exactly (+1), some of the statistics computed from a confusion matrix are conditional probabilities.
    – Galen
    Commented Jan 8 at 14:24
  • Yes, I agree they are estimates of probabilities, but they are not the true probabilities. My point is precisely that the distinction is important but not sufficiently acknowledged. They say "TPR is the probability that..." when they should say "TPR is an estimate of the probability that..."
    – synack
    Commented Jan 9 at 11:38
  • @synack: they are true probabilities: the probabilities in the validation set. There is some laziness in not calling them estimates of the value of interest (the probability in the full population) or not always giving confidence intervals, but this is not the biggest statistical crime of the machine learning community :). Also, in many cases the validation set has millions of rows, so the standard error is unlikely to actually be meaningful.
    – Cliff AB
    Commented Jan 9 at 17:28
  • I've referred to such practices as "slang" a number of times and stand by doing so.
    – Dave
    Commented Jan 9 at 17:31
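To illustrate Cliff AB's point that the standard error becomes negligible once the validation set has millions of rows, here is a quick back-of-the-envelope sketch (the estimated probability of 0.9 and the sample sizes are hypothetical):

```python
import numpy as np

# Hypothetical estimated probability and validation-set sizes.
p_hat = 0.9
for n in [1_000, 100_000, 10_000_000]:
    se = np.sqrt(p_hat * (1 - p_hat) / n)  # normal-approximation standard error
    print(f"n = {n:>10}: standard error of the estimate ≈ {se:.5f}")
```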
