$\begingroup$

I have a few questions about precision and recall in machine learning classification.

While reading, I found that high precision will result in low recall, and vice versa.

  1. However, if someone asks what percentage of precision is acceptable, what could the answer be?

  2. Should (or can) recall be greater than precision? Or should (or can) precision be greater than recall?

  3. For a model to be considered good, should precision be 95%, or, if we consider recall, should it be > 95%?

  4. In my test results I got 100% recall. Can recall or precision be greater than 100%?

$\endgroup$

2 Answers

$\begingroup$

To explain this, I will use an example. Suppose I trained a model that classifies fruit as "banana" or "not banana". To evaluate the model, I use 20 pieces of fruit: 10 bananas and 10 other fruits. 8 bananas were classified as bananas, and the other 2 as "other fruits". 7 pieces of fruit that aren't bananas were classified as "other fruits", and the other 3 as bananas.

If "banana" is our positive class and "other fruits" our negative class, we will have 8 True Positives, 2 False Negatives, 7 True Negatives and 3 False Positives.

Given this, we can calculate precision as True Positives / (True Positives + False Positives), or in my own words, the proportion of samples predicted as Positive that really are Positive. Here that is 8 / (8 + 3) = 8/11, about 0.73. Recall is True Positives / (True Positives + False Negatives), or in my own words, the proportion of Positive samples that have been identified as Positive. Here that is 8 / (8 + 2) = 0.8.
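As a quick check of the two formulas, here is a minimal Python sketch using the counts from the banana example. The helper name `precision_recall` is my own, and returning `None` for a zero denominator is just one possible convention, not a standard.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts."""
    predicted_positive = tp + fp  # everything the model labelled Positive
    actual_positive = tp + fn     # everything that really is Positive
    # Each ratio is undefined when its denominator is zero;
    # this sketch returns None in that case (an assumed convention).
    precision = tp / predicted_positive if predicted_positive else None
    recall = tp / actual_positive if actual_positive else None
    return precision, recall


# Counts from the banana example; True Negatives appear in neither formula.
tp, fn, tn, fp = 8, 2, 7, 3
precision, recall = precision_recall(tp, fp, fn)
print(f"precision = {precision:.3f}")  # 8 / 11
print(f"recall    = {recall:.3f}")     # 8 / 10
```

Note that the 7 True Negatives play no role in either metric.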

So we can now answer the 4th question: precision and recall can never be greater than 1 (that is, 100%), because the denominator of each formula is always greater than or equal to the numerator.
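Written out, the bound follows only from the counts being non-negative (using TP, FP and FN as shorthand for True Positives, False Positives and False Negatives):

$$
0 \le \frac{TP}{TP + FP} \le 1
\qquad\text{and}\qquad
0 \le \frac{TP}{TP + FN} \le 1,
\qquad\text{since } FP \ge 0 \text{ and } FN \ge 0 .
$$

A recall of exactly 100% is the upper limit, reached when $FN = 0$, i.e. no Positive sample was missed.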

Question 1 is difficult to answer because it depends on the problem. In many cases, you will need a precision or recall greater than 0.9 to be confident that you are predicting correctly, but in other cases lower values are acceptable. This question doesn't have a unique answer.

Question 2, again, it depends. In some cases, you will want high precision even if this means that the recall will be lower, because you want nearly all of your positive predictions to be truly positive (think of a system that predicts whether a person has cancer: you don't want to give chemo to a healthy person). On the other hand, in some cases you will want high recall, because it means that very few positive samples will be classified as negative (think of a computer virus detector: it's better to flag a harmless file as dangerous than to let a virus infect our computer).
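To make this trade-off concrete, here is a small Python sketch. The scores and labels are made up for illustration (they do not come from any real model), and `precision_recall_at` is a hypothetical helper. It assumes the classifier outputs a score and we predict Positive when the score reaches a threshold.

```python
# Hypothetical classifier scores (higher = more likely Positive) and true labels.
scores = [0.95, 0.90, 0.80, 0.70, 0.65, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]


def precision_recall_at(threshold):
    """Predict Positive when score >= threshold; return (precision, recall)."""
    predictions = [score >= threshold for score in scores]
    tp = sum(1 for p, y in zip(predictions, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predictions, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y == 1)
    # None marks an undefined ratio (zero denominator); an assumed convention.
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall


for threshold in (0.75, 0.50, 0.15):
    precision, recall = precision_recall_at(threshold)
    print(f"threshold={threshold:.2f}  precision={precision:.2f}  recall={recall:.2f}")
```

With this toy data, the strict threshold (0.75) gives precision 1.00 and recall 0.50, while the loose threshold (0.15) gives precision 0.67 and recall 1.00. Which end you prefer depends on whether a false positive or a false negative costs more in your problem.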

Question 3 has the same answer as the two previous questions: it depends on the problem.

$\endgroup$
$\begingroup$

The answer is very simple: it depends. Whatever metric you are using, it is always relative to your setting. If you are building a toy model that distinguishes cats from dogs in images downloaded from the web, you may be fine with misclassifying 20% of the dogs. On the other hand, if your business depends on it and you'd be losing real money, a 5% error could be unacceptable. If your model is going to automatically assign patients to treatments and people's lives are going to depend on it, then even 0.1% misclassifications may be unacceptable, since 1 in 1,000 patients could die because of it.

See also the How to choose a confidence level? thread for similar arguments.

Regarding your fourth question, those metrics are proportions, so they are always between 0% and 100%. See the Wikipedia article on precision and recall to learn more about those metrics.

$\endgroup$
