I know this is yet another precision/recall question, and the topic has been asked about umpteen times already.
I want to ask about a specific business-related case.
Imagine you are building a classifier to predict which patients diagnosed with a certain disease are most likely to become eligible for treatment in the next 3 months.
You build a binary classifier.
Here are the numbers you get for the model in production:
Precision: 0.32% (0.0032)
Recall: 50% (0.5)
The same stats during development:
Precision: 24% (0.24)
Recall: 68% (0.68)
My question:
- How good is this model?
From what I understand, if a model has precision this close to zero, then almost all of its positive predictions are false positives. In other words, it is flagging a lot of patients as becoming eligible for treatment who actually will not.
But should the recall of 50% mean anything? I believe that since FP is so high, any hits are mostly due to chance, because the model is calling so many patients positive.
So, is it right to say that this model is no good, and that a recall of 50% doesn't actually mean the model got 50% of its predictions correct?
Am I thinking on the right track, or what am I missing?
I believe the data is also highly skewed.
A scenario where the above can happen:
Take a patient pool of 10k. The model predicts 1k as positive (eligible for treatment). The total number of actually eligible patients (TP + FN) is 6, and the model catches 3 of them. So precision = TP / (TP + FP) = 3/1000 = 0.003,
while recall = TP / (TP + FN) = 3/(3 + 3) = 0.5.
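To make the arithmetic in this scenario concrete, here is a minimal Python sketch of it. The variable names are my own, and the "random baseline" (flagging the same 1k patients purely at random) is an assumption I am adding for comparison, not something measured on the real model.

```python
# Hypothetical scenario from the post: 10k patients, 1k flagged,
# 6 actually eligible, 3 of those caught by the model.
total = 10_000
predicted_positive = 1_000
actual_positive = 6
tp = 3

# Fill in the rest of the confusion matrix.
fp = predicted_positive - tp      # flagged but not eligible
fn = actual_positive - tp         # eligible but missed
tn = total - tp - fp - fn         # correctly left unflagged

precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Assumed baseline: flag the same number of patients at random.
# Expected precision then equals the prevalence, and expected recall
# equals the fraction of the pool that gets flagged.
prevalence = actual_positive / total
random_precision = prevalence
random_recall = predicted_positive / total

# Lift: how many times better than random the model's precision is.
lift = precision / prevalence

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.4f} recall={recall:.2f}")
print(f"random precision={random_precision:.4f} random recall={random_recall:.2f}")
print(f"lift over random={lift:.1f}")
```

Under these assumed numbers, random flagging would be expected to catch about 10% of the eligible patients, so this is the comparison I have in mind when asking whether the 50% recall is "due to chance". With only 6 positives, though, the estimate is obviously very noisy.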