Average precision score too high given the confusion matrix

I am developing a scikit-learn model on an imbalanced dataset (binary classification). Given the confusion matrix and the F1 score, I would expect a lower average precision score, but I get an almost perfect one and can't figure out why. This is the output I am getting:

Confusion matrix on the test set:

[[6792  199]
 [   0  173]]

F1 score: 0.63

Test AVG precision score: 0.99

I am passing probabilities to scikit-learn's average_precision_score function, which is what the documentation says to use. I was wondering where the problem could be.
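
For reference, the numbers above are computed roughly like this (a minimal, self-contained sketch: the make_classification data and the LogisticRegression model are stand-ins for my actual dataset and classifier):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

# Toy imbalanced binary dataset standing in for the real one
X, y = make_classification(n_samples=20000, weights=[0.97], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

y_pred = clf.predict(X_test)               # hard 0/1 labels
y_score = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

print(confusion_matrix(y_test, y_pred))
print("F1 score:", f1_score(y_test, y_pred))
# average_precision_score takes probabilities/scores, not hard labels
print("Test AVG precision score:", average_precision_score(y_test, y_score))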

1 answer

  • answered 2022-05-02 18:39 Ben Reiniger

    The confusion matrix and F1 score are based on a hard prediction, which in sklearn is produced by cutting predictions at a probability threshold of 0.5 (for binary classification, and assuming the classifier is really probabilistic to begin with, so not an SVM, for example). The average precision, in contrast, is computed using all possible probability thresholds; it can be read as the area under the precision-recall curve.
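
    For a genuinely probabilistic estimator such as LogisticRegression, that equivalence is easy to verify directly (a minimal sketch on toy data, not taken from the question):

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression

        X, y = make_classification(n_samples=1000, weights=[0.95], random_state=0)
        clf = LogisticRegression(max_iter=1000).fit(X, y)

        # predict() returns class 1 exactly when the predicted
        # probability of class 1 exceeds 0.5
        hard = clf.predict(X)
        cut = (clf.predict_proba(X)[:, 1] > 0.5).astype(int)
        assert np.array_equal(hard, cut)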

    So a high average_precision_score together with a low f1_score suggests that your model does extremely well at some threshold other than 0.5.
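
    One way to see this is to scan the candidate thresholds with precision_recall_curve and compute the F1 score each one would give (a sketch on toy imbalanced data; your own y_test and predicted probabilities would go in its place):

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import f1_score, precision_recall_curve
        from sklearn.model_selection import train_test_split

        # Toy imbalanced data standing in for the real split
        X, y = make_classification(n_samples=20000, weights=[0.97], random_state=0)
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, stratify=y, random_state=0)
        clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        y_score = clf.predict_proba(X_test)[:, 1]

        # precision and recall have one more entry than thresholds:
        # the final (recall = 0) point has no threshold attached
        precision, recall, thresholds = precision_recall_curve(y_test, y_score)
        f1 = 2 * precision * recall / (precision + recall + 1e-12)  # avoid 0/0
        best = np.argmax(f1[:-1])
        print(f"best threshold ~ {thresholds[best]:.3f}, F1 there = {f1[best]:.3f}")
        print("F1 at the default 0.5 cutoff:",
              f1_score(y_test, (y_score > 0.5).astype(int)))

    If the best F1 sits at a threshold far from 0.5, that is exactly the situation described above: the model ranks positives ahead of negatives very well (hence the near-perfect average precision), but 0.5 is simply a poor place to binarize its probabilities.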
