Welcome! Today, we are peeling back the layers of classification metrics, namely the confusion matrix, precision, and recall. This lesson delves into their theory and provides a practical illustration in Python.
The performance of a binary classifier is evaluated by comparing its predictions against the actual values; this comparison is summarized in a confusion matrix, which distinguishes four outcomes:
- True Positive (TP): Correct positive prediction.
- True Negative (TN): Correct negative prediction.
- False Positive (FP): Incorrect positive prediction.
- False Negative (FN): Incorrect negative prediction.
Consider an email spam filter that classifies messages as Spam (positive) or Not Spam (negative), as follows:
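Here is a minimal sketch of that example using scikit-learn's `confusion_matrix`; the labels below are hypothetical, chosen only to produce all four outcomes:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical spam-filter results: 1 = Spam (positive), 0 = Not Spam (negative)
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]  # actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]  # filter's predictions

# For labels [0, 1], ravel() unpacks the 2x2 matrix as TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=4, TN=4, FP=1, FN=1
```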
The simplest way to measure the model's performance is to calculate its accuracy: the percentage of predictions that are correct, i.e., Accuracy = (TP + TN) / (TP + TN + FP + FN).
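Continuing the hypothetical spam-filter data from above, accuracy can be computed directly from the four counts or with scikit-learn's `accuracy_score`:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Same hypothetical labels as in the sketch above
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Accuracy = (TP + TN) / (TP + TN + FP + FN)
print((tp + tn) / (tp + tn + fp + fn))  # 0.8
print(accuracy_score(y_true, y_pred))   # 0.8 -- same result
```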
Accuracy counts total correct predictions, but it cannot distinguish between the types of errors a model makes. If the model is systematically wrong in one particular way, accuracy won't reveal it, which is a serious problem for some tasks. For example, in medical testing we want to minimize the number of false negatives (FN) so that a disease is not missed in its early stages.
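The synthetic example below illustrates this pitfall: a degenerate "model" that always predicts negative scores high accuracy on an imbalanced dataset while missing every positive case.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Synthetic, imbalanced data: 95 negatives, 5 positives (e.g., sick patients)
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a degenerate model that always predicts "negative"

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(accuracy_score(y_true, y_pred))  # 0.95 -- looks impressive
print(f"FN={fn}")                      # FN=5 -- every positive case missed
```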
