A confusion matrix is a table that is often used to summarize the performance of a classification model (or classifier) on a set of test data for which the true values are known.
It visualizes the algorithm's performance by comparing predicted labels with actual labels. The matrix shows four counts (see the code sketch after the list below):
- True Positives (TP) – examples that were predicted positive and are actually positive.
- True Negatives (TN) – examples that were predicted negative and are actually negative.
- False Positives (FP) – examples that were predicted positive but are actually negative; also known as a 'Type I error'.
- False Negatives (FN) – examples that were predicted negative but are actually positive; also known as a 'Type II error'.
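To make these four counts concrete, here is a minimal sketch in Python that tallies them by comparing predicted labels with actual labels. The example labels and the convention that 1 marks the positive class are assumptions for illustration; in practice, libraries such as scikit-learn compute the full matrix via sklearn.metrics.confusion_matrix(y_true, y_pred).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) by comparing predicted labels with actual labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    return tp, tn, fp, fn

# Hypothetical test-set labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f"TP={tp} FP={fp}")   # TP=3 FP=1
print(f"FN={fn} TN={tn}")   # FN=1 TN=3
```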
From these four counts, key metrics can be calculated directly:
- Accuracy = (TP + TN) / (TP + TN + FP + FN)
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
- F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
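Continuing the sketch above, the snippet below computes these metrics from the four counts. The zero-denominator guards are a defensive assumption for degenerate cases (e.g. a classifier that never predicts the positive class), not part of the standard definitions.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Using the hypothetical counts from the previous example:
print(classification_metrics(tp=3, tn=3, fp=1, fn=1))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```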
The confusion matrix thus provides an overview of classifier performance and the types of errors being made. It is a useful tool for evaluating models, comparing them, and identifying systematic errors or class biases.