What is a Confusion Matrix?
What is it, and why and when should we use it?
Introduction
In machine learning, evaluating the performance of a model is just as important as developing the model itself. One of the most widely used tools for evaluating a classification model is the confusion matrix.
But what exactly is a confusion matrix, and why is it so widely used? In this article, we’ll break down the concept, explain its components, and show you when and how to use it.
What Is a Confusion Matrix?
A confusion matrix is a performance measurement tool for machine learning classification problems. It tabulates the model's predictions against the true labels, showing the number of correct and incorrect predictions for each class in the dataset.
The matrix is easiest to read for binary classification problems (e.g. “spam emails” vs. “not spam emails”), but it also applies to multi-class problems, where it has one row and one column per class.
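To make the idea concrete, here is a minimal sketch in plain Python that counts how often each true label is paired with each predicted label. The function name `confusion_matrix`, the six example emails, and the layout (rows are true labels, columns are predicted labels) are assumptions made for illustration; libraries such as scikit-learn offer an equivalent `sklearn.metrics.confusion_matrix` function.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) pairs.

    Assumed layout: one row per true label, one column per predicted label,
    both in the order given by `labels`.
    """
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Hypothetical labels for six emails (illustrative data, not real model output).
y_true = ["spam", "spam", "not spam", "not spam", "spam", "not spam"]
y_pred = ["spam", "not spam", "not spam", "spam", "spam", "not spam"]

labels = ["spam", "not spam"]
matrix = confusion_matrix(y_true, y_pred, labels)

# Print one row per true label.
for label, row in zip(labels, matrix):
    print(f"{label:>8}: {row}")
```

In this made-up example, the diagonal entries count the emails the model labeled correctly, and the off-diagonal entries count its mistakes. Passing a longer `labels` list gives the multi-class version with no other changes.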
Confusion Matrix Structure
For binary classification, the confusion matrix is a 2x2 table, structured as follows: