
9.1. Classification

Below is a detailed list of metrics commonly used to evaluate the accuracy and performance of classification and regression models in machine learning, including neural networks. The metrics are categorized based on their applicability to classification or regression tasks, with explanations of their purpose and mathematical formulations where relevant.

Classification Metrics

Classification tasks involve predicting discrete class labels. The following metrics assess the accuracy and effectiveness of such models:

| Metric | Purpose | Formula | Use Case |
| --- | --- | --- | --- |
| Accuracy | Measures the proportion of correct predictions across all classes | \( \displaystyle \frac{TP + TN}{TP + TN + FP + FN} \) | Suitable for balanced datasets but misleading for imbalanced ones |
| Precision | Evaluates the proportion of positive predictions that are actually correct | \( \displaystyle \frac{TP}{TP + FP} \) | Important when false positives are costly (e.g., spam detection) |
| Recall (Sensitivity) | Assesses the proportion of actual positives correctly identified | \( \displaystyle \frac{TP}{TP + FN} \) | Critical when false negatives are costly (e.g., disease detection) |
| F1-Score | Harmonic mean of precision and recall, balancing both metrics | \( \displaystyle 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \) | Useful for imbalanced datasets where both precision and recall matter |
| AUC-ROC | Measures the model’s ability to distinguish between classes across all thresholds | Area under the curve plotting True Positive Rate (Recall) vs. False Positive Rate \( \displaystyle \left( \frac{FP}{FP + TN} \right) \) | Effective for binary classification and assessing model robustness |
| AUC-PR | Focuses on the precision-recall trade-off, especially for imbalanced datasets | Area under the curve plotting Precision vs. Recall | Preferred when the positive class is rare (e.g., fraud detection) |
| Confusion Matrix | Provides a tabular summary of prediction outcomes (TP, TN, FP, FN) | N/A | Offers detailed insights into class-specific performance, especially for multi-class problems |
| Hamming Loss | Calculates the fraction of incorrectly predicted labels over the total number of labels | \( \displaystyle \frac{1}{N} \sum_{i=1}^N \frac{1}{L} \sum_{j=1}^L \mathbf{1}(y_{ij} \neq \hat{y}_{ij}) \) | Suitable for multi-label classification tasks |
| Balanced Accuracy | Average of the recall obtained on each class | \( \displaystyle \frac{1}{C} \sum_{i=1}^C \frac{TP_i}{TP_i + FN_i} \) | Effective for multi-class problems with class imbalance |
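The sketch below shows how the tabulated metrics can be computed in practice. It assumes scikit-learn is available; the toy labels are illustrative and not taken from the original text.

```python
# Minimal sketch: computing the classification metrics from the table above.
# Assumptions: scikit-learn is installed; y_true / y_pred are made-up toy data.
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    confusion_matrix,
    balanced_accuracy_score,
    hamming_loss,
)

# Toy binary classification results: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# Confusion matrix: rows are true classes, columns are predicted classes.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}  TN={tn}  FP={fp}  FN={fn}")

print("Accuracy:         ", accuracy_score(y_true, y_pred))   # (TP+TN)/(TP+TN+FP+FN)
print("Precision:        ", precision_score(y_true, y_pred))  # TP/(TP+FP)
print("Recall:           ", recall_score(y_true, y_pred))     # TP/(TP+FN)
print("F1-score:         ", f1_score(y_true, y_pred))         # harmonic mean of the two
print("Balanced accuracy:", balanced_accuracy_score(y_true, y_pred))

# Hamming loss is most relevant for multi-label problems: here each sample
# carries L = 3 binary labels, and the loss is the fraction of wrong label bits.
Y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
Y_pred = [[1, 0, 0], [0, 1, 0], [0, 1, 1]]
print("Hamming loss:     ", hamming_loss(Y_true, Y_pred))
```

Keeping the raw confusion-matrix counts (TP, TN, FP, FN) alongside the derived scores makes it easier to see which error type drives a low precision or recall.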

Explanation of ROC Curve (AUC-ROC)

An ROC curve plots the True Positive Rate (TPR, or sensitivity/recall) against the False Positive Rate (FPR) at various classification thresholds. It helps visualize the trade-off between sensitivity and specificity for a classifier:

  • True Positive Rate (TPR): The proportion of actual positives correctly identified (TP / (TP + FN)).

  • False Positive Rate (FPR): The proportion of actual negatives incorrectly classified as positives (FP / (FP + TN)).

  • The Area Under the Curve (AUC) quantifies the overall performance, with AUC = 1 indicating a perfect classifier and AUC = 0.5 indicating a random classifier.
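As a companion to the description above, here is a minimal sketch of computing the ROC curve, AUC-ROC, and the precision-recall counterparts. It assumes scikit-learn is available; the predicted scores are made-up probabilities, not values from the original text.

```python
# Minimal sketch: ROC and precision-recall summaries for a binary classifier.
# Assumptions: scikit-learn is installed; y_true / y_score are toy values.
from sklearn.metrics import (
    roc_curve,
    roc_auc_score,
    precision_recall_curve,
    average_precision_score,
)

# Ground-truth labels and predicted probabilities for the positive class.
y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9]

# ROC curve: TPR and FPR at every threshold implied by the scores.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("FPR:", fpr)
print("TPR:", tpr)
print("AUC-ROC:", roc_auc_score(y_true, y_score))  # 1.0 = perfect, 0.5 = random

# Precision-recall curve and its area (AUC-PR / average precision),
# usually more informative when the positive class is rare.
precision, recall, _ = precision_recall_curve(y_true, y_score)
print("AUC-PR (average precision):", average_precision_score(y_true, y_score))
```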

Figure: example ROC curve plotting the True Positive Rate against the False Positive Rate.

Additional