
9.1. Classification

Below is a detailed list of metrics commonly used to evaluate the accuracy and performance of classification and regression models in machine learning, including neural networks. The metrics are categorized based on their applicability to classification or regression tasks, with explanations of their purpose and mathematical formulations where relevant.

Classification Metrics

Classification tasks involve predicting discrete class labels. The following metrics assess the accuracy and effectiveness of such models:

| Metric | Purpose | Formula | Use Case |
| --- | --- | --- | --- |
| Accuracy | Measures the proportion of correct predictions across all classes | \( \displaystyle \frac{TP + TN}{TP + TN + FP + FN} \) | Suitable for balanced datasets but misleading for imbalanced ones |
| Precision | Evaluates the proportion of positive predictions that are actually correct | \( \displaystyle \frac{TP}{TP + FP} \) | Important when false positives are costly (e.g., spam detection) |
| Recall (Sensitivity) | Assesses the proportion of actual positives correctly identified | \( \displaystyle \frac{TP}{TP + FN} \) | Critical when false negatives are costly (e.g., disease detection) |
| F1-Score | Harmonic mean of precision and recall, balancing both metrics | \( \displaystyle 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \) | Useful for imbalanced datasets where both precision and recall matter |
| AUC-ROC | Measures the model’s ability to distinguish between classes across all thresholds | Area under the curve plotting True Positive Rate (Recall) vs. False Positive Rate \( \displaystyle \left( \frac{FP}{FP + TN} \right) \) | Effective for binary classification and assessing model robustness |
| AUC-PR | Focuses on the precision-recall trade-off, especially for imbalanced datasets | Area under the curve plotting Precision vs. Recall | Preferred when the positive class is rare (e.g., fraud detection) |
| Confusion Matrix | Provides a tabular summary of prediction outcomes (TP, TN, FP, FN) | N/A | Offers detailed insights into class-specific performance, especially for multi-class problems |
| Hamming Loss | Calculates the fraction of incorrectly predicted labels over the total number of labels | \( \displaystyle \frac{1}{N} \sum_{i=1}^N \frac{1}{L} \sum_{j=1}^L \mathbf{1}(y_{ij} \neq \hat{y}_{ij}) \) | Suitable for multi-label classification tasks |
| Balanced Accuracy | Average of the recall obtained on each class | \( \displaystyle \frac{1}{C} \sum_{i=1}^C \frac{TP_i}{TP_i + FN_i} \) | Effective for multi-class problems with class imbalance |
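The sketch below shows how the tabulated metrics can be computed in practice. It assumes scikit-learn is available; the toy labels are illustrative and not taken from the original text.

```python
# Minimal sketch: computing the classification metrics from the table above.
# Assumptions: scikit-learn is installed; y_true / y_pred are made-up toy data.
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    confusion_matrix,
    balanced_accuracy_score,
    hamming_loss,
)

# Toy binary classification results: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# Confusion matrix: rows are true classes, columns are predicted classes.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}  TN={tn}  FP={fp}  FN={fn}")

print("Accuracy:         ", accuracy_score(y_true, y_pred))   # (TP+TN)/(TP+TN+FP+FN)
print("Precision:        ", precision_score(y_true, y_pred))  # TP/(TP+FP)
print("Recall:           ", recall_score(y_true, y_pred))     # TP/(TP+FN)
print("F1-score:         ", f1_score(y_true, y_pred))         # harmonic mean of the two
print("Balanced accuracy:", balanced_accuracy_score(y_true, y_pred))

# Hamming loss is most relevant for multi-label problems: here each sample
# carries L = 3 binary labels, and the loss is the fraction of wrong label bits.
Y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
Y_pred = [[1, 0, 0], [0, 1, 0], [0, 1, 1]]
print("Hamming loss:     ", hamming_loss(Y_true, Y_pred))
```

Keeping the raw confusion-matrix counts (TP, TN, FP, FN) alongside the derived scores makes it easier to see which error type drives a low precision or recall.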

Explanation of ROC Curve (AUC-ROC)

An ROC curve plots the True Positive Rate (TPR, or sensitivity/recall) against the False Positive Rate (FPR) at various classification thresholds. It helps visualize the trade-off between sensitivity and specificity for a classifier:

  • True Positive Rate (TPR): The proportion of actual positives correctly identified (TP / (TP + FN)).

  • False Positive Rate (FPR): The proportion of actual negatives incorrectly classified as positives (FP / (FP + TN)).

  • The Area Under the Curve (AUC) quantifies the overall performance, with AUC = 1 indicating a perfect classifier and AUC = 0.5 indicating a random classifier.
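As a companion to the description above, here is a minimal sketch of computing the ROC curve, AUC-ROC, and the precision-recall counterparts. It assumes scikit-learn is available; the predicted scores are made-up probabilities, not values from the original text.

```python
# Minimal sketch: ROC and precision-recall summaries for a binary classifier.
# Assumptions: scikit-learn is installed; y_true / y_score are toy values.
from sklearn.metrics import (
    roc_curve,
    roc_auc_score,
    precision_recall_curve,
    average_precision_score,
)

# Ground-truth labels and predicted probabilities for the positive class.
y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9]

# ROC curve: TPR and FPR at every threshold implied by the scores.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("FPR:", fpr)
print("TPR:", tpr)
print("AUC-ROC:", roc_auc_score(y_true, y_score))  # 1.0 = perfect, 0.5 = random

# Precision-recall curve and its area (AUC-PR / average precision),
# usually more informative when the positive class is rare.
precision, recall, _ = precision_recall_curve(y_true, y_score)
print("AUC-PR (average precision):", average_precision_score(y_true, y_score))
```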

Figure: example ROC curve plotting the True Positive Rate against the False Positive Rate.

Additional