4.2. Regression Metrics

Regression tasks predict continuous values. The following metrics evaluate the accuracy of predicted values against true values:

  1. Mean Absolute Error (MAE)
    • Purpose: Measures the average absolute difference between predictions and true values.
    • Formula: \( \text{MAE} = \frac{1}{N} \sum_{i=1}^N |y_i - \hat{y}_i| \),
      where \( y_i \) is the true value, \( \hat{y}_i \) the predicted value, and \( N \) the number of samples.
    • Use Case: Less sensitive to outliers than MSE, since each error contributes in proportion to its size rather than its square; interpretable as the average error in the units of the target.
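
A minimal pure-Python sketch of the MAE formula, assuming `y_true` and `y_pred` are non-empty lists of equal length. The function name is our own illustrative choice, not a library API, and the values are toy numbers.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE = (1/N) * sum(|y_i - y_hat_i|). Hypothetical helper name."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: absolute errors are 0.5, 0.5, 0.0 and 1.0.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mean_absolute_error(y_true, y_pred))  # 0.5
```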

  2. Mean Squared Error (MSE)
    • Purpose: Measures the average squared difference between predictions and true values.
    • Formula: \( \text{MSE} = \frac{1}{N} \sum_{i=1}^N (y_i - \hat{y}_i)^2 \)
    • Use Case: Sensitive to outliers, because squaring lets a few large errors dominate the average; commonly used as a loss function for training neural networks.
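
A sketch of MSE under the same assumptions (hypothetical function name, toy data). It illustrates the outlier sensitivity: both prediction sets below have the same MAE of 1.0, but concentrating the error in a single sample quadruples the MSE.

```python
def mean_squared_error(y_true, y_pred):
    """MSE = (1/N) * sum((y_i - y_hat_i)^2). Hypothetical helper name."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


y_true = [0.0, 0.0, 0.0, 0.0]
spread_out = [1.0, 1.0, 1.0, 1.0]   # four errors of size 1 (MAE = 1.0)
one_outlier = [0.0, 0.0, 0.0, 4.0]  # one error of size 4 (MAE = 1.0)
print(mean_squared_error(y_true, spread_out))   # 1.0
print(mean_squared_error(y_true, one_outlier))  # 4.0
```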

  3. Root Mean Squared Error (RMSE)
    • Purpose: Square root of MSE, expressing the error in the same units as the target.
    • Formula: \( \text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^N (y_i - \hat{y}_i)^2} \)
    • Use Case: Preferred for an interpretable error magnitude; widely used in forecasting.
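
A sketch of RMSE as the square root of MSE, assuming toy temperature readings in °C and a hypothetical function name. The single 10 °C miss gives an RMSE of 5.0 °C, twice the MAE of 2.5 °C on the same data, which shows that RMSE keeps MSE's emphasis on large errors while reporting them in the target's units.

```python
import math


def root_mean_squared_error(y_true, y_pred):
    """RMSE = sqrt(MSE), in the same units as the target. Hypothetical helper name."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Toy temperatures in °C: three exact predictions and one 10 °C miss.
y_true = [20.0, 22.0, 25.0, 30.0]
y_pred = [20.0, 22.0, 25.0, 40.0]
print(root_mean_squared_error(y_true, y_pred))  # 5.0
```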

  4. Mean Absolute Percentage Error (MAPE)
    • Purpose: Measures the average absolute error as a percentage of the true values.
    • Formula: \( \text{MAPE} = \frac{1}{N} \sum_{i=1}^N \left| \frac{y_i - \hat{y}_i}{y_i} \right| \cdot 100 \)
    • Use Case: Useful when relative errors matter (e.g., financial predictions), but undefined when any true value is zero and unstable when true values are close to zero.
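
A sketch of MAPE with a hypothetical function name and toy data. Raising an error when a true value is exactly 0 is our own design choice, not part of the formula. The last call illustrates the near-zero problem: a miss of 0.99 against a true value of 0.01 becomes an error of roughly 9900%.

```python
def mean_absolute_percentage_error(y_true, y_pred):
    """MAPE = (1/N) * sum(|(y_i - y_hat_i) / y_i|) * 100. Hypothetical helper name."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(y == 0 for y in y_true):
        # The ratio (y_i - y_hat_i) / y_i is undefined for y_i = 0.
        raise ValueError("MAPE is undefined when a true value is zero")
    total = sum(abs((y - p) / y) for y, p in zip(y_true, y_pred))
    return total / len(y_true) * 100


# Toy data: relative errors are 10%, 10%, 0% and 0%.
print(mean_absolute_percentage_error([100.0, 200.0, 400.0, 50.0],
                                     [110.0, 180.0, 400.0, 50.0]))  # 5.0
# A near-zero true value inflates the percentage (about 9900 here).
print(mean_absolute_percentage_error([0.01], [1.0]))
```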

  5. R-Squared (Coefficient of Determination)
    • Purpose: Measures the proportion of variance in the dependent variable explained by the model.
    • Formula: \( R^2 = 1 - \frac{\sum_{i=1}^N (y_i - \hat{y}_i)^2}{\sum_{i=1}^N (y_i - \bar{y})^2} \),
      where \( \bar{y} \) is the mean of the true values.
    • Use Case: Indicates goodness of fit: \( R^2 = 1 \) is a perfect fit, \( R^2 = 0 \) means the model does no better than always predicting \( \bar{y} \), and \( R^2 \) can be negative when the model does worse than that baseline.
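
A sketch of \( R^2 \) computed directly from the residual and total sums of squares, with a hypothetical function name and toy data. It assumes the true values are not all equal, since otherwise the denominator is zero. The three calls show a perfect fit, a constant prediction of \( \bar{y} \), and a model that is worse than that baseline.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot. Hypothetical helper name."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0, 5.0]  # mean is 3.0, SS_tot = 10.0
print(r_squared(y_true, [1.0, 2.0, 3.0, 4.0, 5.0]))  # 1.0 (perfect fit)
print(r_squared(y_true, [3.0] * 5))                  # 0.0 (always predicts the mean)
print(r_squared(y_true, [5.0, 4.0, 3.0, 2.0, 1.0]))  # -3.0 (worse than the mean)
```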

  6. Adjusted R-Squared
    • Purpose: Adjusts R² for the number of predictors, penalizing overly complex models.
    • Formula: \( \text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(N - 1)}{N - k - 1} \),
      where \( k \) is the number of predictors and \( N \) the number of samples.
    • Use Case: Useful when comparing models with different numbers of features.
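
A sketch that applies the adjustment to an already computed \( R^2 \); the helper name is hypothetical, and it takes \( N \) and \( k \) as explicit arguments. With the same \( R^2 = 0.75 \) on 11 samples, moving from 1 predictor to 5 lowers the adjusted value from about 0.722 to 0.5.

```python
def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2)(N - 1) / (N - k - 1). Hypothetical helper name."""
    dof = n_samples - n_predictors - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need n_samples > n_predictors + 1")
    return 1 - (1 - r2) * (n_samples - 1) / dof


# Same R^2 and sample size, different numbers of predictors.
print(adjusted_r_squared(0.75, n_samples=11, n_predictors=1))  # about 0.722
print(adjusted_r_squared(0.75, n_samples=11, n_predictors=5))  # 0.5
```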

  7. Median Absolute Error (MedAE)
    • Purpose: Measures the median of the absolute differences; highly robust to outliers.
    • Formula: \( \text{MedAE} = \text{median}(|y_1 - \hat{y}_1|, \dots, |y_N - \hat{y}_N|) \)
    • Use Case: Preferred for datasets with extreme values or non-Gaussian errors.

  8. Huber Loss
    • Purpose: Combines MSE and MAE: quadratic for small errors and linear for large ones, which makes it less sensitive to outliers than MSE.
    • Formula:
      \[ L_\delta(y_i, \hat{y}_i) = \begin{cases} \frac{1}{2}(y_i - \hat{y}_i)^2 & \text{if } |y_i - \hat{y}_i| \leq \delta \\ \delta |y_i - \hat{y}_i| - \frac{1}{2}\delta^2 & \text{otherwise} \end{cases} \]
      where \( \delta > 0 \) is the threshold at which the loss switches from quadratic to linear.
    • Use Case: Used in robust regression tasks, often as a loss function for neural networks.
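
A sketch of the Huber loss averaged over all samples. Both the averaging and the default \( \delta = 1.0 \) are our assumptions, not part of the per-sample formula, and the function name is hypothetical. The first call mixes one small error (quadratic branch) with one large error (linear branch); the second uses a large \( \delta \) so that every error stays in the quadratic branch.

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss over all samples; delta=1.0 is an assumed default."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    if delta <= 0:
        raise ValueError("delta must be positive")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        r = abs(y - p)
        if r <= delta:
            total += 0.5 * r ** 2                  # quadratic (MSE-like) branch
        else:
            total += delta * r - 0.5 * delta ** 2  # linear (MAE-like) branch
    return total / len(y_true)


# Errors 0.5 (quadratic branch: 0.125) and 3.0 (linear branch: 2.5).
print(huber_loss([0.0, 0.0], [0.5, 3.0]))              # 1.3125
# With a large delta, every error falls in the quadratic branch.
print(huber_loss([0.0, 0.0], [0.5, 3.0], delta=10.0))  # 2.3125
```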