from dataidea.datasets import loadDataset

vg = loadDataset('vgsales')
Regression Metrics
In regression tasks, the goal is to predict continuous numerical values. Scikit-learn provides several metrics to evaluate the performance of regression models.
vg[['Publisher', 'Genre']]
| | Publisher | Genre |
|---|---|---|
| 0 | Nintendo | Sports |
| 1 | Nintendo | Platform |
| 2 | Nintendo | Racing |
| 3 | Nintendo | Sports |
| 4 | Nintendo | Role-Playing |
| ... | ... | ... |
| 16593 | Kemco | Platform |
| 16594 | Infogrames | Shooter |
| 16595 | Activision | Racing |
| 16596 | 7G//AMES | Puzzle |
| 16597 | Wanadoo | Platform |
16598 rows × 2 columns
# True labels
y_true = [2.5, 3.7, 5.1, 4.2, 6.8]
# Predicted labels
y_pred = [2.3, 3.5, 4.9, 4.0, 6.5]
Mean Absolute Error (MAE):
- MAE is the average of the absolute differences between predicted values and actual values.
- Imagine you’re trying to hit a target with darts. The MAE is like calculating the average distance between where your darts hit and the bullseye. You just sum up how far each dart landed from the center (without caring if it was too short or too far) and then find the average. The smaller the MAE, the closer your predictions are to the actual values.
- Formula: $ \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_{\text{true},i} - y_{\text{pred},i}| $
from sklearn.metrics import mean_absolute_error
# Calculate Mean Absolute Error (MAE)
mae = mean_absolute_error(y_true, y_pred)
print("Mean Absolute Error (MAE):", mae)
Mean Absolute Error (MAE): 0.21999999999999992
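To see where that number comes from, the MAE formula can be worked through by hand. The following is a minimal sketch in plain Python, not how scikit-learn implements it; the helper name `manual_mae` is made up for illustration.

```python
# Same toy data as in the example above.
y_true = [2.5, 3.7, 5.1, 4.2, 6.8]
y_pred = [2.3, 3.5, 4.9, 4.0, 6.5]


def manual_mae(actual, predicted):
    """Average of |actual_i - predicted_i| (hypothetical helper, for illustration)."""
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(abs_errors) / len(abs_errors)


mae_by_hand = manual_mae(y_true, y_pred)
print("MAE by hand:", round(mae_by_hand, 4))
```

The individual absolute errors are roughly 0.2, 0.2, 0.2, 0.2 and 0.3, so their average is about 0.22, which agrees with the scikit-learn result up to floating-point rounding.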
Mean Squared Error (MSE):
- MSE measures the average of the squares of the errors between predicted values and actual values.
- This is similar to MAE, but instead of just adding up the distances, you square them before averaging. Squaring makes bigger differences more noticeable (by making them even bigger), so MSE penalizes larger errors more than smaller ones.
- Formula: $ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_{\text{true},i} - y_{\text{pred},i})^2 $
from sklearn.metrics import mean_squared_error
# Calculate Mean Squared Error (MSE)
mse = mean_squared_error(y_true, y_pred)
print("Mean Squared Error (MSE):", mse)
Mean Squared Error (MSE): 0.04999999999999997
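The claim that MSE penalizes large errors more heavily than MAE can be checked with a small experiment. This sketch uses plain Python and two hypothetical helpers (`mean_abs_error`, `mean_sq_error`); the list `y_pred_outlier` is an invented variant of the predictions in which the last value misses by 2.0 instead of 0.3.

```python
def mean_abs_error(actual, predicted):
    """Average absolute difference (hypothetical helper)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_sq_error(actual, predicted):
    """Average squared difference (hypothetical helper)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


y_true = [2.5, 3.7, 5.1, 4.2, 6.8]
y_pred = [2.3, 3.5, 4.9, 4.0, 6.5]
# Invented variant for illustration: the last prediction is off by 2.0.
y_pred_outlier = [2.3, 3.5, 4.9, 4.0, 4.8]

mae_base = mean_abs_error(y_true, y_pred)
mae_outlier = mean_abs_error(y_true, y_pred_outlier)
mse_base = mean_sq_error(y_true, y_pred)
mse_outlier = mean_sq_error(y_true, y_pred_outlier)

print(f"MAE: {mae_base:.3f} -> {mae_outlier:.3f}")
print(f"MSE: {mse_base:.3f} -> {mse_outlier:.3f}")
```

With these numbers, MAE grows from about 0.22 to 0.56 (roughly 2.5 times), while MSE grows from about 0.05 to 0.83 (more than 16 times). A single large miss dominates MSE far more than it dominates MAE.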
Root Mean Squared Error (RMSE):
- RMSE is the square root of the MSE, providing a more interpretable scale since it’s in the same units as the target variable.
- Taking the square root undoes the squaring in MSE, so the result can be read directly against your data: an RMSE of 2 on a target measured in dollars means errors of roughly 2 dollars in size. Like MSE, RMSE still weights large errors more heavily than MAE does.
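- Formula: $ \text{RMSE} = \sqrt{\text{MSE}} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{\text{true},i} - y_{\text{pred},i})^2} $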
from sklearn.metrics import root_mean_squared_error
# Calculate Root Mean Squared Error (RMSE)
rmse = root_mean_squared_error(y_true, y_pred)
print("Root Mean Squared Error (RMSE):", rmse)
Root Mean Squared Error (RMSE): 0.2236067977499789
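As far as I know, `root_mean_squared_error` is only available in scikit-learn 1.4 and later. On an older installation, the same value can be obtained by taking the square root of the MSE yourself. Here is a minimal sketch in plain Python, assuming the same toy data:

```python
import math

y_true = [2.5, 3.7, 5.1, 4.2, 6.8]
y_pred = [2.3, 3.5, 4.9, 4.0, 6.5]

# Step 1: mean of the squared errors (the MSE).
mse_by_hand = sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)

# Step 2: square root brings the error back to the units of the target.
rmse_by_hand = math.sqrt(mse_by_hand)

print("RMSE by hand:", round(rmse_by_hand, 4))
```

The same square-root step should work on the value returned by `mean_squared_error`, for example `math.sqrt(mean_squared_error(y_true, y_pred))`.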
R-squared (Coefficient of Determination):
- R-squared measures the proportion of the variance in the dependent variable that is predictable from the independent variables.
- This tells you how well your model’s predictions match the actual data compared to simply predicting the average every time. If R-squared is 1, your model predicts the target variable perfectly. If it’s 0, your model is no better than always predicting the mean of the target variable. It can even be negative when the model does worse than that baseline. So, the closer R-squared is to 1, the better your model fits the data.
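- Formula: $ R^2 = 1 - \frac{\sum_{i=1}^{n} (y_{\text{true},i} - y_{\text{pred},i})^2}{\sum_{i=1}^{n} (y_{\text{true},i} - \bar{y}_{\text{true}})^2} $, where $ \bar{y}_{\text{true}} $ is the mean of the observed data.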
from sklearn.metrics import r2_score
# Calculate R-squared (Coefficient of Determination)
r2 = r2_score(y_true, y_pred)
print("R-squared (R2 Score):", r2)
R-squared (R2 Score): 0.975896644812958
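The comparison against "just predicting the mean" becomes concrete when R-squared is computed by hand. The sketch below is plain Python with a hypothetical helper `manual_r2`. It scores the predictions from above, a baseline that always predicts the mean, and an invented bad model that returns the true values in reverse order.

```python
def manual_r2(actual, predicted):
    """1 - SS_res / SS_tot (hypothetical helper, for illustration)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total sum of squares
    return 1 - ss_res / ss_tot


y_true = [2.5, 3.7, 5.1, 4.2, 6.8]
y_pred = [2.3, 3.5, 4.9, 4.0, 6.5]

# Baseline: always predict the mean of the observed values.
mean_value = sum(y_true) / len(y_true)
y_mean_baseline = [mean_value] * len(y_true)

# Invented bad model for illustration: the true values in reverse order.
y_reversed = list(reversed(y_true))

r2_model = manual_r2(y_true, y_pred)
r2_baseline = manual_r2(y_true, y_mean_baseline)
r2_reversed = manual_r2(y_true, y_reversed)

print("Model:        ", round(r2_model, 4))
print("Mean baseline:", round(r2_baseline, 4))
print("Reversed:     ", round(r2_reversed, 4))
```

The model scores about 0.976, the mean baseline scores 0, and the reversed predictions score well below 0, which illustrates that R-squared has no lower bound.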
Together, these metrics show how far off your regression model’s predictions are and how much of the target’s variation it explains, which helps you decide where the model needs adjusting.