Scikit-Learn Classification Metrics

import pandas as pd
In scikit-learn, classification metrics are essential tools to evaluate the performance of a classification model. They provide insights into how well the model is performing and where it may need improvements. Here are some commonly used classification metrics along with examples of how to implement them using scikit-learn:
# True labels
y_true = [0, 1, 1, 0, 1]

# Predicted labels
y_pred = [1, 1, 0, 0, 1]
Accuracy:
- Accuracy measures the ratio of correctly predicted instances to the total instances.
- Formula:
Accuracy = Number of Correct Predictions / Total Number of Predictions
from sklearn.metrics import accuracy_score

# Calculate accuracy
accuracy = accuracy_score(y_true=y_true, y_pred=y_pred)
print("Accuracy:", accuracy)
Accuracy: 0.6
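As a quick hand check, three of the five predictions above match the true labels, which gives the same 0.6:

# Hand check: count matching labels and divide by the total
correct = sum(t == p for t, p in zip(y_true, y_pred))
print(correct / len(y_true))  # 0.6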
Precision:
- Precision measures the ability of the classifier not to label as positive a sample that is negative.
- Put simply, precision tells you how often the model is correct when it predicts the positive class.
- Formula:
Precision = True Positives / (True Positives + False Positives)
from sklearn.metrics import precision_score

# Calculate precision
precision = precision_score(y_true, y_pred)
print("Precision:", precision * 100)
Precision: 66.66666666666666
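Counting by hand: the model predicts the positive class three times (indices 0, 1, and 4), and two of those predictions are correct, so precision is 2 / (2 + 1):

# Hand check: TP = 2 (indices 1 and 4), FP = 1 (index 0)
tp, fp = 2, 1
print(tp / (tp + fp) * 100)  # 66.66...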
Recall (also known as Sensitivity or True Positive Rate):
- Recall measures the ability of the classifier to find all the positive samples.
- Formula:
Recall = True Positives / (True Positives + False Negatives)
from sklearn.metrics import recall_score

# Calculate recall
recall = recall_score(y_true, y_pred)
print("Recall:", recall)
Recall: 0.6666666666666666
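By hand: there are three actual positives (indices 1, 2, and 4) and the model recovers two of them, so recall is 2 / (2 + 1):

# Hand check: count the positives the model actually found
found = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
print(found / sum(y_true))  # 0.666...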
F1 Score:
- F1 Score is the harmonic mean of precision and recall.
- It provides a balance between precision and recall.
- Formula:
F1 Score = (2 x Precision x Recall) / (Precision + Recall)
from sklearn.metrics import f1_score

# Calculate F1 Score
f1 = f1_score(y_true, y_pred)
print("F1 Score:", f1)
F1 Score: 0.6666666666666666
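Since precision and recall are both 2/3 here, their harmonic mean is also 2/3, matching the output above:

# Hand check: harmonic mean of precision and recall
p, r = 2/3, 2/3
print(2 * p * r / (p + r))  # 0.666...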
In classification tasks, metrics like precision, recall, and F1-score are commonly used to evaluate the performance of a model. When dealing with multi-class classification, you often need a way to aggregate these metrics across all classes. Three common methods for doing this are micro-average, macro-average, and weighted average.
Micro-average:
- Calculate metrics globally by counting the total true positives, false negatives, and false positives.
- This method gives equal weight to each individual prediction, regardless of class imbalance.
# Calculate micro-average
micro_precision = precision_score(y_true, y_pred, average='micro')
micro_recall = recall_score(y_true, y_pred, average='micro')
micro_f1 = f1_score(y_true, y_pred, average='micro')
print('Micro Precision:', micro_precision)
print('Micro Recall:', micro_recall)
print('Micro F1:', micro_f1)
Micro Precision: 0.6
Micro Recall: 0.6
Micro F1: 0.6
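These values can be reproduced by pooling the counts across both classes: treating each class in turn as the positive class gives 3 true positives, 2 false positives, and 2 false negatives in total, so every micro-averaged metric here equals 3/5 (which, in this setup, is the same as accuracy):

# Hand check: pooled counts over both classes
total_tp = 1 + 2  # class 0 contributes 1 TP, class 1 contributes 2
total_fp = 1 + 1
total_fn = 1 + 1
print(total_tp / (total_tp + total_fp))  # micro precision = 0.6
print(total_tp / (total_tp + total_fn))  # micro recall = 0.6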
Macro-average:
- Calculate metrics for each class individually and then average them.
- This method treats all classes equally, giving each class the same weight.
- To obtain macro-averaged precision, recall, and F1-score:
- Calculate precision, recall, and F1-score for each class.
- Average the precision, recall, and F1-score across all classes.
# Calculate macro-average
macro_precision = precision_score(y_true, y_pred, average='macro')
macro_recall = recall_score(y_true, y_pred, average='macro')
macro_f1 = f1_score(y_true, y_pred, average='macro')
print('Macro Precision:', macro_precision)
print('Macro Recall:', macro_recall)
print('Macro F1:', macro_f1)
Macro Precision: 0.5833333333333333
Macro Recall: 0.5833333333333333
Macro F1: 0.5833333333333333
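To see where 0.5833 comes from, the per-class scores can be inspected by passing average=None and then taking their unweighted mean:

import numpy as np

# Per-class precision: class 0 -> 0.5, class 1 -> 0.666...
per_class_precision = precision_score(y_true, y_pred, average=None)
print(per_class_precision)
print(np.mean(per_class_precision))  # 0.5833..., the macro average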
Weighted average:
- Calculate metrics for each class individually and then average them, weighted by the number of true instances for each class.
- This method considers class imbalance by giving more weight to classes with more instances.
- To obtain weighted-averaged precision, recall, and F1-score:
- Calculate precision, recall, and F1-score for each class.
- The weighted average is calculated as the sum of each class's metric multiplied by its support (the number of true instances of that class), divided by the total number of samples.
# Calculate weighted-average
weighted_precision = precision_score(y_true, y_pred, average='weighted')
weighted_recall = recall_score(y_true, y_pred, average='weighted')
weighted_f1 = f1_score(y_true, y_pred, average='weighted')
print('Weighted Precision:', weighted_precision)
print('Weighted Recall:', weighted_recall)
print('Weighted F1:', weighted_f1)
Weighted Precision: 0.6
Weighted Recall: 0.6
Weighted F1: 0.6
Weighted precision:
(Precision_A * N_A + Precision_B * N_B + ... + Precision_n * N_n) / (N_A + N_B + ... + N_n)
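With the labels above, class 0 has support 2 and precision 0.5, while class 1 has support 3 and precision 2/3, so the weighted precision works out to 0.6:

# Hand check: (0.5 * 2 + 2/3 * 3) / (2 + 3)
print((0.5 * 2 + 2/3 * 3) / 5)  # 0.6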
- Micro-average is useful when overall performance across all predictions is what matters.
- Macro-average is helpful when you want every class, including the smaller ones, to count equally.
- Weighted average is suitable when you want to account for class imbalance.
The Classification Report
The classification report in scikit-learn provides a comprehensive summary of different classification metrics for each class in the dataset. It includes precision, recall, F1-score, and support (the number of true instances for each label). Here’s how you can generate a classification report:
from sklearn.metrics import classification_report

# Generate classification report as a dictionary, then convert it to a DataFrame
class_report_dict = classification_report(y_true, y_pred, output_dict=True)
class_report_df = pd.DataFrame(class_report_dict).transpose()

print("Classification Report:\n")
class_report_df
Classification Report:
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| 0 | 0.500000 | 0.500000 | 0.500000 | 2.0 |
| 1 | 0.666667 | 0.666667 | 0.666667 | 3.0 |
| accuracy | 0.600000 | 0.600000 | 0.600000 | 0.6 |
| macro avg | 0.583333 | 0.583333 | 0.583333 | 5.0 |
| weighted avg | 0.600000 | 0.600000 | 0.600000 | 5.0 |
# The macro-averaged precision is just the mean of the two per-class precisions
(0.666667 + 0.5) / 2
0.5833335
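Because the report was generated with output_dict=True, individual values can also be read straight out of the dictionary, for example the F1-score of class 1 or the macro-averaged precision:

# Access individual entries of the report dictionary (class labels become string keys)
print(class_report_dict['1']['f1-score'])           # 0.666...
print(class_report_dict['macro avg']['precision'])  # 0.5833...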
Confusion Matrix:
- A confusion matrix is a table that is often used to describe the performance of a classification model.
- It presents a summary of the model’s predictions on the classification problem, showing correct predictions as well as types of errors made.
from sklearn.metrics import confusion_matrix
import pandas as pd

# Calculate confusion matrix
conf_matrix = confusion_matrix(y_true, y_pred)

# Display it as a labeled DataFrame (rows: true labels, columns: predicted labels)
conf_matrix = pd.DataFrame(conf_matrix, index=[0, 1], columns=[0, 1])
# print("Confusion Matrix:\n", conf_matrix)
conf_matrix
| | 0 | 1 |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 1 | 2 |
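For a binary problem, the four cells can be unpacked into TN, FP, FN, and TP with ravel(), which ties the confusion matrix back to the precision and recall computed earlier:

# Unpack the binary confusion matrix into its four counts
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)    # 1 1 1 2
print(tp / (tp + fp))    # precision = 0.666...
print(tp / (tp + fn))    # recall = 0.666...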
These are just a few of the many classification metrics available in scikit-learn. Depending on your specific problem and requirements, you may want to explore other metrics as well.
Understanding these metrics and how they are computed can provide valuable insights into the performance of a classification model and help in making informed decisions about its improvement.