### Example: Performance metrics for classification models

#### Using the breast cancer dataset bundled with Scikit-Learn, we compute performance metrics for a Random Forest classifier.

#### Load the required libraries

In [1]:
# Data handling
# ==============================================================================
import pandas as pd
import numpy as np

# Dataset
# ==============================================================================
from sklearn.datasets import load_breast_cancer

# Preprocessing and modeling
# ==============================================================================
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Performance metrics
# ==============================================================================
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score, f1_score, roc_curve, roc_auc_score


#### Create and inspect the dataset

In [3]:
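# Load the data into a DataFrame (target: 0 = malignant, 1 = benign)
bc = load_breast_cancer()
df = pd.DataFrame(data=bc.data, columns=bc.feature_names)
df["target"] = bc.target

# Keep only 30 randomly chosen malignant cases to create an imbalanced dataset
malignant_subset_df = df.loc[df["target"] == 0].sample(30)

# Combine them with all benign cases and shuffle the rows
new_df = pd.concat([malignant_subset_df, df.loc[df["target"] == 1]])
new_df = new_df.sample(frac=1).reset_index(drop=True)

# Count the cases per class
new_df.groupby(["target"]).count()[["mean radius"]].rename(columns={"mean radius":"count"})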


| target | count |
|--------|-------|
| 0      | 30    |
| 1      | 357   |


#### The table above shows 30 malignant cases (target = 0) and 357 benign cases (target = 1).

#### Split the data into training and test sets, then compute the model's accuracy.

In [5]:
# Separate the labels from the features
y = new_df["target"].values
X = new_df.drop(columns=["target"]).values

# Hold out 30% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3)


# Fit the Random Forest and predict on the test set
rf_clf = RandomForestClassifier().fit(X_train, y_train)
y_pred_rf = rf_clf.predict(X_test)


print(f"Accuracy of the RF model = {accuracy_score(y_test, y_pred_rf)}")

Accuracy of the RF model = 0.9743589743589743
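
Because the classes are heavily imbalanced, it may help to compare this accuracy against a trivial baseline. The sketch below uses only the class counts from the table above (30 malignant, 357 benign) to estimate the accuracy of a classifier that always predicts the majority class. The variable names are illustrative, and the figure on any particular test split would differ slightly because the split is random.

```python
# Class counts taken from the table above
n_malignant, n_benign = 30, 357

# A classifier that always predicts the majority class is right
# exactly as often as that class occurs.
baseline_accuracy = max(n_malignant, n_benign) / (n_malignant + n_benign)

print(f"Majority-class baseline accuracy = {baseline_accuracy:.4f}")
# Majority-class baseline accuracy = 0.9225
```

A baseline this high suggests that accuracy alone says little about how well the minority (malignant) class is detected, which is why the metrics that follow are worth examining.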


#### Compute the confusion matrix (rows are true classes, columns are predicted classes).

In [6]:
confusion_matrix(y_test,y_pred_rf)

array([[  6,   2],
       [  1, 108]], dtype=int64)
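
To make the link between the matrix and the usual metrics explicit, the following sketch recomputes them by hand from the four cells shown above. It assumes scikit-learn's layout for binary problems, where `ravel()` yields true negatives, false positives, false negatives, and true positives in that order, with class 1 (benign) treated as the positive class. The variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the output above
cm = np.array([[6, 2],
               [1, 108]])

# For binary labels {0, 1}: rows are true classes, columns are predictions
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()       # share of all predictions that are correct
precision = tp / (tp + fp)            # predicted benign that are truly benign
recall = tp / (tp + fn)               # truly benign that were predicted benign
specificity = tn / (tn + fp)          # truly malignant that were predicted malignant
balanced_accuracy = (recall + specificity) / 2

print(f"accuracy          = {accuracy:.4f}")
print(f"precision         = {precision:.4f}")
print(f"recall            = {recall:.4f}")
print(f"specificity       = {specificity:.4f}")
print(f"balanced accuracy = {balanced_accuracy:.4f}")
```

Only 6 of the 8 malignant test cases are identified (specificity 0.75), a weakness that the 0.97 accuracy hides.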

#### Compute precision, recall, and AUC.

In [15]:
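print("rf_clf")
# Precision and recall default to pos_label=1, i.e. the benign (majority) class
print("Precision:", precision_score(y_test, y_pred_rf))
print("Recall:", recall_score(y_test, y_pred_rf))
# Note: this AUC is computed from hard class predictions, not predicted probabilities
print("AUC:", roc_auc_score(y_test,y_pred_rf))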

rf_clf
Precision: 0.9818181818181818
Recall: 0.9908256880733946
AUC: 0.8704128440366973
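
The metrics above describe the benign class and use hard predictions for the AUC. A common alternative is to report precision and recall for the malignant class via `pos_label=0` and to compute the AUC from `predict_proba` scores. The sketch below shows one way to do this. It is self-contained and makes several assumptions not found in the original: it uses the full dataset rather than the subsampled one, a stratified split, and a fixed `random_state=0`, so its numbers will not match the outputs above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Assumption: full dataset, stratified split, fixed seed for reproducibility
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

y_pred = clf.predict(X_test)
# Columns of predict_proba follow clf.classes_ ([0, 1]); column 1 is P(benign)
y_score = clf.predict_proba(X_test)[:, 1]

# AUC from hard labels uses a single threshold; AUC from scores uses all of them
auc_from_labels = roc_auc_score(y_test, y_pred)
auc_from_scores = roc_auc_score(y_test, y_score)

# Metrics for the malignant class (label 0)
precision_malignant = precision_score(y_test, y_pred, pos_label=0)
recall_malignant = recall_score(y_test, y_pred, pos_label=0)

print("AUC (hard labels):    ", auc_from_labels)
print("AUC (probabilities):  ", auc_from_scores)
print("Precision (malignant):", precision_malignant)
print("Recall (malignant):   ", recall_malignant)
```

Passing probabilities lets `roc_auc_score` evaluate the ranking of cases across every threshold, whereas hard labels reduce the ROC curve to a single operating point.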
