{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "PR-bTTTyo4lS" }, "source": [ "# Técnica de reducción de la dimensionalidad" ] }, { "cell_type": "markdown", "metadata": { "id": "wj7Nu_bMo_6c" }, "source": [ "La reducción de la dimensionalidad es una técnica que se utiliza en la minería de datos para poder transformar datasets de alta dimensionalidad a unas que tengan una menor dimensionalidad. De esta forma se consiguen unas visualizaciones mucho más simples, y además, facilita mucho la búsqueda de patrones complejos, que a simple vista serían imposibles de detectar en los datos originales.\n", "\n", "También pasa que al tener un montón de atributos, se pueden dar un montón de combinaciones diferentes por lo que para el modelo es mucho más complicado aprender y esto conlleva que el modelo sobreajuste demasiado. Justamente la principal función que cumplen las técnicas de reducción de la dimensionalidad es prevenir el sobreajuste." ] }, { "cell_type": "markdown", "metadata": { "id": "PnqBzS29pRhE" }, "source": [ "## Lectura de datos" ] }, { "cell_type": "markdown", "metadata": { "id": "YogvuiQhpjxd" }, "source": [ "Primero importamos las librerías que necesitaremos durante el ejemplo." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ToX0YHDpj7fj" }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "from sklearn.datasets import load_wine" ] }, { "cell_type": "markdown", "metadata": { "id": "eHMIUffypV6B" }, "source": [ "En esta ocasión utilizaremos el dataset de wine, en el que tenemos diferentes atributos de vinos junto con el tipo de vino que pertenece cada uno." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "executionInfo": { "elapsed": 7, "status": "ok", "timestamp": 1654863893191, "user": { "displayName": "Mikel Armendariz", "userId": "04878841620519662639" }, "user_tz": -120 }, "id": "sLK316HfkH-8", "outputId": "877c8c7f-2d0b-4efd-9b9e-a7302b359043" }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
alcoholmalic_acidashalcalinity_of_ashmagnesiumtotal_phenolsflavanoidsnonflavanoid_phenolsproanthocyaninscolor_intensityhueod280/od315_of_diluted_winesprolinetarget
014.231.712.4315.6127.02.803.060.282.295.641.043.921065.00.0
113.201.782.1411.2100.02.652.760.261.284.381.053.401050.00.0
213.162.362.6718.6101.02.803.240.302.815.681.033.171185.00.0
314.371.952.5016.8113.03.853.490.242.187.800.863.451480.00.0
413.242.592.8721.0118.02.802.690.391.824.321.042.93735.00.0
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " alcohol malic_acid ash alcalinity_of_ash magnesium total_phenols \\\n", "0 14.23 1.71 2.43 15.6 127.0 2.80 \n", "1 13.20 1.78 2.14 11.2 100.0 2.65 \n", "2 13.16 2.36 2.67 18.6 101.0 2.80 \n", "3 14.37 1.95 2.50 16.8 113.0 3.85 \n", "4 13.24 2.59 2.87 21.0 118.0 2.80 \n", "\n", " flavanoids nonflavanoid_phenols proanthocyanins color_intensity hue \\\n", "0 3.06 0.28 2.29 5.64 1.04 \n", "1 2.76 0.26 1.28 4.38 1.05 \n", "2 3.24 0.30 2.81 5.68 1.03 \n", "3 3.49 0.24 2.18 7.80 0.86 \n", "4 2.69 0.39 1.82 4.32 1.04 \n", "\n", " od280/od315_of_diluted_wines proline target \n", "0 3.92 1065.0 0.0 \n", "1 3.40 1050.0 0.0 \n", "2 3.17 1185.0 0.0 \n", "3 3.45 1480.0 0.0 \n", "4 2.93 735.0 0.0 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wines = load_wine()\n", "data = pd.DataFrame(data = np.c_[wines['data'], wines['target']],\n", " columns = wines['feature_names'] + ['target'])\n", "data.head()" ] }, { "cell_type": "markdown", "metadata": { "id": "mnZZGMMWprQn" }, "source": [ "Ahora dividiremos los atributos de la variable objetivo." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "l1sp2_fIkfXU" }, "outputs": [], "source": [ "X = data.iloc[:, 0:13]\n", "y = data.iloc[:, 13]" ] }, { "cell_type": "markdown", "metadata": { "id": "RWdKd6SFpyVR" }, "source": [ "Y también dividiremos las observaciones de los atributos en train y test." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "rs4dFP7ykh7v" }, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=42)" ] }, { "cell_type": "markdown", "metadata": { "id": "hrqsGjP1wEF0" }, "source": [ "En este notebook se mostrará cómo reducir la dimensionalidad del dataset con las técnicas de PCA y LDA, y para ello es muy recomendable escalar los datos." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "jReMgswak2YR" }, "outputs": [], "source": [ "from sklearn.preprocessing import StandardScaler\n", "\n", "scaler = StandardScaler().fit(X)\n", "X_sc = scaler.transform(X)\n", "X_train_sc = scaler.transform(X_train)\n", "X_test_sc = scaler.transform(X_test)" ] }, { "cell_type": "markdown", "metadata": { "id": "kiyGLfGqaQbq" }, "source": [ "Aún así, primero vamos a crear un modelo de regresión logística con los datos únicamente escalados, para después poder comparar los resultados." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 17, "status": "ok", "timestamp": 1654863893688, "user": { "displayName": "Mikel Armendariz", "userId": "04878841620519662639" }, "user_tz": -120 }, "id": "j7oPasb5ZLzD", "outputId": "3410f5d4-5267-4071-e38a-01859d7fe010" }, "outputs": [ { "data": { "text/plain": [ "LogisticRegression(random_state=42)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.linear_model import LogisticRegression\n", "\n", "lg = LogisticRegression(random_state=42)\n", "lg.fit(X_train_sc, y_train)" ] }, { "cell_type": "markdown", "metadata": { "id": "zjgIZV0sa8-8" }, "source": [ "Predecimos." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 16, "status": "ok", "timestamp": 1654863893688, "user": { "displayName": "Mikel Armendariz", "userId": "04878841620519662639" }, "user_tz": -120 }, "id": "_Tz7K_Iqaycy", "outputId": "33dc91fb-95ad-4bdb-cd6e-7b070d713b81" }, "outputs": [ { "data": { "text/plain": [ "array([0., 0., 2., 0., 1., 0., 1., 2., 1., 2., 0., 2., 0., 1., 0., 1., 1.,\n", " 1., 0., 1., 0., 1., 1., 2., 2., 2., 1., 1., 1., 0., 0., 1., 2., 0.,\n", " 0., 0., 2., 2., 1., 2., 0., 1., 1., 2., 2., 0., 1., 1., 2., 0., 1.,\n", " 0., 0., 2., 2., 1., 0., 0., 1., 0., 2., 1., 0., 2., 0., 0., 0., 2.,\n", " 0., 0., 0., 2., 1., 0., 2., 1., 0., 2., 1., 1., 0., 2., 0., 0., 1.,\n", " 0., 0., 2., 1., 1., 1., 0., 1., 1., 1., 2., 2., 0., 1., 2., 2., 2.,\n", " 1., 0., 1., 2., 2., 1., 2., 1., 1., 1., 0., 0., 2., 0., 2., 0., 0.,\n", " 1., 1., 0., 0., 0., 1., 0., 2., 2., 1., 1., 1., 2., 2., 1., 0., 0.,\n", " 1., 2., 2., 0., 1., 2., 2.])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_pred = lg.predict(X_test_sc)\n", "y_pred" ] }, { "cell_type": "markdown", "metadata": { "id": "lmzYsDpsd3JU" }, "source": [ "En la matriz de confusión podemos ver que el modelo que hemos creado tiene una precisión muy alta." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 14, "status": "ok", "timestamp": 1654863893688, "user": { "displayName": "Mikel Armendariz", "userId": "04878841620519662639" }, "user_tz": -120 }, "id": "36qH5JjBamA0", "outputId": "efb5b9c7-bb6d-4409-8721-820bd815b9bc" }, "outputs": [ { "data": { "text/plain": [ "array([[48, 0, 0],\n", " [ 3, 50, 4],\n", " [ 0, 0, 38]])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.metrics import confusion_matrix\n", "\n", "confusion_matrix(y_test, y_pred)" ] }, { "cell_type": "markdown", "metadata": { "id": "YQgN9cnop6sd" }, "source": [ "## Análisis de componentes principales (PCA)" ] }, { "cell_type": "markdown", "metadata": { "id": "pE5qw3Kpqj7X" }, "source": [ "Es una técnica que permite reducir la dimensionalidad del dataset minimizando la pérdida de información en el proceso. Pertenece a la familia de las técnicas no supervisadas de machine learning, y estas técnicas no necesitan variables objetivo. El principal problema de este tipo de técnicas suele ser que son difíciles de validar porque no tienen la variable objetivo para poder contrastar los resultados." ] }, { "cell_type": "markdown", "metadata": { "id": "K5UmFZT5q3az" }, "source": [ "Ahora crearemos una instancia de la clase PCA que después utilizaremos para transformar los datos originales." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "JxJzHV8Ilk0J" }, "outputs": [], "source": [ "from sklearn.decomposition import PCA\n", "\n", "pca = PCA()" ] }, { "cell_type": "markdown", "metadata": { "id": "m-xz-UHUrUBL" }, "source": [ "Transformamos los datos." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "zMH0kmlqlq0o" }, "outputs": [], "source": [ "pca_fitted = pca.fit(X_train_sc)\n", "X_train_pca = pca_fitted.transform(X_train_sc)\n", "X_test_pca = pca_fitted.transform(X_test_sc)" ] }, { "cell_type": "markdown", "metadata": { "id": "gRSbx43orfel" }, "source": [ "Obtenemos los valores de los ratios de la varianza explicada para ver cuantas nuevas variables necesitaremos para poder explicar la información de los datos originales." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 13, "status": "ok", "timestamp": 1654863893689, "user": { "displayName": "Mikel Armendariz", "userId": "04878841620519662639" }, "user_tz": -120 }, "id": "9RFsNVg9l7oS", "outputId": "be9e4d1d-71c4-44dc-87ff-a4ac8dce499c" }, "outputs": [ { "data": { "text/plain": [ "array([0.39794788, 0.22572919, 0.12674483, 0.06712853, 0.03788473,\n", " 0.03522223, 0.0262903 , 0.02372236, 0.01990599, 0.01577964,\n", " 0.01206248, 0.0081107 , 0.00347113])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "exp_var_pca = pca.explained_variance_ratio_\n", "exp_var_pca" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "I4_GXKvGm4ce" }, "outputs": [], "source": [ "cum_sum_eigenvalues = np.cumsum(exp_var_pca)" ] }, { "cell_type": "markdown", "metadata": { "id": "3CFAT13-r680" }, "source": [ "Podemos ver que con 8 componentes pasamos los 0.92, por lo que probaremos con las 8 primeras componentes principales y analizaremos los resultados. Aun así, la cantidad de varianza explicada que hay que superar es un parámetro que cada usuario determinará a su gusto. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 417 }, "executionInfo": { "elapsed": 1006, "status": "ok", "timestamp": 1654863894685, "user": { "displayName": "Mikel Armendariz", "userId": "04878841620519662639" }, "user_tz": -120 }, "id": "YKfKJ9BJnVG-", "outputId": "bfe1e2bb-e945-4178-9dd0-5996ee7d33f0" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import plotly.express as px\n", "\n", "px.area(\n", " x=range(1, cum_sum_eigenvalues.shape[0] + 1),\n", " y=cum_sum_eigenvalues,\n", " labels={\"x\": \"# Components\", \"y\": \"Explained Variance\"},\n", " width=800, height=400\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "9z7qvFT8ui7x" }, "source": [ "Como hemos comentado vamos a coger las primeras 8 componentes principales y vamos a ver como sería crear un modelo simple de regresión logística." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "6VFVpmVeuiMu" }, "outputs": [], "source": [ "pca = PCA(n_components = 8)\n", "\n", "pca_fitted = pca.fit(X_train_sc)\n", "\n", "X_train_pca = pca_fitted.transform(X_train_sc)\n", "X_test_pca = pca_fitted.transform(X_test_sc)" ] }, { "cell_type": "markdown", "metadata": { "id": "h0uQtyx1uRfY" }, "source": [ "Ahora veremos como sería crear un modelos de regresión logística utilizando el dataset que hemos obtenido con el PCA." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 14, "status": "ok", "timestamp": 1654863894686, "user": { "displayName": "Mikel Armendariz", "userId": "04878841620519662639" }, "user_tz": -120 }, "id": "etihBkIMsd7U", "outputId": "1059fa68-ca20-4dfb-bc9f-131d3e3f5584" }, "outputs": [ { "data": { "text/plain": [ "LogisticRegression(random_state=42)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lg = LogisticRegression(random_state=42)\n", "lg.fit(X_train_pca, y_train)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 14, "status": "ok", "timestamp": 1654863894687, "user": { "displayName": "Mikel Armendariz", "userId": "04878841620519662639" }, "user_tz": -120 }, "id": "Dmzv33los1mq", "outputId": "853c99af-8e52-4203-899b-3566e11f49a7" }, "outputs": [ { "data": { "text/plain": [ "array([0., 0., 2., 0., 1., 0., 1., 2., 1., 2., 0., 2., 0., 1., 0., 1., 1.,\n", " 1., 0., 1., 0., 1., 1., 2., 2., 2., 1., 1., 1., 0., 0., 1., 2., 0.,\n", " 0., 0., 2., 2., 1., 2., 0., 1., 1., 0., 2., 0., 1., 1., 2., 0., 1.,\n", " 0., 0., 2., 2., 1., 0., 0., 1., 0., 2., 1., 0., 2., 0., 0., 0., 2.,\n", " 0., 0., 0., 2., 1., 0., 2., 1., 0., 2., 1., 1., 0., 2., 0., 0., 1.,\n", " 0., 0., 2., 1., 1., 1., 0., 1., 1., 1., 2., 2., 0., 1., 2., 2., 2.,\n", " 1., 0., 1., 2., 2., 1., 2., 1., 1., 1., 0., 0., 2., 0., 2., 0., 0.,\n", " 1., 1., 0., 0., 0., 1., 0., 1., 2., 1., 1., 1., 2., 2., 1., 0., 0.,\n", " 1., 2., 2., 0., 1., 2., 2.])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_pred = lg.predict(X_test_pca)\n", "y_pred" ] }, { "cell_type": "markdown", "metadata": { "id": "8_JanugXucp9" }, "source": [ "Podemos apreciar en la matriz de confusión que el modelo que hemos creado con los datos transformados tiene la misma precisión que el modelo anterior, pero hay que tener en cuenta que esta vez hemos uilizado 5 variables menos, por lo que, en esta ocasión, se puede decir que hemos acertado utilizando el PCA." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 12, "status": "ok", "timestamp": 1654863894687, "user": { "displayName": "Mikel Armendariz", "userId": "04878841620519662639" }, "user_tz": -120 }, "id": "zkjWzipYtjm7", "outputId": "f1d49fc7-ce1a-4499-aa6d-def344416e35" }, "outputs": [ { "data": { "text/plain": [ "array([[48, 0, 0],\n", " [ 4, 51, 2],\n", " [ 0, 0, 38]])" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "confusion_matrix(y_test, y_pred)" ] }, { "cell_type": "markdown", "metadata": { "id": "gczWS5-qwjVz" }, "source": [ "## Análisis discriminante lineal" ] }, { "cell_type": "markdown", "metadata": { "id": "mRIHJesMQaC6" }, "source": [ "Es un algoritmo predictivo para clasificaciones multiclase, pero también se puede utilizar como técnica para la reducción de la dimensionalidad. En este caso nos encontramos ante un algoritmo supervisado, por lo que necesitaremos la variable objetivo.\n", "\n", "Cuando el objetivo del PCA es encontrar los componentes con mayor varianza posible, el de LDA es maximizar la separación de clases entre los componentes.\n", "\n", "Crearemos una instancia de la clase LDA para hacer el ejemplo de reducción de dimensionalidad." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "w41-EhNQtr-h" }, "outputs": [], "source": [ "from sklearn.discriminant_analysis import LinearDiscriminantAnalysis\n", "\n", "lda = LinearDiscriminantAnalysis()" ] }, { "cell_type": "markdown", "metadata": { "id": "2HGsvLJuQuIE" }, "source": [ "Transformamos los datos escalados previamente." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "z2PqmZ0Kw3tq" }, "outputs": [], "source": [ "lda_fitted = lda.fit(X_train_sc, y_train)\n", "\n", "X_train_lda = lda_fitted.transform(X_train_sc)\n", "X_test_lda = lda_fitted.transform(X_test_sc)" ] }, { "cell_type": "markdown", "metadata": { "id": "HABFaPYvQ31O" }, "source": [ "Obtenemos los valores de los ratios de la varianza explicada para ver cuantas nuevas variables necesitaremos para poder explicar la información de los datos originales." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 10, "status": "ok", "timestamp": 1654863894688, "user": { "displayName": "Mikel Armendariz", "userId": "04878841620519662639" }, "user_tz": -120 }, "id": "v62MQbf5yvLB", "outputId": "d59aa37e-79d6-4553-ffab-ddd711eadbfe" }, "outputs": [ { "data": { "text/plain": [ "array([0.70699454, 0.29300546])" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "exp_var_pca = lda.explained_variance_ratio_\n", "exp_var_pca" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "q9Px7Ubvyv8v" }, "outputs": [], "source": [ "cum_sum_eigenvalues = np.cumsum(exp_var_pca)" ] }, { "cell_type": "markdown", "metadata": { "id": "l_y91zeIQ8Kc" }, "source": [ "Podemos ver que, como esperábamos, solo ha creado 2 componentes, ya que la variable objetivo tiene 3 clases diferentes. \n", " \n", "También vemos que con solo un componente no llega a los 0.71, pero si cogemos los 2 componentes llega al 1, por lo que escogeremos los dos componentes." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 417 }, "executionInfo": { "elapsed": 446, "status": "ok", "timestamp": 1654863895127, "user": { "displayName": "Mikel Armendariz", "userId": "04878841620519662639" }, "user_tz": -120 }, "id": "peCCp5kYyvtK", "outputId": "048947bf-dddd-4eef-f49e-7fe0694aca13" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", "\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import plotly.express as px\n", "\n", "px.area(\n", " x=range(1, cum_sum_eigenvalues.shape[0] + 1),\n", " y=cum_sum_eigenvalues,\n", " labels={\"x\": \"# Components\", \"y\": \"Explained Variance\"},\n", " width=800, height=400\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "JVKWA4XRRgpq" }, "source": [ "Con la transformación del dataset que hemos obtenido crearemos un modelo muy simple de regresión logística para ver su rendimiento." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "9AQZV0Gdw605" }, "outputs": [], "source": [ "lg = LogisticRegression(random_state=42)" ] }, { "cell_type": "markdown", "metadata": { "id": "3Zfx9-3aRw7l" }, "source": [ "Entrenamos el modelo." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 12, "status": "ok", "timestamp": 1654863895128, "user": { "displayName": "Mikel Armendariz", "userId": "04878841620519662639" }, "user_tz": -120 }, "id": "FXHigVarxK8n", "outputId": "16282495-7512-4db5-bff8-aab62454090a" }, "outputs": [ { "data": { "text/plain": [ "LogisticRegression(random_state=42)" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lg.fit(X_train_lda, y_train)" ] }, { "cell_type": "markdown", "metadata": { "id": "yPZari-RRy_u" }, "source": [ "Hacemos las predicciones." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 11, "status": "ok", "timestamp": 1654863895129, "user": { "displayName": "Mikel Armendariz", "userId": "04878841620519662639" }, "user_tz": -120 }, "id": "82w393BQxNyg", "outputId": "ed2646e5-a455-4f41-8ce3-023be62d9fe9" }, "outputs": [ { "data": { "text/plain": [ "array([0., 0., 2., 0., 1., 0., 1., 2., 1., 2., 0., 2., 0., 1., 0., 1., 1.,\n", " 1., 0., 1., 0., 1., 2., 2., 2., 2., 1., 1., 1., 0., 0., 1., 2., 0.,\n", " 0., 0., 2., 2., 1., 2., 0., 1., 1., 2., 2., 0., 1., 1., 2., 0., 1.,\n", " 0., 0., 2., 2., 1., 2., 0., 1., 0., 2., 1., 2., 2., 0., 0., 0., 2.,\n", " 0., 0., 2., 2., 1., 0., 2., 1., 0., 2., 1., 2., 0., 2., 0., 0., 1.,\n", " 0., 0., 2., 1., 1., 1., 0., 1., 1., 1., 2., 2., 0., 1., 2., 2., 2.,\n", " 1., 0., 1., 2., 2., 1., 2., 2., 1., 1., 0., 0., 2., 0., 2., 0., 0.,\n", " 1., 2., 0., 0., 0., 1., 0., 2., 2., 1., 1., 1., 2., 2., 1., 0., 0.,\n", " 1., 2., 2., 0., 1., 2., 2.])" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_pred = lg.predict(X_test_lda)\n", "y_pred" ] }, { "cell_type": "markdown", "metadata": { "id": "vrAMrDS7R1CZ" }, "source": [ "En la matriz de confusión podemos ver que los resultados que hemos obtenido diferentes, ya que el error cometido solo afecta a las clases 2 y 3 (la clase 1 se predice a la perfección). Además, en este caso solo tenemos dos componentes por lo que podemos aprovechar las ventajas que esto conlleva." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 10, "status": "ok", "timestamp": 1654863895130, "user": { "displayName": "Mikel Armendariz", "userId": "04878841620519662639" }, "user_tz": -120 }, "id": "9pvHGcqexTgO", "outputId": "acd836b5-110f-4070-c5cf-c846e52719cb" }, "outputs": [ { "data": { "text/plain": [ "array([[48, 0, 0],\n", " [ 0, 46, 11],\n", " [ 0, 0, 38]])" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "confusion_matrix(y_test, y_pred)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "37Kyhx2EgZGk" }, "outputs": [], "source": [] } ], "metadata": { "colab": { "authorship_tag": "ABX9TyM1JtyFVw8grgQTE1JRPSAp", "collapsed_sections": [], "name": "tecnicas_reduccion_dimensionalidad.ipynb", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 1 }