{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "PR-bTTTyo4lS" }, "source": [ "# Dimensionality reduction" ] }, { "cell_type": "markdown", "metadata": { "id": "wj7Nu_bMo_6c" }, "source": [ "Dimensionality reduction is a technique used in data mining to transform high-dimensional datasets into lower-dimensional ones. This makes the data much simpler to visualize and makes it easier to find complex patterns that would be hard to detect by eye in the original data.\n", "\n", "In addition, when a dataset has a large number of attributes, the number of possible value combinations grows quickly, so the model has a harder time learning and tends to overfit. Mitigating overfitting is one of the main purposes of dimensionality reduction techniques." ] }, { "cell_type": "markdown", "metadata": { "id": "PnqBzS29pRhE" }, "source": [ "## Reading the data" ] }, { "cell_type": "markdown", "metadata": { "id": "YogvuiQhpjxd" }, "source": [ "First we import the libraries we will need throughout the example." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ToX0YHDpj7fj" }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "from sklearn.datasets import load_wine" ] }, { "cell_type": "markdown", "metadata": { "id": "eHMIUffypV6B" }, "source": [ "This time we will use the wine dataset, which contains several attributes of wines along with the type of wine each one belongs to." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "executionInfo": { "elapsed": 7, "status": "ok", "timestamp": 1654863893191, "user": { "displayName": "Mikel Armendariz", "userId": "04878841620519662639" }, "user_tz": -120 }, "id": "sLK316HfkH-8", "outputId": "877c8c7f-2d0b-4efd-9b9e-a7302b359043" }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
|  | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127.0 | 2.80 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065.0 | 0.0 |
| 1 | 13.20 | 1.78 | 2.14 | 11.2 | 100.0 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.40 | 1050.0 | 0.0 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101.0 | 2.80 | 3.24 | 0.30 | 2.81 | 5.68 | 1.03 | 3.17 | 1185.0 | 0.0 |
| 3 | 14.37 | 1.95 | 2.50 | 16.8 | 113.0 | 3.85 | 3.49 | 0.24 | 2.18 | 7.80 | 0.86 | 3.45 | 1480.0 | 0.0 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118.0 | 2.80 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735.0 | 0.0 |
\n", "