{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h3>MANIPULAR DATOS DE TIEMPO ⌛⌛⌛</h3> \n",
    "\n",
    "En este caso, utilizaremos el conjunto de datos de citas médicas alojado en Kaggle (https://www.kaggle.com/joniarroba/noshowappointments). Este conjunto de datos consta de más de 110.000 citas médicas. \n",
    "\n",
    "La columna principal que utilizaremos para este Notebook son el <em>ScheduledDay</em> (fecha y hora en el que se programó la cita). El objetivo es ver cómo podemos manipular una columna que refleja el tiempo para adaptarlo a las necesidades del futuro modelo de IA. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PatientId</th>\n",
       "      <th>AppointmentID</th>\n",
       "      <th>Gender</th>\n",
       "      <th>ScheduledDay</th>\n",
       "      <th>AppointmentDay</th>\n",
       "      <th>Age</th>\n",
       "      <th>Neighbourhood</th>\n",
       "      <th>Scholarship</th>\n",
       "      <th>Hipertension</th>\n",
       "      <th>Diabetes</th>\n",
       "      <th>Alcoholism</th>\n",
       "      <th>Handcap</th>\n",
       "      <th>SMS_received</th>\n",
       "      <th>No-show</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2.987250e+13</td>\n",
       "      <td>5642903</td>\n",
       "      <td>F</td>\n",
       "      <td>2016-04-29T18:38:08Z</td>\n",
       "      <td>2016-04-29T00:00:00Z</td>\n",
       "      <td>62</td>\n",
       "      <td>JARDIM DA PENHA</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>No</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>5.589978e+14</td>\n",
       "      <td>5642503</td>\n",
       "      <td>M</td>\n",
       "      <td>2016-04-29T16:08:27Z</td>\n",
       "      <td>2016-04-29T00:00:00Z</td>\n",
       "      <td>56</td>\n",
       "      <td>JARDIM DA PENHA</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>No</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>4.262962e+12</td>\n",
       "      <td>5642549</td>\n",
       "      <td>F</td>\n",
       "      <td>2016-04-29T16:19:04Z</td>\n",
       "      <td>2016-04-29T00:00:00Z</td>\n",
       "      <td>62</td>\n",
       "      <td>MATA DA PRAIA</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>No</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>8.679512e+11</td>\n",
       "      <td>5642828</td>\n",
       "      <td>F</td>\n",
       "      <td>2016-04-29T17:29:31Z</td>\n",
       "      <td>2016-04-29T00:00:00Z</td>\n",
       "      <td>8</td>\n",
       "      <td>PONTAL DE CAMBURI</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>No</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>8.841186e+12</td>\n",
       "      <td>5642494</td>\n",
       "      <td>F</td>\n",
       "      <td>2016-04-29T16:07:23Z</td>\n",
       "      <td>2016-04-29T00:00:00Z</td>\n",
       "      <td>56</td>\n",
       "      <td>JARDIM DA PENHA</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>No</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      PatientId  AppointmentID Gender          ScheduledDay  \\\n",
       "0  2.987250e+13        5642903      F  2016-04-29T18:38:08Z   \n",
       "1  5.589978e+14        5642503      M  2016-04-29T16:08:27Z   \n",
       "2  4.262962e+12        5642549      F  2016-04-29T16:19:04Z   \n",
       "3  8.679512e+11        5642828      F  2016-04-29T17:29:31Z   \n",
       "4  8.841186e+12        5642494      F  2016-04-29T16:07:23Z   \n",
       "\n",
       "         AppointmentDay  Age      Neighbourhood  Scholarship  Hipertension  \\\n",
       "0  2016-04-29T00:00:00Z   62    JARDIM DA PENHA            0             1   \n",
       "1  2016-04-29T00:00:00Z   56    JARDIM DA PENHA            0             0   \n",
       "2  2016-04-29T00:00:00Z   62      MATA DA PRAIA            0             0   \n",
       "3  2016-04-29T00:00:00Z    8  PONTAL DE CAMBURI            0             0   \n",
       "4  2016-04-29T00:00:00Z   56    JARDIM DA PENHA            0             1   \n",
       "\n",
       "   Diabetes  Alcoholism  Handcap  SMS_received No-show  \n",
       "0         0           0        0             0      No  \n",
       "1         0           0        0             0      No  \n",
       "2         0           0        0             0      No  \n",
       "3         0           0        0             0      No  \n",
       "4         1           0        0             0      No  "
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Supongo que has guardado el dataset en la carpeta dataset ;), de lo contrario adaptar el path\n",
    "df = pd.read_csv(\"dataset/cita_medica.csv\")\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    2016-04-29T18:38:08Z\n",
       "1    2016-04-29T16:08:27Z\n",
       "2    2016-04-29T16:19:04Z\n",
       "3    2016-04-29T17:29:31Z\n",
       "4    2016-04-29T16:07:23Z\n",
       "Name: ScheduledDay, dtype: object"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.ScheduledDay.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Como se puede observar, el dtype de <em>ScheduledDay</em> es de tipo object, lo que significa que pandas entiende estos valores como strings. Para convertir estas cadenas en fechas podemos utilizar la función de pandas <em>to_datetime</em>. \n",
    "\n",
    "Utilizar el parámetro format para indicar específicamente el formato puede ser una buena decisión. Si usas el parámetro de formato, tienes que especificar qué hacer con los errores.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [],
   "source": [
    "df['ScheduledDay'] = pd.to_datetime(df['ScheduledDay'], \n",
    " format = '%Y-%m-%dT%H:%M:%SZ', \n",
    " errors = 'coerce')\n",
    "assert df.ScheduledDay.isnull().sum() == 0, 'missing ScheduledDay dates'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0   2016-04-29 18:38:08\n",
       "1   2016-04-29 16:08:27\n",
       "2   2016-04-29 16:19:04\n",
       "3   2016-04-29 17:29:31\n",
       "4   2016-04-29 16:07:23\n",
       "Name: ScheduledDay, dtype: datetime64[ns]"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Ver los cambios. Tipo datetime y un formato más legible\n",
    "df.ScheduledDay.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Al convertir las cadenas en datetimes podemos empezar a utilizar otras propiedades de Pandas: https://pandas.pydata.org/pandas-docs/version/0.23/api.html#datetimelike-properties\n",
    "\n",
    "Básicamente, con Pandas podrás desglosar la fecha y obtener el año, el mes, la semana del año, el día del mes, la hora, los minutos, los segundos, etcétera. También puede obtener el día de la semana (lunes = 0, domingo = 6)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [],
   "source": [
    "df['Año_programado'] = df['ScheduledDay'].dt.year\n",
    "df['Mes_programado'] = df['ScheduledDay'].dt.month\n",
    "df['Semana_programada'] = df['ScheduledDay'].dt.isocalendar().week\n",
    "df['Día_programado'] = df['ScheduledDay'].dt.day\n",
    "df['Hora_programado'] = df['ScheduledDay'].dt.hour\n",
    "df['Minuto_programado'] = df['ScheduledDay'].dt.minute\n",
    "df['Día_semana_programado'] = df['ScheduledDay'].dt.dayofweek"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Año_programado</th>\n",
       "      <th>Mes_programado</th>\n",
       "      <th>Semana_programada</th>\n",
       "      <th>Día_programado</th>\n",
       "      <th>Hora_programado</th>\n",
       "      <th>Minuto_programado</th>\n",
       "      <th>Día_semana_programado</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2016</td>\n",
       "      <td>4</td>\n",
       "      <td>17</td>\n",
       "      <td>29</td>\n",
       "      <td>18</td>\n",
       "      <td>38</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2016</td>\n",
       "      <td>4</td>\n",
       "      <td>17</td>\n",
       "      <td>29</td>\n",
       "      <td>16</td>\n",
       "      <td>8</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2016</td>\n",
       "      <td>4</td>\n",
       "      <td>17</td>\n",
       "      <td>29</td>\n",
       "      <td>16</td>\n",
       "      <td>19</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2016</td>\n",
       "      <td>4</td>\n",
       "      <td>17</td>\n",
       "      <td>29</td>\n",
       "      <td>17</td>\n",
       "      <td>29</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2016</td>\n",
       "      <td>4</td>\n",
       "      <td>17</td>\n",
       "      <td>29</td>\n",
       "      <td>16</td>\n",
       "      <td>7</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Año_programado  Mes_programado  Semana_programada  Día_programado  \\\n",
       "0            2016               4                 17              29   \n",
       "1            2016               4                 17              29   \n",
       "2            2016               4                 17              29   \n",
       "3            2016               4                 17              29   \n",
       "4            2016               4                 17              29   \n",
       "\n",
       "   Hora_programado  Minuto_programado  Día_semana_programado  \n",
       "0               18                 38                      4  \n",
       "1               16                  8                      4  \n",
       "2               16                 19                      4  \n",
       "3               17                 29                      4  \n",
       "4               16                  7                      4  "
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[['Año_programado','Mes_programado','Semana_programada','Día_programado','Hora_programado',\n",
    "    'Minuto_programado','Día_semana_programado']].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Enhorabuena. Has conseguido organizar la información en columnas separadas, después te quedarás con las que te aporten valor.\n",
    "Si has llegado hasta aquí estás en un punto muy interesante de tu aprendizaje. Sigue empeñándote de este modo y conseguirás todo lo que te propongas.\n",
    "\n",
    "¡Fuerza! 💪💪💪"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}