{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# MANIPULATING TIME DATA ⌛⌛⌛\n",
"\n",
"In this notebook we will use the medical appointments dataset hosted on Kaggle (https://www.kaggle.com/joniarroba/noshowappointments). It contains more than 110,000 medical appointments.\n",
"\n",
"The main column we will work with is ScheduledDay (the date and time at which the appointment was booked). The goal is to see how to manipulate a time column so that it fits the needs of a future AI model."
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>PatientId</th><th>AppointmentID</th><th>Gender</th><th>ScheduledDay</th><th>AppointmentDay</th><th>Age</th><th>Neighbourhood</th><th>Scholarship</th><th>Hipertension</th><th>Diabetes</th><th>Alcoholism</th><th>Handcap</th><th>SMS_received</th><th>No-show</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>2.987250e+13</td><td>5642903</td><td>F</td><td>2016-04-29T18:38:08Z</td><td>2016-04-29T00:00:00Z</td><td>62</td><td>JARDIM DA PENHA</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>No</td></tr>\n",
"    <tr><th>1</th><td>5.589978e+14</td><td>5642503</td><td>M</td><td>2016-04-29T16:08:27Z</td><td>2016-04-29T00:00:00Z</td><td>56</td><td>JARDIM DA PENHA</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>No</td></tr>\n",
"    <tr><th>2</th><td>4.262962e+12</td><td>5642549</td><td>F</td><td>2016-04-29T16:19:04Z</td><td>2016-04-29T00:00:00Z</td><td>62</td><td>MATA DA PRAIA</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>No</td></tr>\n",
"    <tr><th>3</th><td>8.679512e+11</td><td>5642828</td><td>F</td><td>2016-04-29T17:29:31Z</td><td>2016-04-29T00:00:00Z</td><td>8</td><td>PONTAL DE CAMBURI</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>No</td></tr>\n",
"    <tr><th>4</th><td>8.841186e+12</td><td>5642494</td><td>F</td><td>2016-04-29T16:07:23Z</td><td>2016-04-29T00:00:00Z</td><td>56</td><td>JARDIM DA PENHA</td><td>0</td><td>1</td><td>1</td><td>0</td><td>0</td><td>0</td><td>No</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
" PatientId AppointmentID Gender ScheduledDay \\\n",
"0 2.987250e+13 5642903 F 2016-04-29T18:38:08Z \n",
"1 5.589978e+14 5642503 M 2016-04-29T16:08:27Z \n",
"2 4.262962e+12 5642549 F 2016-04-29T16:19:04Z \n",
"3 8.679512e+11 5642828 F 2016-04-29T17:29:31Z \n",
"4 8.841186e+12 5642494 F 2016-04-29T16:07:23Z \n",
"\n",
" AppointmentDay Age Neighbourhood Scholarship Hipertension \\\n",
"0 2016-04-29T00:00:00Z 62 JARDIM DA PENHA 0 1 \n",
"1 2016-04-29T00:00:00Z 56 JARDIM DA PENHA 0 0 \n",
"2 2016-04-29T00:00:00Z 62 MATA DA PRAIA 0 0 \n",
"3 2016-04-29T00:00:00Z 8 PONTAL DE CAMBURI 0 0 \n",
"4 2016-04-29T00:00:00Z 56 JARDIM DA PENHA 0 1 \n",
"\n",
" Diabetes Alcoholism Handcap SMS_received No-show \n",
"0 0 0 0 0 No \n",
"1 0 0 0 0 No \n",
"2 0 0 0 0 No \n",
"3 0 0 0 0 No \n",
"4 1 0 0 0 No "
]
},
"execution_count": 71,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Assumes the dataset is saved in the dataset folder ;); otherwise, adjust the path\n",
"df = pd.read_csv(\"dataset/cita_medica.csv\")\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 2016-04-29T18:38:08Z\n",
"1 2016-04-29T16:08:27Z\n",
"2 2016-04-29T16:19:04Z\n",
"3 2016-04-29T17:29:31Z\n",
"4 2016-04-29T16:07:23Z\n",
"Name: ScheduledDay, dtype: object"
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.ScheduledDay.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, the dtype of ScheduledDay is object, which means pandas is treating these values as strings. To convert these strings into dates we can use the pandas function to_datetime.\n",
"\n",
"Passing the format parameter to state the expected format explicitly is usually a good idea. When you do, also decide how values that do not match should be handled through the errors parameter: the default, 'raise', raises an exception, while 'coerce' converts them to NaT.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [],
"source": [
"df['ScheduledDay'] = pd.to_datetime(df['ScheduledDay'], \n",
" format = '%Y-%m-%dT%H:%M:%SZ', \n",
" errors = 'coerce')\n",
"assert df.ScheduledDay.isnull().sum() == 0, 'missing ScheduledDay dates'"
]
},
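{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of what the errors parameter does, the example below parses a small, made-up Series with the same format string. The values in raw are hypothetical and are not taken from the dataset. With errors='coerce' the malformed entry becomes NaT, whereas errors='raise' (the default) raises a ValueError.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical sample: two valid timestamps and one malformed value\n",
"raw = pd.Series(['2016-04-29T18:38:08Z', 'not a date', '2016-04-29T16:08:27Z'])\n",
"fmt = '%Y-%m-%dT%H:%M:%SZ'\n",
"\n",
"# errors='coerce' replaces unparseable values with NaT instead of failing\n",
"coerced = pd.to_datetime(raw, format=fmt, errors='coerce')\n",
"n_missing = coerced.isnull().sum()\n",
"\n",
"# errors='raise' (the default) stops at the first value that does not match\n",
"try:\n",
"    pd.to_datetime(raw, format=fmt, errors='raise')\n",
"    raised = False\n",
"except ValueError:\n",
"    raised = True\n",
"\n",
"print(coerced)\n",
"print('missing after coerce:', n_missing, '| raise mode failed:', raised)\n",
"```\n",
"\n",
"This is why the assert in the cell above matters: with errors='coerce', bad values fail silently unless you count the resulting NaT entries."
]
},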
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 2016-04-29 18:38:08\n",
"1 2016-04-29 16:08:27\n",
"2 2016-04-29 16:19:04\n",
"3 2016-04-29 17:29:31\n",
"4 2016-04-29 16:07:23\n",
"Name: ScheduledDay, dtype: datetime64[ns]"
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Check the changes: datetime dtype and a more readable format\n",
"df.ScheduledDay.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the strings are converted to datetimes, we can start using other pandas properties: https://pandas.pydata.org/pandas-docs/version/0.23/api.html#datetimelike-properties\n",
"\n",
"In short, pandas lets you break a date down into its year, month, week of the year, day of the month, hour, minute, second, and so on. You can also get the day of the week (Monday = 0, Sunday = 6)."
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [],
"source": [
"df['Año_programado'] = df['ScheduledDay'].dt.year\n",
"df['Mes_programado'] = df['ScheduledDay'].dt.month\n",
"df['Semana_programada'] = df['ScheduledDay'].dt.isocalendar().week\n",
"df['Día_programado'] = df['ScheduledDay'].dt.day\n",
"df['Hora_programado'] = df['ScheduledDay'].dt.hour\n",
"df['Minuto_programado'] = df['ScheduledDay'].dt.minute\n",
"df['Día_semana_programado'] = df['ScheduledDay'].dt.dayofweek"
]
},
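{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small sketch to sanity-check the day-of-week convention (Monday = 0, Sunday = 6). The dates in sample are chosen only for illustration, and the names sample and check are hypothetical. The dt.day_name() accessor returns the weekday name, which makes the numeric codes easier to verify by eye.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Illustrative dates: a Friday (like the first rows of the dataset), a Sunday and a Monday\n",
"sample = pd.Series(pd.to_datetime(['2016-04-29 18:38:08',\n",
"                                   '2016-05-01 09:00:00',\n",
"                                   '2016-05-02 09:00:00']))\n",
"\n",
"# Put the name, the numeric code and the ISO week side by side\n",
"check = pd.DataFrame({\n",
"    'day_name': sample.dt.day_name(),\n",
"    'dayofweek': sample.dt.dayofweek,\n",
"    'iso_week': sample.dt.isocalendar().week,\n",
"})\n",
"print(check)\n",
"```\n",
"\n",
"Note that dt.isocalendar() requires pandas 1.1 or later, so it is not listed in the 0.23 documentation linked above."
]
},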
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>Año_programado</th><th>Mes_programado</th><th>Semana_programada</th><th>Día_programado</th><th>Hora_programado</th><th>Minuto_programado</th><th>Día_semana_programado</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>2016</td><td>4</td><td>17</td><td>29</td><td>18</td><td>38</td><td>4</td></tr>\n",
"    <tr><th>1</th><td>2016</td><td>4</td><td>17</td><td>29</td><td>16</td><td>8</td><td>4</td></tr>\n",
"    <tr><th>2</th><td>2016</td><td>4</td><td>17</td><td>29</td><td>16</td><td>19</td><td>4</td></tr>\n",
"    <tr><th>3</th><td>2016</td><td>4</td><td>17</td><td>29</td><td>17</td><td>29</td><td>4</td></tr>\n",
"    <tr><th>4</th><td>2016</td><td>4</td><td>17</td><td>29</td><td>16</td><td>7</td><td>4</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
" Año_programado Mes_programado Semana_programada Día_programado \\\n",
"0 2016 4 17 29 \n",
"1 2016 4 17 29 \n",
"2 2016 4 17 29 \n",
"3 2016 4 17 29 \n",
"4 2016 4 17 29 \n",
"\n",
" Hora_programado Minuto_programado Día_semana_programado \n",
"0 18 38 4 \n",
"1 16 8 4 \n",
"2 16 19 4 \n",
"3 17 29 4 \n",
"4 16 7 4 "
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[['Año_programado','Mes_programado','Semana_programada','Día_programado','Hora_programado',\n",
" 'Minuto_programado','Día_semana_programado']].head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Congratulations! You have split the information into separate columns; later you will keep only the ones that add value.\n",
"If you have made it this far, you are at a very interesting point in your learning. Keep working like this and you will achieve whatever you set out to do.\n",
"\n",
"Keep going! 💪💪💪"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}