{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "183fc8ed-3005-4af8-bdf3-0a1c5f435331",
   "metadata": {},
   "source": [
    "# Selection of 50 Covariates"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "36a3c988-7720-407e-b031-fb77266bfa80",
   "metadata": {},
   "source": [
    "## Import Dependencies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "c66d6d21-6c91-4635-b7fe-0e26072c1614",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6d689744-423b-4de0-9f9b-ee75e115cb78",
   "metadata": {},
   "source": [
    "## Import Data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f19d4324-d7f1-4355-9163-ae0bdedbeca3",
   "metadata": {},
   "source": [
    "The original data comes from the 1995 linked birth and infant death dataset ([LBIDD-95](https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/DVS/periodlinkedus/LinkPE95US.zip)).\n",
    "\n",
    "The documentation of the LBIDD-95 dataset can be found [here](https://ftp.cdc.gov/pub/Health_Statistics/NCHS/Dataset_Documentation/DVS/periodlinked/LinkPE95Guide.pdf).\n",
    "\n",
    "To create the new dataset we use only the \"Denominator file\", which contains the birth certificate data of approximately 3,900,000 infants."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "033f0f9a-c16e-4857-b2b7-b102d521214e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read each fixed-width record as a single string into the 'original' column\n",
    "df = pd.read_csv('LinkPE95US/LinkPE95USden.dat', header=None, names=['original'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a5fb349-e63e-4b7b-bcc9-6fb8903bb5f9",
   "metadata": {},
   "source": [
    "<hr style=\"border: 2px solid black\">"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ce5d0a39-20f5-4382-91d6-5d5948d45431",
   "metadata": {},
   "source": [
    "To simplify counting and removing the rows with missing values, we partition the 50 covariates into subsets whose variables share the same missing-observation code.\n",
    "\n",
    "Each list `c_x` holds the names of the covariates in which the value `x` encodes a missing observation; for example, `c_8_9` groups the covariates where either `8` or `9` marks a missing value."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "434a1665-f11b-4fd1-8060-88c6c4fefb11",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Covariates grouped by the code that marks a missing observation\n",
    "c_9=['DMAR','PLDEL','BIRATTND']\n",
    "c_99=['MONPRE','GESTAT','DTOTORD']\n",
    "c_9999=['DBIRWT']\n",
    "c_5=['DELMETH5']\n",
    "c_3=['MPLBIRR']\n",
    "c_8_9=['CARDIAC','LUNG','DIABETES','HYDRA','HEMO','CHYPER','PHYYPER', 'ECLAMP','INCERVIX','PRETERM','OTHERMR','AMNIO','MONITOR','INDUCT', 'STIMULA','TOCOL','ULTRAS','OTHEROB','FEBRILE','MECONIUM','RUPTURE','ABRUPTIO','PREPLACE','EXCEBLD', 'SEIZURE','PRECIP','PROLONG','CEPHALO','CORD','OTHERLB', 'NANEMIA', 'HYALINE','MECONSYN','NSEIZ']\n",
    "\n",
    "nc=['CSEX','DMAGE','MRACE3','DPLURAL','BIRMON','WEEKDAYB','FLGND'] # Variables without a missing-value indicator in the original encoding\n",
    "tot=c_9+c_99+c_9999+c_5+c_3+c_8_9+nc\n",
    "\n",
    "values_to_remove_dict={\n",
    "    'c_3': ['3'],\n",
    "    'c_5': ['5'],\n",
    "    'c_9': ['9'],\n",
    "    'c_99': ['99'],\n",
    "    'c_9999': ['9999'],\n",
    "    'c_8_9': ['8','9'], \n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a97b6bd-7a80-4f13-b07b-9a2d439a83f9",
   "metadata": {},
   "source": [
    "## Data Processing"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5e4c6853-5b7f-421d-9759-0db5cfd14b24",
   "metadata": {},
   "source": [
    "We select 50 covariates based on their [importance](https://www.cdc.gov/nchs/data/mvsr/supp/mv46_06s2.pdf) and their percentage of missing observations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "c9208353-633a-4cd8-b3a1-e860cc0b974c",
   "metadata": {},
   "outputs": [],
   "source": [
    "df['CSEX']=df['original'].str[78:79]        # Sex of newborn\n",
    "df['DMAGE']=df['original'].str[29:31]       # Mother's age\n",
    "df['MRACE3']=df['original'].str[37:38]      # Mother's race\n",
    "\n",
    "df['DMAR']=df['original'].str[42:43]        # Marital status\n",
    "df['MONPRE']=df['original'].str[52:53]      # Month prenatal care began (single-character slice, so the '99' code in c_99 never matches)\n",
    "\n",
    "df['PLDEL']=df['original'].str[66:67]       # Place of delivery\n",
    "df['GESTAT']=df['original'].str[73:75]      # Gestation weeks\n",
    "df['DBIRWT']=df['original'].str[80:84]      # Birth weight\n",
    "df['DPLURAL']=df['original'].str[88:89]     # Plural births\n",
    "df['DELMETH5']=df['original'].str[98:99]    # Method of delivery\n",
    "\n",
    "df['BIRMON']=df['original'].str[204:206]    # Month of birth\n",
    "df['WEEKDAYB']=df['original'].str[208:209]  # Day of the week of birth\n",
    "\n",
    "df['DTOTORD']=df['original'].str[46:48]     # Number of previous births or interruptions of pregnancy\n",
    "df['MPLBIRR']=df['original'].str[45:46]     # Mother's place of birth\n",
    "\n",
    "df['BIRATTND']=df['original'].str[67:68]    # Attendant at the birth\n",
    "df['CARDIAC']=df['original'].str[101:102]   # Flag for cardiac disease\n",
    "df['LUNG']=df['original'].str[102:103]      # Flag for lung disease\n",
    "df['DIABETES']=df['original'].str[103:104]  # Flag for diabetes\n",
    "df['HYDRA']=df['original'].str[105:106]     # Flag for hydramnios\n",
    "df['HEMO']=df['original'].str[106:107]      # Flag for hemoglobinopathy\n",
    "df['CHYPER']=df['original'].str[107:108]    # Flag for chronic hypertension\n",
    "df['PHYYPER']=df['original'].str[108:109]   # Flag for pregnancy-related hypertension\n",
    "df['ECLAMP']=df['original'].str[109:110]    # Flag for eclampsia\n",
    "df['INCERVIX']=df['original'].str[110:111]  # Flag for incompetent cervix\n",
    "df['PRETERM']=df['original'].str[112:113]   # Flag for previous preterm birth\n",
    "df['OTHERMR']=df['original'].str[116:117]   # Flag for other medical risk of the mother\n",
    "\n",
    "# The obstetric-procedure flags occupy consecutive one-character fields\n",
    "c=['AMNIO','MONITOR','INDUCT', 'STIMULA','TOCOL','ULTRAS','OTHEROB'] # Flags for obstetric procedures\n",
    "for i, name in enumerate(c):\n",
    "    df[name] = df['original'].str[129+i:130+i]\n",
    "\n",
    "# The labor-complication flags are laid out the same way\n",
    "c=['FEBRILE','MECONIUM','RUPTURE','ABRUPTIO','PREPLACE','EXCEBLD', 'SEIZURE','PRECIP','PROLONG'] # Flags for labor complications\n",
    "for i, name in enumerate(c):\n",
    "    df[name] = df['original'].str[137+i:138+i]\n",
    "\n",
    "df['CEPHALO']=df['original'].str[148:149]   # Flag for cephalopelvic disproportion\n",
    "df['CORD']=df['original'].str[149:150]      # Flag for cord prolapse\n",
    "df['OTHERLB']=df['original'].str[152:153]   # Flag for other labor complication\n",
    "\n",
    "df['NANEMIA']=df['original'].str[154:155]  # Flag for anemia in newborn\n",
    "df['HYALINE']=df['original'].str[157:158]  # Flag for hyaline membrane disease in newborn\n",
    "df['MECONSYN']=df['original'].str[158:159] # Flag for meconium aspiration syndrome in newborn\n",
    "df['NSEIZ']=df['original'].str[161:162]    # Flag for seizures in newborn\n",
    "\n",
    "df['FLGND']=df['original'].str[209:210]    # Flag for infant death in 1995"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f3706a75-2537-4bce-9e3e-e13d83edff75",
   "metadata": {},
   "source": [
    "<hr style=\"border: 2px solid black\">"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee5ffe28-bc13-46f8-afc7-b329e2ac8f7b",
   "metadata": {},
   "source": [
    "We count the number of missing observations for each of the selected covariates.\n",
    "We also check the possible values taken by the covariates. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "fecdc9f6-a9c8-4183-ab1a-cd7826004421",
   "metadata": {},
   "outputs": [],
   "source": [
    "analysis=True"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "e56b3155-e61a-4889-8d3c-0332d5fed8fd",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Column: DMAR 0\n",
      "Column: PLDEL 731\n",
      "Column: BIRATTND 6872\n",
      "\n",
      "Column: MONPRE 0\n",
      "Column: GESTAT 36501\n",
      "Column: DTOTORD 36763\n",
      "\n",
      "Column: DBIRWT 1683\n",
      "\n",
      "Column: DELMETH5 29189\n",
      "\n",
      "Column: MPLBIRR 9659\n",
      "\n",
      "Column: CARDIAC 45646\n",
      "Column: LUNG 45646\n",
      "Column: DIABETES 45646\n",
      "Column: HYDRA 45646\n",
      "Column: HEMO 45646\n",
      "Column: CHYPER 45646\n",
      "Column: PHYYPER 45646\n",
      "Column: ECLAMP 45646\n",
      "Column: INCERVIX 45646\n",
      "Column: PRETERM 45646\n",
      "Column: OTHERMR 45646\n",
      "Column: AMNIO 32493\n",
      "Column: MONITOR 32493\n",
      "Column: INDUCT 32493\n",
      "Column: STIMULA 32493\n",
      "Column: TOCOL 32493\n",
      "Column: ULTRAS 32493\n",
      "Column: OTHEROB 32493\n",
      "Column: FEBRILE 38274\n",
      "Column: MECONIUM 38274\n",
      "Column: RUPTURE 38274\n",
      "Column: ABRUPTIO 38274\n",
      "Column: PREPLACE 38274\n",
      "Column: EXCEBLD 38274\n",
      "Column: SEIZURE 38274\n",
      "Column: PRECIP 38274\n",
      "Column: PROLONG 38274\n",
      "Column: CEPHALO 38274\n",
      "Column: CORD 38274\n",
      "Column: OTHERLB 38275\n",
      "Column: NANEMIA 48841\n",
      "Column: HYALINE 48841\n",
      "Column: MECONSYN 48841\n",
      "Column: NSEIZ 48844\n",
      "\n",
      "Column: DMAR ['1' '2' '0']\n",
      "Column: PLDEL ['1' '4' '5' '3' '2' '9']\n",
      "Column: BIRATTND ['1' '3' '5' '2' '9' '4' '0' '8' '6' '7']\n",
      "Column: MONPRE ['1' '2' '3' '4' '5' '9' '0' '7']\n",
      "Column: GESTAT ['34' '37' '40' '38' '43' '41' '42' '39' '31' '36' '35' '44' '25' '32'\n",
      " '33' '45' '29' '46' '99' '30' '24' '47' '27' '23' '28' '20' '21' '26'\n",
      " '22' '19' '17' '18' ' 2' ' 1']\n",
      "Column: DTOTORD ['01' '02' '05' '03' '04' '08' '07' '06' '10' '99' '09' '11' '12' '13'\n",
      " '15' '16' '14' '18' '17' '19' '24' '20' '22' '21' '27' '26' '23' '28']\n",
      "Column: DBIRWT ['2702' '3642' '3489' ... '7172' '5319' '5062']\n",
      "Column: DELMETH5 ['3' '1' '5' '4' '2']\n",
      "Column: MPLBIRR ['1' '2' '3' '4' '5' '6']\n",
      "Column: CARDIAC ['2' '9' '1']\n",
      "Column: LUNG ['2' '9' '1']\n",
      "Column: DIABETES ['2' '9' '1']\n",
      "Column: HYDRA ['2' '9' '1']\n",
      "Column: HEMO ['2' '9' '1']\n",
      "Column: CHYPER ['2' '9' '1']\n",
      "Column: PHYYPER ['2' '9' '1']\n",
      "Column: ECLAMP ['2' '9' '1']\n",
      "Column: INCERVIX ['2' '9' '1']\n",
      "Column: PRETERM ['2' '9' '1']\n",
      "Column: OTHERMR ['1' '2' '9' '0' '6' '3']\n",
      "Column: AMNIO ['2' '1' '9']\n",
      "Column: MONITOR ['1' '2' '9']\n",
      "Column: INDUCT ['2' '1' '9']\n",
      "Column: STIMULA ['2' '1' '9' ' ']\n",
      "Column: TOCOL ['2' '1' '9']\n",
      "Column: ULTRAS ['1' '2' '9']\n",
      "Column: OTHEROB ['2' '1' '9']\n",
      "Column: FEBRILE ['2' '1' '9']\n",
      "Column: MECONIUM ['2' '1' '9']\n",
      "Column: RUPTURE ['2' '1' '9']\n",
      "Column: ABRUPTIO ['2' '9' '1']\n",
      "Column: PREPLACE ['2' '1' '9']\n",
      "Column: EXCEBLD ['2' '9' '1']\n",
      "Column: SEIZURE ['2' '9' '1']\n",
      "Column: PRECIP ['2' '1' '9']\n",
      "Column: PROLONG ['2' '1' '9']\n",
      "Column: CEPHALO ['2' '1' '9']\n",
      "Column: CORD ['2' '9' '1' ' ']\n",
      "Column: OTHERLB ['1' '2' '9' '8']\n",
      "Column: NANEMIA ['2' '9' '1']\n",
      "Column: HYALINE ['2' '9' '1']\n",
      "Column: MECONSYN ['2' '9' '1']\n",
      "Column: NSEIZ ['2' '9' '1' '8']\n",
      "Column: CSEX ['2' '1' '4' '6' '3' '0' '7' '5' '8' '9']\n",
      "Column: DMAGE ['25' '14' '29' '32' '31' '22' '19' '21' '23' '20' '24' '27' '17' '18'\n",
      " '35' '40' '16' '36' '28' '30' '26' '38' '41' '33' '37' '15' '34' '42'\n",
      " '39' '43' '13' '46' '45' '44' '12' '11' '47' '10' '6 ' '49' '2 ' '48'\n",
      " '1 ' '8 ' '7 ' '3 ' '4 ' '9 ']\n",
      "Column: MRACE3 ['1' '3' '2' ' ']\n",
      "Column: DPLURAL ['2' '1' '4' '3' '5']\n",
      "Column: BIRMON ['01' '02' '03' '04' '05' '06' '07' '08' '09' '10' '3 ' '11' '12' '5 '\n",
      " '4 ' '7 ' '6 ' '1 ' '2 ']\n",
      "Column: WEEKDAYB ['1' '3' '2' '4' '5' '6' '7' '']\n",
      "Column: FLGND [' ' '1' '']\n"
     ]
    }
   ],
   "source": [
    "if analysis:\n",
    "    # Count the rows holding a missing-value code, one covariate group at a time\n",
    "    groups = [(c_9, ['9']), (c_99, ['99']), (c_9999, ['9999']), (c_5, ['5']), (c_3, ['3']), (c_8_9, ['8', '9'])]\n",
    "    for columns, codes in groups:\n",
    "        for column in columns:\n",
    "            print(f\"Column: {column}\", int(df[column].isin(codes).sum()))\n",
    "        print()\n",
    "    # List the distinct values taken by each selected covariate\n",
    "    for column in tot:\n",
    "        print(f\"Column: {column}\", df[column].unique())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "48f76354-7e2a-4316-bcf9-33c91a4e1741",
   "metadata": {},
   "source": [
    "<hr style=\"border: 2px solid black\">"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "97e39c9c-fbff-4482-b9c4-749ed21691d8",
   "metadata": {},
   "source": [
    "We delete every row that contains a missing value in any of the selected covariates."
   ]
  },
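  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "4cb9d832-befe-4f27-babc-343b5c96d1d4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Flag every row that holds a missing-value code in any covariate of any group\n",
    "drop_mask = pd.Series(False, index=df.index)\n",
    "for key, remove_values in values_to_remove_dict.items():\n",
    "    columns_to_process = globals()[key]  # covariate list named by the key, e.g. c_9\n",
    "    drop_mask |= df[columns_to_process].isin(remove_values).any(axis=1)\n",
    "\n",
    "# Drop all flagged rows in a single pass; the original index labels are kept\n",
    "df = df.loc[~drop_mask]"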
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e3bd2c89-223d-4550-86d5-5c6a4d81d545",
   "metadata": {},
   "source": [
    "<hr style=\"border: 2px solid black\">"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31f16c6f-43bb-4162-9e8f-bf974be2aa65",
   "metadata": {},
   "source": [
    "We remove the original column, as it is no longer needed. Additionally, we adjust the format of a few covariates so that they are suitable for the next stages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "f2ca94a8-c1cd-4c3b-9997-0670956c6226",
   "metadata": {},
   "outputs": [],
   "source": [
    "df = df.drop('original',axis=1)\n",
    "\n",
    "# MRACE3: drop rows with a blank value, then store the column as integers\n",
    "df.drop(df[df['MRACE3']==' '].index, inplace=True)\n",
    "df['MRACE3']=df['MRACE3'].astype(int)\n",
    "\n",
    "# STIMULA: drop rows with a blank value, then store the column as integers\n",
    "df.drop(df[df['STIMULA']==' '].index, inplace=True)\n",
    "df['STIMULA']=df['STIMULA'].astype(int)\n",
    "\n",
    "# CORD: drop rows with a blank value, then store the column as integers\n",
    "df.drop(df[df['CORD']==' '].index, inplace=True)\n",
    "df['CORD']=df['CORD'].astype(int)\n",
    "\n",
    "# WEEKDAYB: drop rows with an empty value, then store the column as integers\n",
    "df.drop(df[df['WEEKDAYB']==''].index, inplace=True)\n",
    "df['WEEKDAYB']=df['WEEKDAYB'].astype(int)\n",
    "\n",
    "# FLGND: encode blank or empty values as '0'\n",
    "df['FLGND']=df['FLGND'].replace({' ': '0' ,'':'0'})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "4e43bce5-f83e-4135-ac66-1f2c8f9f729b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>CSEX</th>\n",
       "      <th>DMAGE</th>\n",
       "      <th>MRACE3</th>\n",
       "      <th>DMAR</th>\n",
       "      <th>MONPRE</th>\n",
       "      <th>PLDEL</th>\n",
       "      <th>GESTAT</th>\n",
       "      <th>DBIRWT</th>\n",
       "      <th>DPLURAL</th>\n",
       "      <th>DELMETH5</th>\n",
       "      <th>...</th>\n",
       "      <th>PRECIP</th>\n",
       "      <th>PROLONG</th>\n",
       "      <th>CEPHALO</th>\n",
       "      <th>CORD</th>\n",
       "      <th>OTHERLB</th>\n",
       "      <th>NANEMIA</th>\n",
       "      <th>HYALINE</th>\n",
       "      <th>MECONSYN</th>\n",
       "      <th>NSEIZ</th>\n",
       "      <th>FLGND</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2</td>\n",
       "      <td>25</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>34</td>\n",
       "      <td>2702</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>...</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>25</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>34</td>\n",
       "      <td>2702</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>...</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>14</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>37</td>\n",
       "      <td>3642</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>29</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>40</td>\n",
       "      <td>3489</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>32</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>38</td>\n",
       "      <td>3351</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3903007</th>\n",
       "      <td>1</td>\n",
       "      <td>23</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>37</td>\n",
       "      <td>2948</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3903008</th>\n",
       "      <td>2</td>\n",
       "      <td>26</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>39</td>\n",
       "      <td>3515</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3903009</th>\n",
       "      <td>2</td>\n",
       "      <td>21</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>40</td>\n",
       "      <td>3119</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3903010</th>\n",
       "      <td>1</td>\n",
       "      <td>25</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>39</td>\n",
       "      <td>2892</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3903011</th>\n",
       "      <td>2</td>\n",
       "      <td>19</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>35</td>\n",
       "      <td>2552</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3733678 rows × 50 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        CSEX DMAGE MRACE3 DMAR MONPRE PLDEL GESTAT DBIRWT DPLURAL DELMETH5  \\\n",
       "0          2    25      1    1      1     1     34   2702       2        3   \n",
       "1          2    25      1    1      1     1     34   2702       2        3   \n",
       "2          1    14      3    2      2     1     37   3642       1        1   \n",
       "3          1    29      1    1      1     1     40   3489       1        1   \n",
       "4          1    32      1    1      1     1     38   3351       1        1   \n",
       "...      ...   ...    ...  ...    ...   ...    ...    ...     ...      ...   \n",
       "3903007    1    23      1    1      1     1     37   2948       1        1   \n",
       "3903008    2    26      1    1      1     4     39   3515       1        1   \n",
       "3903009    2    21      1    2      2     1     40   3119       1        1   \n",
       "3903010    1    25      1    1      1     1     39   2892       1        1   \n",
       "3903011    2    19      1    1      1     4     35   2552       1        1   \n",
       "\n",
       "         ... PRECIP PROLONG CEPHALO CORD OTHERLB NANEMIA HYALINE MECONSYN  \\\n",
       "0        ...      2       2       2    2       1       2       2        2   \n",
       "1        ...      2       2       2    2       1       2       2        2   \n",
       "2        ...      2       2       2    2       2       2       2        2   \n",
       "3        ...      2       2       2    2       2       2       2        2   \n",
       "4        ...      2       2       2    2       2       2       2        2   \n",
       "...      ...    ...     ...     ...  ...     ...     ...     ...      ...   \n",
       "3903007  ...      2       2       2    2       2       2       2        2   \n",
       "3903008  ...      2       2       2    2       2       2       2        2   \n",
       "3903009  ...      2       2       2    2       2       2       2        2   \n",
       "3903010  ...      2       2       2    2       2       2       2        2   \n",
       "3903011  ...      2       2       2    2       2       2       2        2   \n",
       "\n",
       "        NSEIZ FLGND  \n",
       "0           2     0  \n",
       "1           2     0  \n",
       "2           2     0  \n",
       "3           2     0  \n",
       "4           2     0  \n",
       "...       ...   ...  \n",
       "3903007     2     0  \n",
       "3903008     2     0  \n",
       "3903009     2     0  \n",
       "3903010     2     0  \n",
       "3903011     2     0  \n",
       "\n",
       "[3733678 rows x 50 columns]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b187f3c3-922e-4b37-bb20-5f0cf00a82f4",
   "metadata": {},
   "source": [
    "## Export Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "4e654958-a53a-4fc1-9eba-e81aaebb3e1a",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.to_csv('LBIDD_den_tot.csv',index=False)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9a392f0c-8e41-4cc3-9c2a-365fcae7439a",
   "metadata": {},
   "source": [
    "<hr style=\"border: 2px solid black\">"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "LBIDD_env",
   "language": "python",
   "name": "lbidd_env"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
