{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Automunge under automation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Automunge is available now for pip install:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !pip install Automunge"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or to upgrade (we currently roll out upgrades pretty frequently):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !pip install Automunge --upgrade"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once installed, run this in a local session to initialize:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from Automunge import *\n",
    "am = AutoMunge()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Under automation, the automunge(.) function will: \n",
    "- normalize numeric features\n",
    "- binarize bounded categoric features\n",
    "- hash unbounded categoric features\n",
    "- encode date-time entries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To demonstrate, let's encode the Titanic set, a well-known benchmark:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "#titanic set\n",
    "df_train = pd.read_csv('train.csv')\n",
    "df_test = pd.read_csv('test.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is what the data looks like in raw form."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund, Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen, Miss. Laina</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Allen, Mr. William Henry</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   PassengerId  Survived  Pclass  \\\n",
       "0            1         0       3   \n",
       "1            2         1       1   \n",
       "2            3         1       3   \n",
       "3            4         1       1   \n",
       "4            5         0       3   \n",
       "\n",
       "                                                Name     Sex   Age  SibSp  \\\n",
       "0                            Braund, Mr. Owen Harris    male  22.0      1   \n",
       "1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   \n",
       "2                             Heikkinen, Miss. Laina  female  26.0      0   \n",
       "3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   \n",
       "4                           Allen, Mr. William Henry    male  35.0      0   \n",
       "\n",
       "   Parch            Ticket     Fare Cabin Embarked  \n",
       "0      0         A/5 21171   7.2500   NaN        S  \n",
       "1      0          PC 17599  71.2833   C85        C  \n",
       "2      0  STON/O2. 3101282   7.9250   NaN        S  \n",
       "3      0            113803  53.1000  C123        S  \n",
       "4      0            373450   8.0500   NaN        S  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll need to tell automunge(.) which columns are to be treated as labels or ID sets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "#titanic set\n",
    "labels_column = 'Survived'\n",
    "trainID_column = 'PassengerId'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can then pass these dataframes to the automunge(.) function for processing.\n",
    "\n",
    "Note that the function call returns 10 sets, some of which may be empty depending on the parameter configuration. It's an unusual convention, but we find that having one return configuration for all scenarios keeps things simple."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "_______________\n",
      "Begin Automunge processing\n",
      "\n",
      "evaluating column:  Pclass\n",
      "processing column:  Pclass\n",
      "    root category:  1010\n",
      " returned columns:\n",
      "['Pclass_NArw', 'Pclass_1010_0', 'Pclass_1010_1']\n",
      "\n",
      "evaluating column:  Name\n",
      "processing column:  Name\n",
      "    root category:  hash\n",
      " returned columns:\n",
      "['Name_NArw', 'Name_hash_0', 'Name_hash_1', 'Name_hash_2', 'Name_hash_3', 'Name_hash_4', 'Name_hash_5', 'Name_hash_6', 'Name_hash_7', 'Name_hash_8', 'Name_hash_9', 'Name_hash_10', 'Name_hash_11', 'Name_hash_12', 'Name_hash_13']\n",
      "\n",
      "evaluating column:  Sex\n",
      "processing column:  Sex\n",
      "    root category:  bnry\n",
      " returned columns:\n",
      "['Sex_bnry', 'Sex_NArw']\n",
      "\n",
      "evaluating column:  Age\n",
      "processing column:  Age\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['Age_nmbr', 'Age_NArw']\n",
      "\n",
      "evaluating column:  SibSp\n",
      "processing column:  SibSp\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['SibSp_nmbr', 'SibSp_NArw']\n",
      "\n",
      "evaluating column:  Parch\n",
      "processing column:  Parch\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['Parch_nmbr', 'Parch_NArw']\n",
      "\n",
      "evaluating column:  Ticket\n",
      "processing column:  Ticket\n",
      "    root category:  hash\n",
      " returned columns:\n",
      "['Ticket_NArw', 'Ticket_hash_0', 'Ticket_hash_1', 'Ticket_hash_2']\n",
      "\n",
      "evaluating column:  Fare\n",
      "processing column:  Fare\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['Fare_nmbr', 'Fare_NArw']\n",
      "\n",
      "evaluating column:  Cabin\n",
      "processing column:  Cabin\n",
      "    root category:  1010\n",
      " returned columns:\n",
      "['Cabin_NArw', 'Cabin_1010_0', 'Cabin_1010_1', 'Cabin_1010_2', 'Cabin_1010_3', 'Cabin_1010_4', 'Cabin_1010_5', 'Cabin_1010_6', 'Cabin_1010_7']\n",
      "\n",
      "evaluating column:  Embarked\n",
      "processing column:  Embarked\n",
      "    root category:  1010\n",
      " returned columns:\n",
      "['Embarked_NArw', 'Embarked_1010_0', 'Embarked_1010_1']\n",
      "\n",
      "______\n",
      "\n",
      "evaluating label column:  Survived\n",
      "processing label column:  Survived\n",
      "    root label category:  lbor\n",
      "\n",
      " returned columns:\n",
      "['Survived_ordl']\n",
      "\n",
      "______\n",
      "\n",
      "infill to column:  Sex_bnry\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Age_nmbr\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  SibSp_nmbr\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Parch_nmbr\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Fare_nmbr\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Pclass_1010_0\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Pclass_1010_1\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_0\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_1\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_2\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_3\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_4\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_5\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_6\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_7\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Embarked_1010_0\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Embarked_1010_1\n",
      "     infill type: MLinfill\n",
      "\n",
      "______\n",
      "\n",
      "versioning serial stamp:\n",
      "_6.02_820216868519_2021-04-23T21:21:46.024156\n",
      "\n",
      "Automunge returned ID column set: \n",
      "['PassengerId', 'Automunge_index']\n",
      "\n",
      "Automunge returned train column set: \n",
      "['Sex_bnry', 'Age_nmbr', 'SibSp_nmbr', 'Parch_nmbr', 'Fare_nmbr', 'Pclass_NArw', 'Pclass_1010_0', 'Pclass_1010_1', 'Name_NArw', 'Name_hash_0', 'Name_hash_1', 'Name_hash_2', 'Name_hash_3', 'Name_hash_4', 'Name_hash_5', 'Name_hash_6', 'Name_hash_7', 'Name_hash_8', 'Name_hash_9', 'Name_hash_10', 'Name_hash_11', 'Name_hash_12', 'Name_hash_13', 'Sex_NArw', 'Age_NArw', 'SibSp_NArw', 'Parch_NArw', 'Ticket_NArw', 'Ticket_hash_0', 'Ticket_hash_1', 'Ticket_hash_2', 'Fare_NArw', 'Cabin_NArw', 'Cabin_1010_0', 'Cabin_1010_1', 'Cabin_1010_2', 'Cabin_1010_3', 'Cabin_1010_4', 'Cabin_1010_5', 'Cabin_1010_6', 'Cabin_1010_7', 'Embarked_NArw', 'Embarked_1010_0', 'Embarked_1010_1']\n",
      "\n",
      "Automunge returned label column set: \n",
      "['Survived_ordl']\n",
      "\n",
      "_______________\n",
      "Automunge Complete\n",
      "\n"
     ]
    }
   ],
   "source": [
    "train, train_ID, labels, \\\n",
    "val, val_ID, val_labels, \\\n",
    "test, test_ID, test_labels, \\\n",
    "postprocess_dict = \\\n",
    "am.automunge(df_train,\n",
    "             labels_column = labels_column,\n",
    "             trainID_column = trainID_column)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The returned data can be accessed in the sets:\n",
    "train, train_ID, labels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Sex_bnry</th>\n",
       "      <th>Age_nmbr</th>\n",
       "      <th>SibSp_nmbr</th>\n",
       "      <th>Parch_nmbr</th>\n",
       "      <th>Fare_nmbr</th>\n",
       "      <th>Pclass_NArw</th>\n",
       "      <th>Pclass_1010_0</th>\n",
       "      <th>Pclass_1010_1</th>\n",
       "      <th>Name_NArw</th>\n",
       "      <th>Name_hash_0</th>\n",
       "      <th>...</th>\n",
       "      <th>Cabin_1010_1</th>\n",
       "      <th>Cabin_1010_2</th>\n",
       "      <th>Cabin_1010_3</th>\n",
       "      <th>Cabin_1010_4</th>\n",
       "      <th>Cabin_1010_5</th>\n",
       "      <th>Cabin_1010_6</th>\n",
       "      <th>Cabin_1010_7</th>\n",
       "      <th>Embarked_NArw</th>\n",
       "      <th>Embarked_1010_0</th>\n",
       "      <th>Embarked_1010_1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>717</th>\n",
       "      <td>0</td>\n",
       "      <td>-0.207592</td>\n",
       "      <td>-0.474279</td>\n",
       "      <td>-0.473408</td>\n",
       "      <td>-0.436762</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>546</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>202</th>\n",
       "      <td>1</td>\n",
       "      <td>0.330786</td>\n",
       "      <td>-0.474279</td>\n",
       "      <td>-0.473408</td>\n",
       "      <td>-0.517340</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>31</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>659</th>\n",
       "      <td>1</td>\n",
       "      <td>2.176654</td>\n",
       "      <td>-0.474279</td>\n",
       "      <td>2.007806</td>\n",
       "      <td>1.631419</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>654</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>438</th>\n",
       "      <td>1</td>\n",
       "      <td>2.638120</td>\n",
       "      <td>0.432550</td>\n",
       "      <td>4.489019</td>\n",
       "      <td>4.644392</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>758</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>325</th>\n",
       "      <td>0</td>\n",
       "      <td>0.484608</td>\n",
       "      <td>-0.474279</td>\n",
       "      <td>-0.473408</td>\n",
       "      <td>2.081343</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>493</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 44 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     Sex_bnry  Age_nmbr  SibSp_nmbr  Parch_nmbr  Fare_nmbr  Pclass_NArw  \\\n",
       "717         0 -0.207592   -0.474279   -0.473408  -0.436762            0   \n",
       "202         1  0.330786   -0.474279   -0.473408  -0.517340            0   \n",
       "659         1  2.176654   -0.474279    2.007806   1.631419            0   \n",
       "438         1  2.638120    0.432550    4.489019   4.644392            0   \n",
       "325         0  0.484608   -0.474279   -0.473408   2.081343            0   \n",
       "\n",
       "     Pclass_1010_0  Pclass_1010_1  Name_NArw  Name_hash_0  ...  Cabin_1010_1  \\\n",
       "717              0              1          0          546  ...             1   \n",
       "202              1              0          0           31  ...             0   \n",
       "659              0              0          0          654  ...             1   \n",
       "438              0              0          0          758  ...             0   \n",
       "325              0              0          0          493  ...             1   \n",
       "\n",
       "     Cabin_1010_2  Cabin_1010_3  Cabin_1010_4  Cabin_1010_5  Cabin_1010_6  \\\n",
       "717             1             1             0             1             0   \n",
       "202             0             0             0             0             0   \n",
       "659             1             0             1             1             0   \n",
       "438             1             1             1             1             1   \n",
       "325             0             0             0             0             0   \n",
       "\n",
       "     Cabin_1010_7  Embarked_NArw  Embarked_1010_0  Embarked_1010_1  \n",
       "717             0              0                1                0  \n",
       "202             0              0                1                0  \n",
       "659             0              0                0                0  \n",
       "438             1              0                1                0  \n",
       "325             1              0                0                0  \n",
       "\n",
       "[5 rows x 44 columns]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the column headers of the returned data are different, now including suffix appenders that log the applied transformations (for example, Age is returned as Age_nmbr)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Any carved out ID sets are included in the train_ID set, along with an aggregated set of index numbers in the Automunge_index column (since the function by default shuffles training data)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Automunge_index</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>717</th>\n",
       "      <td>718</td>\n",
       "      <td>717</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>202</th>\n",
       "      <td>203</td>\n",
       "      <td>202</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>659</th>\n",
       "      <td>660</td>\n",
       "      <td>659</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>438</th>\n",
       "      <td>439</td>\n",
       "      <td>438</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>325</th>\n",
       "      <td>326</td>\n",
       "      <td>325</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     PassengerId  Automunge_index\n",
       "717          718              717\n",
       "202          203              202\n",
       "659          660              659\n",
       "438          439              438\n",
       "325          326              325"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_ID.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And labels:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "717    1\n",
       "202    0\n",
       "659    0\n",
       "438    0\n",
       "325    1\n",
       "Name: Survived_ordl, dtype: uint8"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "labels.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# A few more common parameters"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A few options that might come up often:\n",
    "- if test data is available at the same time as the train data, we can also pass a test set with df_test\n",
    "- to carve out a validation set processed on the train set basis, we can designate a ratio with the valpercent parameter\n",
    "- to turn off printouts, we can pass printstatus = False\n",
    "- to return numpy arrays instead of dataframes, we can pass pandasoutput = False\n",
    "- a marker for entries that were subject to infill is included with the transformations by default (NArw_marker = True)\n",
    "- for auto ML derived missing data infill, we can apply MLinfill = True"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "train, train_ID, labels, \\\n",
    "val, val_ID, val_labels, \\\n",
    "test, test_ID, test_labels, \\\n",
    "postprocess_dict = \\\n",
    "am.automunge(df_train,\n",
    "             df_test = df_test,\n",
    "             labels_column = labels_column,\n",
    "             trainID_column = trainID_column,\n",
    "             valpercent = 0.2, \n",
    "             printstatus = False, \n",
    "             pandasoutput = False,\n",
    "             MLinfill = True,\n",
    "             NArw_marker = True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 1.        ,  0.3929741 , -0.4697539 , ...,  0.        ,\n",
       "         0.        ,  1.        ],\n",
       "       [ 0.        ,  1.3762842 ,  0.47773838, ...,  0.        ,\n",
       "         1.        ,  0.        ],\n",
       "       [ 1.        ,  2.5562565 , -0.4697539 , ...,  0.        ,\n",
       "         0.        ,  1.        ],\n",
       "       ...,\n",
       "       [ 1.        ,  0.7076334 , -0.4697539 , ...,  0.        ,\n",
       "         1.        ,  0.        ],\n",
       "       [ 1.        ,  0.10899416, -0.4697539 , ...,  0.        ,\n",
       "         1.        ,  0.        ],\n",
       "       [ 1.        , -2.0843618 ,  0.47773838, ...,  0.        ,\n",
       "         0.        ,  0.        ]], dtype=float32)"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#the test data is returned in test, test_ID, test_labels\n",
    "#here as a numpy array based on pandasoutput parameter\n",
    "test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Processing additional test data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Of the various returned sets, an important one is the final object, which we call the postprocess_dict. Think of this as a key to processing additional data on the original train set basis. If you intend to productionize a model, we recommend saving it externally, such as with the pickle library. Once we have additional data we want to process, we can pass it with the postprocess_dict to the postmunge(.) function."
   ]
  },
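  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of that save and reload step, assuming the postprocess_dict returned by the automunge(.) call above is still in memory, the standard library pickle module could be used as follows. The filename postprocess_dict.pickle is a hypothetical choice for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pickle\n",
    "\n",
    "#save the postprocess_dict returned from automunge(.)\n",
    "#(the filename here is a hypothetical choice for illustration)\n",
    "with open('postprocess_dict.pickle', 'wb') as handle:\n",
    "    pickle.dump(postprocess_dict, handle, protocol=pickle.HIGHEST_PROTOCOL)\n",
    "\n",
    "#later, such as in a separate session, reload it before calling postmunge(.)\n",
    "with open('postprocess_dict.pickle', 'rb') as handle:\n",
    "    postprocess_dict = pickle.load(handle)"
   ]
  },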
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "_______________\n",
      "Begin Postmunge processing\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Pclass\n",
      "    root category:  1010\n",
      "\n",
      " returned columns:\n",
      "['Pclass_NArw', 'Pclass_1010_0', 'Pclass_1010_1']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Name\n",
      "    root category:  hash\n",
      "\n",
      " returned columns:\n",
      "['Name_NArw', 'Name_hash_0', 'Name_hash_1', 'Name_hash_2', 'Name_hash_3', 'Name_hash_4', 'Name_hash_5', 'Name_hash_6', 'Name_hash_7', 'Name_hash_8', 'Name_hash_9', 'Name_hash_10', 'Name_hash_11', 'Name_hash_12', 'Name_hash_13']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Sex\n",
      "    root category:  bnry\n",
      "\n",
      " returned columns:\n",
      "['Sex_bnry', 'Sex_NArw']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Age\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['Age_nmbr', 'Age_NArw']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  SibSp\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['SibSp_nmbr', 'SibSp_NArw']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Parch\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['Parch_nmbr', 'Parch_NArw']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Ticket\n",
      "    root category:  hash\n",
      "\n",
      " returned columns:\n",
      "['Ticket_NArw', 'Ticket_hash_0', 'Ticket_hash_1', 'Ticket_hash_2']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Fare\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['Fare_nmbr', 'Fare_NArw']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Cabin\n",
      "    root category:  1010\n",
      "\n",
      " returned columns:\n",
      "['Cabin_NArw', 'Cabin_1010_0', 'Cabin_1010_1', 'Cabin_1010_2', 'Cabin_1010_3', 'Cabin_1010_4', 'Cabin_1010_5', 'Cabin_1010_6']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Embarked\n",
      "    root category:  1010\n",
      "\n",
      " returned columns:\n",
      "['Embarked_NArw', 'Embarked_1010_0', 'Embarked_1010_1']\n",
      "\n",
      "______\n",
      "\n",
      "infill to column:  Sex_bnry\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Age_nmbr\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  SibSp_nmbr\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Parch_nmbr\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Fare_nmbr\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Pclass_1010_0\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Pclass_1010_1\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_0\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_1\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_2\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_3\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_4\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_5\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_6\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Embarked_1010_0\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Embarked_1010_1\n",
      "     infill type: MLinfill\n",
      "\n",
      "_______________\n",
      "Postmunge returned ID column set: \n",
      "['PassengerId', 'Automunge_index']\n",
      "\n",
      "Postmunge returned test column set: \n",
      "['Sex_bnry', 'Age_nmbr', 'SibSp_nmbr', 'Parch_nmbr', 'Fare_nmbr', 'Pclass_NArw', 'Pclass_1010_0', 'Pclass_1010_1', 'Name_NArw', 'Name_hash_0', 'Name_hash_1', 'Name_hash_2', 'Name_hash_3', 'Name_hash_4', 'Name_hash_5', 'Name_hash_6', 'Name_hash_7', 'Name_hash_8', 'Name_hash_9', 'Name_hash_10', 'Name_hash_11', 'Name_hash_12', 'Name_hash_13', 'Sex_NArw', 'Age_NArw', 'SibSp_NArw', 'Parch_NArw', 'Ticket_NArw', 'Ticket_hash_0', 'Ticket_hash_1', 'Ticket_hash_2', 'Fare_NArw', 'Cabin_NArw', 'Cabin_1010_0', 'Cabin_1010_1', 'Cabin_1010_2', 'Cabin_1010_3', 'Cabin_1010_4', 'Cabin_1010_5', 'Cabin_1010_6', 'Embarked_NArw', 'Embarked_1010_0', 'Embarked_1010_1']\n",
      "\n",
      "_______________\n",
      "Postmunge Complete\n",
      "\n"
     ]
    }
   ],
   "source": [
    "test, test_ID, test_labels, \\\n",
    "postreports_dict \\\n",
    "= am.postmunge(postprocess_dict, df_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Sex_bnry</th>\n",
       "      <th>Age_nmbr</th>\n",
       "      <th>SibSp_nmbr</th>\n",
       "      <th>Parch_nmbr</th>\n",
       "      <th>Fare_nmbr</th>\n",
       "      <th>Pclass_NArw</th>\n",
       "      <th>Pclass_1010_0</th>\n",
       "      <th>Pclass_1010_1</th>\n",
       "      <th>Name_NArw</th>\n",
       "      <th>Name_hash_0</th>\n",
       "      <th>...</th>\n",
       "      <th>Cabin_1010_0</th>\n",
       "      <th>Cabin_1010_1</th>\n",
       "      <th>Cabin_1010_2</th>\n",
       "      <th>Cabin_1010_3</th>\n",
       "      <th>Cabin_1010_4</th>\n",
       "      <th>Cabin_1010_5</th>\n",
       "      <th>Cabin_1010_6</th>\n",
       "      <th>Embarked_NArw</th>\n",
       "      <th>Embarked_1010_0</th>\n",
       "      <th>Embarked_1010_1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0.392974</td>\n",
       "      <td>-0.469754</td>\n",
       "      <td>-0.453453</td>\n",
       "      <td>-0.471396</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>766</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>1.376284</td>\n",
       "      <td>0.477738</td>\n",
       "      <td>-0.453453</td>\n",
       "      <td>-0.487849</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>376</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>2.556257</td>\n",
       "      <td>-0.469754</td>\n",
       "      <td>-0.453453</td>\n",
       "      <td>-0.434524</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>269</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>-0.197012</td>\n",
       "      <td>-0.469754</td>\n",
       "      <td>-0.453453</td>\n",
       "      <td>-0.454862</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>625</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>-0.590336</td>\n",
       "      <td>0.477738</td>\n",
       "      <td>0.760301</td>\n",
       "      <td>-0.382935</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>931</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 43 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   Sex_bnry  Age_nmbr  SibSp_nmbr  Parch_nmbr  Fare_nmbr  Pclass_NArw  \\\n",
       "0         1  0.392974   -0.469754   -0.453453  -0.471396            0   \n",
       "1         0  1.376284    0.477738   -0.453453  -0.487849            0   \n",
       "2         1  2.556257   -0.469754   -0.453453  -0.434524            0   \n",
       "3         1 -0.197012   -0.469754   -0.453453  -0.454862            0   \n",
       "4         0 -0.590336    0.477738    0.760301  -0.382935            0   \n",
       "\n",
       "   Pclass_1010_0  Pclass_1010_1  Name_NArw  Name_hash_0  ...  Cabin_1010_0  \\\n",
       "0              1              0          0          766  ...             0   \n",
       "1              1              0          0          376  ...             0   \n",
       "2              0              1          0          269  ...             0   \n",
       "3              1              0          0          625  ...             0   \n",
       "4              1              0          0          931  ...             0   \n",
       "\n",
       "   Cabin_1010_1  Cabin_1010_2  Cabin_1010_3  Cabin_1010_4  Cabin_1010_5  \\\n",
       "0             0             0             0             0             0   \n",
       "1             0             0             0             0             0   \n",
       "2             0             0             0             0             0   \n",
       "3             0             0             0             0             0   \n",
       "4             0             0             0             0             0   \n",
       "\n",
       "   Cabin_1010_6  Embarked_NArw  Embarked_1010_0  Embarked_1010_1  \n",
       "0             0              0                0                1  \n",
       "1             0              0                1                0  \n",
       "2             0              0                0                1  \n",
       "3             0              0                1                0  \n",
       "4             0              0                1                0  \n",
       "\n",
       "[5 rows x 43 columns]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Custom transformations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Automunge has a library of transformations (documented in the read me). In general, each of these transformations is fit to properties of the train set to enable processing on a consistent basis of additional data.\n",
    "\n",
    "Each transformation in the libary has a distinct 4 character string identifier, generally aligned with the suffix appender on the returned set. \n",
    "\n",
    "We can designate our assignments in the assigncat parameter as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "#here we designate min-max scaling to the column 'Fare'\n",
    "assigncat = {'mnmx':['Fare']}\n",
    "\n",
    "train, train_ID, labels, \\\n",
    "val, val_ID, val_labels, \\\n",
    "test, test_ID, test_labels, \\\n",
    "postprocess_dict \\\n",
    "= am.automunge(df_train,\n",
    "               labels_column = labels_column,\n",
    "               trainID_column = trainID_column,\n",
    "               assigncat = assigncat,\n",
    "               printstatus = False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To view the columns returned from a specific input column can use the column map stored in the postprocess_dict."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Fare_mnmx</th>\n",
       "      <th>Fare_NArw</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>80</th>\n",
       "      <td>0.017567</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>0.051822</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>427</th>\n",
       "      <td>0.050749</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>475</th>\n",
       "      <td>0.101497</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>888</th>\n",
       "      <td>0.045771</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Fare_mnmx  Fare_NArw\n",
       "80    0.017567          0\n",
       "11    0.051822          0\n",
       "427   0.050749          0\n",
       "475   0.101497          0\n",
       "888   0.045771          0"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train[postprocess_dict['column_map']['Fare']].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Missing data infill"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We noted earlier that the MLinfill parameter activates an autoML method for missing data inputation. Let's take a look at this in action. Here we'll turn on ML infill as well as markers for entries subject to infill with the NArw_marker parameter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "train, train_ID, labels, \\\n",
    "val, val_ID, val_labels, \\\n",
    "test, test_ID, test_labels, \\\n",
    "postprocess_dict \\\n",
    "= am.automunge(df_train,\n",
    "               labels_column = labels_column,\n",
    "               trainID_column = trainID_column,\n",
    "               MLinfill = True,\n",
    "               NArw_marker=True,\n",
    "               printstatus = False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By inspection, if appears that one of the entries in the Age column was subject to infill:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Age_nmbr</th>\n",
       "      <th>Age_NArw</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>0.715342</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>391</th>\n",
       "      <td>-0.669059</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>357</th>\n",
       "      <td>0.638430</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>582</th>\n",
       "      <td>1.869009</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>857</th>\n",
       "      <td>1.638276</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Age_nmbr  Age_NArw\n",
       "13   0.715342         0\n",
       "391 -0.669059         0\n",
       "357  0.638430         0\n",
       "582  1.869009         0\n",
       "857  1.638276         0"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train[postprocess_dict['column_map']['Age']].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It appears the ML infill is assuming for first row's inputation that this is a very young passenger (remember this is normalized data is reason for the negative value)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the trained models for each feature are saved in the postprocess_dict to enable a consistent inputation basis for subsequent data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "ML infill isn't the only inputation option. Other options like mode, adjacent cell, 0/1, mean, etc can be designated to distinct columns with the assigninfill parameter.\n",
    "\n",
    "Here we'll demonstrate applying a few different approaches to different columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "assigninfill = {'MLinfill'  : ['Pclass'],\n",
    "                'adjinfill' : ['Age'],\n",
    "                'modeinfill': ['Fare']}\n",
    "\n",
    "train, train_ID, labels, \\\n",
    "val, val_ID, val_labels, \\\n",
    "test, test_ID, test_labels, \\\n",
    "postprocess_dict \\\n",
    "= am.automunge(df_train,\n",
    "               labels_column = labels_column,\n",
    "               trainID_column = trainID_column,\n",
    "               assigninfill = assigninfill,\n",
    "               printstatus = False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Sex_bnry</th>\n",
       "      <th>Age_nmbr</th>\n",
       "      <th>SibSp_nmbr</th>\n",
       "      <th>Parch_nmbr</th>\n",
       "      <th>Fare_nmbr</th>\n",
       "      <th>Pclass_NArw</th>\n",
       "      <th>Pclass_1010_0</th>\n",
       "      <th>Pclass_1010_1</th>\n",
       "      <th>Name_NArw</th>\n",
       "      <th>Name_hash_0</th>\n",
       "      <th>...</th>\n",
       "      <th>Cabin_1010_1</th>\n",
       "      <th>Cabin_1010_2</th>\n",
       "      <th>Cabin_1010_3</th>\n",
       "      <th>Cabin_1010_4</th>\n",
       "      <th>Cabin_1010_5</th>\n",
       "      <th>Cabin_1010_6</th>\n",
       "      <th>Cabin_1010_7</th>\n",
       "      <th>Embarked_NArw</th>\n",
       "      <th>Embarked_1010_0</th>\n",
       "      <th>Embarked_1010_1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>617</th>\n",
       "      <td>0</td>\n",
       "      <td>-0.284503</td>\n",
       "      <td>0.432550</td>\n",
       "      <td>-0.473408</td>\n",
       "      <td>-0.324071</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>472</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46</th>\n",
       "      <td>1</td>\n",
       "      <td>-0.822881</td>\n",
       "      <td>0.432550</td>\n",
       "      <td>-0.473408</td>\n",
       "      <td>-0.336145</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>383</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>831</th>\n",
       "      <td>1</td>\n",
       "      <td>-2.220357</td>\n",
       "      <td>0.432550</td>\n",
       "      <td>0.767199</td>\n",
       "      <td>-0.270744</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>613</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>336</th>\n",
       "      <td>1</td>\n",
       "      <td>-0.053770</td>\n",
       "      <td>0.432550</td>\n",
       "      <td>-0.473408</td>\n",
       "      <td>0.692160</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>548</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>541</th>\n",
       "      <td>0</td>\n",
       "      <td>-1.591993</td>\n",
       "      <td>3.153038</td>\n",
       "      <td>2.007806</td>\n",
       "      <td>-0.018699</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>831</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 44 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     Sex_bnry  Age_nmbr  SibSp_nmbr  Parch_nmbr  Fare_nmbr  Pclass_NArw  \\\n",
       "617         0 -0.284503    0.432550   -0.473408  -0.324071            0   \n",
       "46          1 -0.822881    0.432550   -0.473408  -0.336145            0   \n",
       "831         1 -2.220357    0.432550    0.767199  -0.270744            0   \n",
       "336         1 -0.053770    0.432550   -0.473408   0.692160            0   \n",
       "541         0 -1.591993    3.153038    2.007806  -0.018699            0   \n",
       "\n",
       "     Pclass_1010_0  Pclass_1010_1  Name_NArw  Name_hash_0  ...  Cabin_1010_1  \\\n",
       "617              1              0          0          472  ...             0   \n",
       "46               1              0          0          383  ...             0   \n",
       "831              0              1          0          613  ...             0   \n",
       "336              0              0          0          548  ...             0   \n",
       "541              1              0          0          831  ...             0   \n",
       "\n",
       "     Cabin_1010_2  Cabin_1010_3  Cabin_1010_4  Cabin_1010_5  Cabin_1010_6  \\\n",
       "617             0             0             0             0             0   \n",
       "46              0             0             0             0             0   \n",
       "831             0             0             0             0             0   \n",
       "336             1             1             1             1             0   \n",
       "541             0             0             0             0             0   \n",
       "\n",
       "     Cabin_1010_7  Embarked_NArw  Embarked_1010_0  Embarked_1010_1  \n",
       "617             0              0                1                0  \n",
       "46              0              0                0                1  \n",
       "831             0              0                1                0  \n",
       "336             1              0                1                0  \n",
       "541             0              0                1                0  \n",
       "\n",
       "[5 rows x 44 columns]"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Conclusion"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In closing, as an explanation, the whole point of conducting all of the transformations in a single function is that this application serves to populate a dictionary (the \"postprocess_dict\") fit to properties of the train data, capturing all of the steps and parameters of transformations, potentially including methods for ML derived missing data inputation, dimensionality reductions, and other various encodings available in the library. This returned dictionary can then be passed to the postmunge(.) function with subsequent data for fully consistent processing on the train set basis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "_______________\n",
      "Begin Postmunge processing\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Pclass\n",
      "    root category:  1010\n",
      "\n",
      " returned columns:\n",
      "['Pclass_NArw', 'Pclass_1010_0', 'Pclass_1010_1']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Name\n",
      "    root category:  hash\n",
      "\n",
      " returned columns:\n",
      "['Name_NArw', 'Name_hash_0', 'Name_hash_1', 'Name_hash_2', 'Name_hash_3', 'Name_hash_4', 'Name_hash_5', 'Name_hash_6', 'Name_hash_7', 'Name_hash_8', 'Name_hash_9', 'Name_hash_10', 'Name_hash_11', 'Name_hash_12', 'Name_hash_13']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Sex\n",
      "    root category:  bnry\n",
      "\n",
      " returned columns:\n",
      "['Sex_bnry', 'Sex_NArw']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Age\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['Age_nmbr', 'Age_NArw']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  SibSp\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['SibSp_nmbr', 'SibSp_NArw']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Parch\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['Parch_nmbr', 'Parch_NArw']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Ticket\n",
      "    root category:  hash\n",
      "\n",
      " returned columns:\n",
      "['Ticket_NArw', 'Ticket_hash_0', 'Ticket_hash_1', 'Ticket_hash_2']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Fare\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['Fare_nmbr', 'Fare_NArw']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Cabin\n",
      "    root category:  1010\n",
      "\n",
      " returned columns:\n",
      "['Cabin_NArw', 'Cabin_1010_0', 'Cabin_1010_1', 'Cabin_1010_2', 'Cabin_1010_3', 'Cabin_1010_4', 'Cabin_1010_5', 'Cabin_1010_6', 'Cabin_1010_7']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Embarked\n",
      "    root category:  1010\n",
      "\n",
      " returned columns:\n",
      "['Embarked_NArw', 'Embarked_1010_0', 'Embarked_1010_1']\n",
      "\n",
      "______\n",
      "\n",
      "infill to column:  Sex_bnry\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Age_nmbr\n",
      "     infill type: adjinfill\n",
      "\n",
      "infill to column:  SibSp_nmbr\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Parch_nmbr\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Fare_nmbr\n",
      "     infill type: modeinfill\n",
      "\n",
      "infill to column:  Pclass_1010_0\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Pclass_1010_1\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_0\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_1\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_2\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_3\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_4\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_5\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_6\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_7\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Embarked_1010_0\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Embarked_1010_1\n",
      "     infill type: MLinfill\n",
      "\n",
      "_______________\n",
      "Postmunge returned ID column set: \n",
      "['PassengerId', 'Automunge_index']\n",
      "\n",
      "Postmunge returned test column set: \n",
      "['Sex_bnry', 'Age_nmbr', 'SibSp_nmbr', 'Parch_nmbr', 'Fare_nmbr', 'Pclass_NArw', 'Pclass_1010_0', 'Pclass_1010_1', 'Name_NArw', 'Name_hash_0', 'Name_hash_1', 'Name_hash_2', 'Name_hash_3', 'Name_hash_4', 'Name_hash_5', 'Name_hash_6', 'Name_hash_7', 'Name_hash_8', 'Name_hash_9', 'Name_hash_10', 'Name_hash_11', 'Name_hash_12', 'Name_hash_13', 'Sex_NArw', 'Age_NArw', 'SibSp_NArw', 'Parch_NArw', 'Ticket_NArw', 'Ticket_hash_0', 'Ticket_hash_1', 'Ticket_hash_2', 'Fare_NArw', 'Cabin_NArw', 'Cabin_1010_0', 'Cabin_1010_1', 'Cabin_1010_2', 'Cabin_1010_3', 'Cabin_1010_4', 'Cabin_1010_5', 'Cabin_1010_6', 'Cabin_1010_7', 'Embarked_NArw', 'Embarked_1010_0', 'Embarked_1010_1']\n",
      "\n",
      "_______________\n",
      "Postmunge Complete\n",
      "\n"
     ]
    }
   ],
   "source": [
    "test, test_ID, test_labels, \\\n",
    "postreports_dict \\\n",
    "= am.postmunge(postprocess_dict, df_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Sex_bnry</th>\n",
       "      <th>Age_nmbr</th>\n",
       "      <th>SibSp_nmbr</th>\n",
       "      <th>Parch_nmbr</th>\n",
       "      <th>Fare_nmbr</th>\n",
       "      <th>Pclass_NArw</th>\n",
       "      <th>Pclass_1010_0</th>\n",
       "      <th>Pclass_1010_1</th>\n",
       "      <th>Name_NArw</th>\n",
       "      <th>Name_hash_0</th>\n",
       "      <th>...</th>\n",
       "      <th>Cabin_1010_1</th>\n",
       "      <th>Cabin_1010_2</th>\n",
       "      <th>Cabin_1010_3</th>\n",
       "      <th>Cabin_1010_4</th>\n",
       "      <th>Cabin_1010_5</th>\n",
       "      <th>Cabin_1010_6</th>\n",
       "      <th>Cabin_1010_7</th>\n",
       "      <th>Embarked_NArw</th>\n",
       "      <th>Embarked_1010_0</th>\n",
       "      <th>Embarked_1010_1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0.369241</td>\n",
       "      <td>-0.474279</td>\n",
       "      <td>-0.473408</td>\n",
       "      <td>-0.490508</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>766</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>1.330631</td>\n",
       "      <td>0.432550</td>\n",
       "      <td>-0.473408</td>\n",
       "      <td>-0.507194</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>376</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>2.484298</td>\n",
       "      <td>-0.474279</td>\n",
       "      <td>-0.473408</td>\n",
       "      <td>-0.453112</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>269</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>-0.207592</td>\n",
       "      <td>-0.474279</td>\n",
       "      <td>-0.473408</td>\n",
       "      <td>-0.473739</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>625</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>-0.592148</td>\n",
       "      <td>0.432550</td>\n",
       "      <td>0.767199</td>\n",
       "      <td>-0.400792</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>931</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 44 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   Sex_bnry  Age_nmbr  SibSp_nmbr  Parch_nmbr  Fare_nmbr  Pclass_NArw  \\\n",
       "0         1  0.369241   -0.474279   -0.473408  -0.490508            0   \n",
       "1         0  1.330631    0.432550   -0.473408  -0.507194            0   \n",
       "2         1  2.484298   -0.474279   -0.473408  -0.453112            0   \n",
       "3         1 -0.207592   -0.474279   -0.473408  -0.473739            0   \n",
       "4         0 -0.592148    0.432550    0.767199  -0.400792            0   \n",
       "\n",
       "   Pclass_1010_0  Pclass_1010_1  Name_NArw  Name_hash_0  ...  Cabin_1010_1  \\\n",
       "0              1              0          0          766  ...             0   \n",
       "1              1              0          0          376  ...             0   \n",
       "2              0              1          0          269  ...             0   \n",
       "3              1              0          0          625  ...             0   \n",
       "4              1              0          0          931  ...             0   \n",
       "\n",
       "   Cabin_1010_2  Cabin_1010_3  Cabin_1010_4  Cabin_1010_5  Cabin_1010_6  \\\n",
       "0             0             0             0             0             0   \n",
       "1             0             0             0             0             0   \n",
       "2             0             0             0             0             0   \n",
       "3             0             0             0             0             0   \n",
       "4             0             0             0             0             0   \n",
       "\n",
       "   Cabin_1010_7  Embarked_NArw  Embarked_1010_0  Embarked_1010_1  \n",
       "0             0              0                0                1  \n",
       "1             0              0                1                0  \n",
       "2             0              0                0                1  \n",
       "3             0              0                1                0  \n",
       "4             0              0                1                0  \n",
       "\n",
       "[5 rows x 44 columns]"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test.head()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
