{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## String Parsing Validation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we validate string parsing on two example features from the IEEE-CIS Fraud Detection Kaggle competition; the data set is available [here](https://www.kaggle.com/c/ieee-fraud-detection).\n",
    "\n",
    "The evaluation uses a subset of features: the top ten from a feature importance evaluation, plus two additional features selected by inspection for their viability as string parsing targets.\n",
    "\n",
    "The results of the validation are reported in the final cell of the notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "from Automunge import Automunger\n",
    "am = Automunger.AutoMunge()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.set_option(\"display.max_columns\", 200)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_identity_path = 'train_identity.csv'\n",
    "train_transaction_path = 'train_transaction.csv'\n",
    "#test_identity_path = 'test_identity.csv'\n",
    "#test_transaction_path = 'test_transaction.csv'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "ID_column = 'TransactionID'\n",
    "label_column = 'isFraud'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id_01</th>\n",
       "      <th>id_02</th>\n",
       "      <th>id_03</th>\n",
       "      <th>id_04</th>\n",
       "      <th>id_05</th>\n",
       "      <th>id_06</th>\n",
       "      <th>id_07</th>\n",
       "      <th>id_08</th>\n",
       "      <th>id_09</th>\n",
       "      <th>id_10</th>\n",
       "      <th>id_11</th>\n",
       "      <th>id_12</th>\n",
       "      <th>id_13</th>\n",
       "      <th>id_14</th>\n",
       "      <th>id_15</th>\n",
       "      <th>id_16</th>\n",
       "      <th>id_17</th>\n",
       "      <th>id_18</th>\n",
       "      <th>id_19</th>\n",
       "      <th>id_20</th>\n",
       "      <th>id_21</th>\n",
       "      <th>id_22</th>\n",
       "      <th>id_23</th>\n",
       "      <th>id_24</th>\n",
       "      <th>id_25</th>\n",
       "      <th>id_26</th>\n",
       "      <th>id_27</th>\n",
       "      <th>id_28</th>\n",
       "      <th>id_29</th>\n",
       "      <th>id_30</th>\n",
       "      <th>id_31</th>\n",
       "      <th>id_32</th>\n",
       "      <th>id_33</th>\n",
       "      <th>id_34</th>\n",
       "      <th>id_35</th>\n",
       "      <th>id_36</th>\n",
       "      <th>id_37</th>\n",
       "      <th>id_38</th>\n",
       "      <th>DeviceType</th>\n",
       "      <th>DeviceInfo</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>TransactionID</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2987004</th>\n",
       "      <td>0.0</td>\n",
       "      <td>70787.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>100.0</td>\n",
       "      <td>NotFound</td>\n",
       "      <td>NaN</td>\n",
       "      <td>-480.0</td>\n",
       "      <td>New</td>\n",
       "      <td>NotFound</td>\n",
       "      <td>166.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>542.0</td>\n",
       "      <td>144.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>New</td>\n",
       "      <td>NotFound</td>\n",
       "      <td>Android 7.0</td>\n",
       "      <td>samsung browser 6.2</td>\n",
       "      <td>32.0</td>\n",
       "      <td>2220x1080</td>\n",
       "      <td>match_status:2</td>\n",
       "      <td>T</td>\n",
       "      <td>F</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>mobile</td>\n",
       "      <td>SAMSUNG SM-G892A Build/NRD90M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2987008</th>\n",
       "      <td>-5.0</td>\n",
       "      <td>98945.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>-5.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>100.0</td>\n",
       "      <td>NotFound</td>\n",
       "      <td>49.0</td>\n",
       "      <td>-300.0</td>\n",
       "      <td>New</td>\n",
       "      <td>NotFound</td>\n",
       "      <td>166.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>621.0</td>\n",
       "      <td>500.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>New</td>\n",
       "      <td>NotFound</td>\n",
       "      <td>iOS 11.1.2</td>\n",
       "      <td>mobile safari 11.0</td>\n",
       "      <td>32.0</td>\n",
       "      <td>1334x750</td>\n",
       "      <td>match_status:1</td>\n",
       "      <td>T</td>\n",
       "      <td>F</td>\n",
       "      <td>F</td>\n",
       "      <td>T</td>\n",
       "      <td>mobile</td>\n",
       "      <td>iOS Device</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2987010</th>\n",
       "      <td>-5.0</td>\n",
       "      <td>191631.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>100.0</td>\n",
       "      <td>NotFound</td>\n",
       "      <td>52.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Found</td>\n",
       "      <td>Found</td>\n",
       "      <td>121.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>410.0</td>\n",
       "      <td>142.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Found</td>\n",
       "      <td>Found</td>\n",
       "      <td>NaN</td>\n",
       "      <td>chrome 62.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>F</td>\n",
       "      <td>F</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>desktop</td>\n",
       "      <td>Windows</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2987011</th>\n",
       "      <td>-5.0</td>\n",
       "      <td>221832.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>-6.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>100.0</td>\n",
       "      <td>NotFound</td>\n",
       "      <td>52.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>New</td>\n",
       "      <td>NotFound</td>\n",
       "      <td>225.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>176.0</td>\n",
       "      <td>507.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>New</td>\n",
       "      <td>NotFound</td>\n",
       "      <td>NaN</td>\n",
       "      <td>chrome 62.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>F</td>\n",
       "      <td>F</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>desktop</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2987016</th>\n",
       "      <td>0.0</td>\n",
       "      <td>7460.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>100.0</td>\n",
       "      <td>NotFound</td>\n",
       "      <td>NaN</td>\n",
       "      <td>-300.0</td>\n",
       "      <td>Found</td>\n",
       "      <td>Found</td>\n",
       "      <td>166.0</td>\n",
       "      <td>15.0</td>\n",
       "      <td>529.0</td>\n",
       "      <td>575.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Found</td>\n",
       "      <td>Found</td>\n",
       "      <td>Mac OS X 10_11_6</td>\n",
       "      <td>chrome 62.0</td>\n",
       "      <td>24.0</td>\n",
       "      <td>1280x800</td>\n",
       "      <td>match_status:2</td>\n",
       "      <td>T</td>\n",
       "      <td>F</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>desktop</td>\n",
       "      <td>MacOS</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               id_01     id_02  id_03  id_04  id_05  id_06  id_07  id_08  \\\n",
       "TransactionID                                                              \n",
       "2987004          0.0   70787.0    NaN    NaN    NaN    NaN    NaN    NaN   \n",
       "2987008         -5.0   98945.0    NaN    NaN    0.0   -5.0    NaN    NaN   \n",
       "2987010         -5.0  191631.0    0.0    0.0    0.0    0.0    NaN    NaN   \n",
       "2987011         -5.0  221832.0    NaN    NaN    0.0   -6.0    NaN    NaN   \n",
       "2987016          0.0    7460.0    0.0    0.0    1.0    0.0    NaN    NaN   \n",
       "\n",
       "               id_09  id_10  id_11     id_12  id_13  id_14  id_15     id_16  \\\n",
       "TransactionID                                                                 \n",
       "2987004          NaN    NaN  100.0  NotFound    NaN -480.0    New  NotFound   \n",
       "2987008          NaN    NaN  100.0  NotFound   49.0 -300.0    New  NotFound   \n",
       "2987010          0.0    0.0  100.0  NotFound   52.0    NaN  Found     Found   \n",
       "2987011          NaN    NaN  100.0  NotFound   52.0    NaN    New  NotFound   \n",
       "2987016          0.0    0.0  100.0  NotFound    NaN -300.0  Found     Found   \n",
       "\n",
       "               id_17  id_18  id_19  id_20  id_21  id_22 id_23  id_24  id_25  \\\n",
       "TransactionID                                                                 \n",
       "2987004        166.0    NaN  542.0  144.0    NaN    NaN   NaN    NaN    NaN   \n",
       "2987008        166.0    NaN  621.0  500.0    NaN    NaN   NaN    NaN    NaN   \n",
       "2987010        121.0    NaN  410.0  142.0    NaN    NaN   NaN    NaN    NaN   \n",
       "2987011        225.0    NaN  176.0  507.0    NaN    NaN   NaN    NaN    NaN   \n",
       "2987016        166.0   15.0  529.0  575.0    NaN    NaN   NaN    NaN    NaN   \n",
       "\n",
       "               id_26 id_27  id_28     id_29             id_30  \\\n",
       "TransactionID                                                   \n",
       "2987004          NaN   NaN    New  NotFound       Android 7.0   \n",
       "2987008          NaN   NaN    New  NotFound        iOS 11.1.2   \n",
       "2987010          NaN   NaN  Found     Found               NaN   \n",
       "2987011          NaN   NaN    New  NotFound               NaN   \n",
       "2987016          NaN   NaN  Found     Found  Mac OS X 10_11_6   \n",
       "\n",
       "                             id_31  id_32      id_33           id_34 id_35  \\\n",
       "TransactionID                                                                \n",
       "2987004        samsung browser 6.2   32.0  2220x1080  match_status:2     T   \n",
       "2987008         mobile safari 11.0   32.0   1334x750  match_status:1     T   \n",
       "2987010                chrome 62.0    NaN        NaN             NaN     F   \n",
       "2987011                chrome 62.0    NaN        NaN             NaN     F   \n",
       "2987016                chrome 62.0   24.0   1280x800  match_status:2     T   \n",
       "\n",
       "              id_36 id_37 id_38 DeviceType                     DeviceInfo  \n",
       "TransactionID                                                              \n",
       "2987004           F     T     T     mobile  SAMSUNG SM-G892A Build/NRD90M  \n",
       "2987008           F     F     T     mobile                     iOS Device  \n",
       "2987010           F     T     T    desktop                        Windows  \n",
       "2987011           F     T     T    desktop                            NaN  \n",
       "2987016           F     T     T    desktop                          MacOS  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#load the identity table indexed by transaction ID; malformed rows are skipped with a warning\n",
    "train_identity = pd.read_csv(train_identity_path, on_bad_lines='warn', index_col=ID_column)\n",
    "train_identity.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['Android 7.0', 'iOS 11.1.2', nan, 'Mac OS X 10_11_6', 'Windows 10',\n",
       "       'Android', 'Linux', 'iOS 11.0.3', 'Mac OS X 10_7_5',\n",
       "       'Mac OS X 10_12_6', 'Mac OS X 10_13_1', 'iOS 11.1.0',\n",
       "       'Mac OS X 10_9_5', 'Windows 7', 'Windows 8.1', 'Mac', 'iOS 10.3.3',\n",
       "       'Mac OS X 10.12', 'Mac OS X 10_10_5', 'Mac OS X 10_11_5',\n",
       "       'iOS 9.3.5', 'Android 5.1.1', 'Android 7.1.1', 'Android 6.0',\n",
       "       'iOS 10.3.1', 'Mac OS X 10.9', 'iOS 11.1.1', 'Windows Vista',\n",
       "       'iOS 10.3.2', 'iOS 11.0.2', 'Mac OS X 10.11', 'Android 8.0.0',\n",
       "       'iOS 10.2.0', 'iOS 10.2.1', 'iOS 11.0.0', 'Mac OS X 10.10',\n",
       "       'Mac OS X 10_12_3', 'Mac OS X 10_12', 'Android 6.0.1', 'iOS',\n",
       "       'Mac OS X 10.13', 'Mac OS X 10_12_5', 'Mac OS X 10_8_5',\n",
       "       'iOS 11.0.1', 'iOS 10.0.2', 'Android 5.0.2', 'Windows XP',\n",
       "       'iOS 11.2.0', 'Mac OS X 10.6', 'Windows 8', 'Mac OS X 10_6_8',\n",
       "       'Mac OS X 10_11_4', 'Mac OS X 10_12_1', 'iOS 10.1.1',\n",
       "       'Mac OS X 10_11_3', 'Mac OS X 10_12_4', 'Mac OS X 10_13_2',\n",
       "       'Android 4.4.2', 'Mac OS X 10_12_2', 'Android 5.0', 'func',\n",
       "       'Android 7.1.2', 'Android 8.1.0', 'other', 'Mac OS X 10_13_3',\n",
       "       'iOS 11.2.1', 'iOS 11.2.5', 'Windows', 'iOS 11.2.2', 'iOS 11.3.0',\n",
       "       'iOS 11.2.6', 'Mac OS X 10_13_4', 'Mac OS X 10_13_5', 'iOS 11.4.0',\n",
       "       'iOS 11.3.1', 'iOS 11.4.1'], dtype=object)"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#on inspection, feature 'id_30' (operating system and version) looks like a good candidate for string parsing\n",
    "\n",
    "train_identity['id_30'].unique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['samsung browser 6.2', 'mobile safari 11.0', 'chrome 62.0', nan,\n",
       "       'chrome 62.0 for android', 'edge 15.0', 'mobile safari generic',\n",
       "       'chrome 49.0', 'chrome 61.0', 'edge 16.0', 'safari generic',\n",
       "       'edge 14.0', 'chrome 56.0 for android', 'firefox 57.0',\n",
       "       'chrome 54.0 for android', 'mobile safari uiwebview', 'chrome',\n",
       "       'chrome 62.0 for ios', 'firefox', 'chrome 60.0 for android',\n",
       "       'mobile safari 10.0', 'chrome 61.0 for android',\n",
       "       'ie 11.0 for desktop', 'ie 11.0 for tablet', 'mobile safari 9.0',\n",
       "       'chrome generic', 'other', 'chrome 59.0 for android',\n",
       "       'firefox 56.0', 'android webview 4.0', 'chrome 55.0', 'opera 49.0',\n",
       "       'ie', 'chrome 55.0 for android', 'firefox 52.0',\n",
       "       'chrome 57.0 for android', 'chrome 56.0',\n",
       "       'chrome 46.0 for android', 'chrome 58.0', 'firefox 48.0',\n",
       "       'chrome 59.0', 'samsung browser 4.0', 'edge 13.0',\n",
       "       'chrome 53.0 for android', 'chrome 58.0 for android',\n",
       "       'chrome 60.0', 'mobile safari 8.0', 'firefox generic',\n",
       "       'Generic/Android 7.0', 'mobile', 'Samsung/SM-G532M',\n",
       "       'chrome 50.0 for android', 'chrome 51.0 for android',\n",
       "       'chrome 63.0', 'chrome 52.0 for android', 'chrome 51.0',\n",
       "       'firefox 55.0', 'edge', 'opera', 'chrome generic for android',\n",
       "       'aol', 'samsung browser 5.4', 'Samsung/SCH', 'silk', 'chrome 57.0',\n",
       "       'firefox 47.0', 'chrome 63.0 for android', 'Samsung/SM-G531H',\n",
       "       'chrome 43.0 for android', 'waterfox', 'Nokia/Lumia',\n",
       "       'chrome 63.0 for ios', 'puffin', 'Microsoft/Windows', 'cyberfox',\n",
       "       'Generic/Android', 'samsung', 'opera generic',\n",
       "       'chrome 49.0 for android', 'ZTE/Blade', 'safari',\n",
       "       'android browser 4.0', 'samsung browser 5.2', 'palemoon',\n",
       "       'maxthon', 'line', 'LG/K-200', 'iron', 'BLU/Dash', 'seamonkey',\n",
       "       'firefox 58.0', 'chrome 64.0 for android', 'chrome 64.0',\n",
       "       'firefox 59.0', 'chrome 64.0 for ios', 'M4Tel/M4', 'comodo',\n",
       "       'Lanix/Ilium', 'samsung browser generic', 'chromium', 'opera 51.0',\n",
       "       'Inco/Minion', 'samsung browser 7.0', 'Mozilla/Firefox',\n",
       "       'samsung browser 4.2', 'samsung browser 6.4', 'chrome 65.0',\n",
       "       'chrome 65.0 for android', 'chrome 65.0 for ios', 'Cherry',\n",
       "       'icedragon', 'android', 'edge 17.0', 'chrome 66.0',\n",
       "       'chrome 66.0 for android', 'safari 11.0', 'safari 9.0',\n",
       "       'safari 10.0', 'google', 'chrome 66.0 for ios',\n",
       "       'google search application 48.0', 'opera 52.0', 'firefox 60.0',\n",
       "       'opera 53.0', 'samsung browser 3.3',\n",
       "       'google search application 49.0', 'facebook',\n",
       "       'firefox mobile 61.0', 'chrome 67.0', 'chrome 69.0',\n",
       "       'chrome 67.0 for android'], dtype=object)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#feature 'id_31' (browser and version) also looks like a good candidate for string parsing\n",
    "\n",
    "train_identity['id_31'].unique()"
   ]
  },
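  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the idea concrete, the cell below is a minimal sketch of the kind of structure string parsing can recover from `id_30`. It uses plain pandas rather than Automunge, and treating the first whitespace-delimited token as an operating system family is an illustrative assumption, not the library's method. The sample values are copied from the `id_30` output above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "#illustrative sample of 'id_30' values copied from the output above, plus one missing entry\n",
    "sample = pd.Series(['Android 7.0', 'iOS 11.1.2', 'Mac OS X 10_11_6', 'Windows 10', 'Mac OS X 10_12_6', None])\n",
    "\n",
    "#assumption for illustration: the first whitespace-delimited token names the OS family\n",
    "os_family = sample.str.split().str[0]\n",
    "\n",
    "#entries sharing a leading token (here the two 'Mac OS X' versions) fall into one group\n",
    "os_family.value_counts(dropna=False)"
   ]
  },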
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>isFraud</th>\n",
       "      <th>TransactionDT</th>\n",
       "      <th>TransactionAmt</th>\n",
       "      <th>ProductCD</th>\n",
       "      <th>card1</th>\n",
       "      <th>card2</th>\n",
       "      <th>card3</th>\n",
       "      <th>card4</th>\n",
       "      <th>card5</th>\n",
       "      <th>card6</th>\n",
       "      <th>addr1</th>\n",
       "      <th>addr2</th>\n",
       "      <th>dist1</th>\n",
       "      <th>dist2</th>\n",
       "      <th>P_emaildomain</th>\n",
       "      <th>R_emaildomain</th>\n",
       "      <th>C1</th>\n",
       "      <th>C2</th>\n",
       "      <th>C3</th>\n",
       "      <th>C4</th>\n",
       "      <th>C5</th>\n",
       "      <th>C6</th>\n",
       "      <th>C7</th>\n",
       "      <th>C8</th>\n",
       "      <th>C9</th>\n",
       "      <th>C10</th>\n",
       "      <th>C11</th>\n",
       "      <th>C12</th>\n",
       "      <th>C13</th>\n",
       "      <th>C14</th>\n",
       "      <th>D1</th>\n",
       "      <th>D2</th>\n",
       "      <th>D3</th>\n",
       "      <th>D4</th>\n",
       "      <th>D5</th>\n",
       "      <th>D6</th>\n",
       "      <th>D7</th>\n",
       "      <th>D8</th>\n",
       "      <th>D9</th>\n",
       "      <th>D10</th>\n",
       "      <th>D11</th>\n",
       "      <th>D12</th>\n",
       "      <th>D13</th>\n",
       "      <th>D14</th>\n",
       "      <th>D15</th>\n",
       "      <th>M1</th>\n",
       "      <th>M2</th>\n",
       "      <th>M3</th>\n",
       "      <th>M4</th>\n",
       "      <th>M5</th>\n",
       "      <th>M6</th>\n",
       "      <th>M7</th>\n",
       "      <th>M8</th>\n",
       "      <th>M9</th>\n",
       "      <th>V1</th>\n",
       "      <th>V2</th>\n",
       "      <th>V3</th>\n",
       "      <th>V4</th>\n",
       "      <th>V5</th>\n",
       "      <th>V6</th>\n",
       "      <th>V7</th>\n",
       "      <th>V8</th>\n",
       "      <th>V9</th>\n",
       "      <th>V10</th>\n",
       "      <th>V11</th>\n",
       "      <th>V12</th>\n",
       "      <th>V13</th>\n",
       "      <th>V14</th>\n",
       "      <th>V15</th>\n",
       "      <th>V16</th>\n",
       "      <th>V17</th>\n",
       "      <th>V18</th>\n",
       "      <th>V19</th>\n",
       "      <th>V20</th>\n",
       "      <th>V21</th>\n",
       "      <th>V22</th>\n",
       "      <th>V23</th>\n",
       "      <th>V24</th>\n",
       "      <th>V25</th>\n",
       "      <th>V26</th>\n",
       "      <th>V27</th>\n",
       "      <th>V28</th>\n",
       "      <th>V29</th>\n",
       "      <th>V30</th>\n",
       "      <th>V31</th>\n",
       "      <th>V32</th>\n",
       "      <th>V33</th>\n",
       "      <th>V34</th>\n",
       "      <th>V35</th>\n",
       "      <th>V36</th>\n",
       "      <th>V37</th>\n",
       "      <th>V38</th>\n",
       "      <th>V39</th>\n",
       "      <th>V40</th>\n",
       "      <th>V41</th>\n",
       "      <th>V42</th>\n",
       "      <th>V43</th>\n",
       "      <th>V44</th>\n",
       "      <th>V45</th>\n",
       "      <th>V46</th>\n",
       "      <th>...</th>\n",
       "      <th>V240</th>\n",
       "      <th>V241</th>\n",
       "      <th>V242</th>\n",
       "      <th>V243</th>\n",
       "      <th>V244</th>\n",
       "      <th>V245</th>\n",
       "      <th>V246</th>\n",
       "      <th>V247</th>\n",
       "      <th>V248</th>\n",
       "      <th>V249</th>\n",
       "      <th>V250</th>\n",
       "      <th>V251</th>\n",
       "      <th>V252</th>\n",
       "      <th>V253</th>\n",
       "      <th>V254</th>\n",
       "      <th>V255</th>\n",
       "      <th>V256</th>\n",
       "      <th>V257</th>\n",
       "      <th>V258</th>\n",
       "      <th>V259</th>\n",
       "      <th>V260</th>\n",
       "      <th>V261</th>\n",
       "      <th>V262</th>\n",
       "      <th>V263</th>\n",
       "      <th>V264</th>\n",
       "      <th>V265</th>\n",
       "      <th>V266</th>\n",
       "      <th>V267</th>\n",
       "      <th>V268</th>\n",
       "      <th>V269</th>\n",
       "      <th>V270</th>\n",
       "      <th>V271</th>\n",
       "      <th>V272</th>\n",
       "      <th>V273</th>\n",
       "      <th>V274</th>\n",
       "      <th>V275</th>\n",
       "      <th>V276</th>\n",
       "      <th>V277</th>\n",
       "      <th>V278</th>\n",
       "      <th>V279</th>\n",
       "      <th>V280</th>\n",
       "      <th>V281</th>\n",
       "      <th>V282</th>\n",
       "      <th>V283</th>\n",
       "      <th>V284</th>\n",
       "      <th>V285</th>\n",
       "      <th>V286</th>\n",
       "      <th>V287</th>\n",
       "      <th>V288</th>\n",
       "      <th>V289</th>\n",
       "      <th>V290</th>\n",
       "      <th>V291</th>\n",
       "      <th>V292</th>\n",
       "      <th>V293</th>\n",
       "      <th>V294</th>\n",
       "      <th>V295</th>\n",
       "      <th>V296</th>\n",
       "      <th>V297</th>\n",
       "      <th>V298</th>\n",
       "      <th>V299</th>\n",
       "      <th>V300</th>\n",
       "      <th>V301</th>\n",
       "      <th>V302</th>\n",
       "      <th>V303</th>\n",
       "      <th>V304</th>\n",
       "      <th>V305</th>\n",
       "      <th>V306</th>\n",
       "      <th>V307</th>\n",
       "      <th>V308</th>\n",
       "      <th>V309</th>\n",
       "      <th>V310</th>\n",
       "      <th>V311</th>\n",
       "      <th>V312</th>\n",
       "      <th>V313</th>\n",
       "      <th>V314</th>\n",
       "      <th>V315</th>\n",
       "      <th>V316</th>\n",
       "      <th>V317</th>\n",
       "      <th>V318</th>\n",
       "      <th>V319</th>\n",
       "      <th>V320</th>\n",
       "      <th>V321</th>\n",
       "      <th>V322</th>\n",
       "      <th>V323</th>\n",
       "      <th>V324</th>\n",
       "      <th>V325</th>\n",
       "      <th>V326</th>\n",
       "      <th>V327</th>\n",
       "      <th>V328</th>\n",
       "      <th>V329</th>\n",
       "      <th>V330</th>\n",
       "      <th>V331</th>\n",
       "      <th>V332</th>\n",
       "      <th>V333</th>\n",
       "      <th>V334</th>\n",
       "      <th>V335</th>\n",
       "      <th>V336</th>\n",
       "      <th>V337</th>\n",
       "      <th>V338</th>\n",
       "      <th>V339</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>TransactionID</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2987000</th>\n",
       "      <td>0</td>\n",
       "      <td>86400</td>\n",
       "      <td>68.5</td>\n",
       "      <td>W</td>\n",
       "      <td>13926</td>\n",
       "      <td>NaN</td>\n",
       "      <td>150.0</td>\n",
       "      <td>discover</td>\n",
       "      <td>142.0</td>\n",
       "      <td>credit</td>\n",
       "      <td>315.0</td>\n",
       "      <td>87.0</td>\n",
       "      <td>19.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>14.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>13.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>13.0</td>\n",
       "      <td>13.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>M2</td>\n",
       "      <td>F</td>\n",
       "      <td>T</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>117.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>117.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2987001</th>\n",
       "      <td>0</td>\n",
       "      <td>86401</td>\n",
       "      <td>29.0</td>\n",
       "      <td>W</td>\n",
       "      <td>2755</td>\n",
       "      <td>404.0</td>\n",
       "      <td>150.0</td>\n",
       "      <td>mastercard</td>\n",
       "      <td>102.0</td>\n",
       "      <td>credit</td>\n",
       "      <td>325.0</td>\n",
       "      <td>87.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>gmail.com</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>M0</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2987002</th>\n",
       "      <td>0</td>\n",
       "      <td>86469</td>\n",
       "      <td>59.0</td>\n",
       "      <td>W</td>\n",
       "      <td>4663</td>\n",
       "      <td>490.0</td>\n",
       "      <td>150.0</td>\n",
       "      <td>visa</td>\n",
       "      <td>166.0</td>\n",
       "      <td>debit</td>\n",
       "      <td>330.0</td>\n",
       "      <td>87.0</td>\n",
       "      <td>287.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>outlook.com</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>315.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>315.0</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>M0</td>\n",
       "      <td>F</td>\n",
       "      <td>F</td>\n",
       "      <td>F</td>\n",
       "      <td>F</td>\n",
       "      <td>F</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2987003</th>\n",
       "      <td>0</td>\n",
       "      <td>86499</td>\n",
       "      <td>50.0</td>\n",
       "      <td>W</td>\n",
       "      <td>18132</td>\n",
       "      <td>567.0</td>\n",
       "      <td>150.0</td>\n",
       "      <td>mastercard</td>\n",
       "      <td>117.0</td>\n",
       "      <td>debit</td>\n",
       "      <td>476.0</td>\n",
       "      <td>87.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>yahoo.com</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>25.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>112.0</td>\n",
       "      <td>112.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>94.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>84.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>111.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>M0</td>\n",
       "      <td>T</td>\n",
       "      <td>F</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>28.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>10.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>4.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>38.0</td>\n",
       "      <td>24.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>50.0</td>\n",
       "      <td>1758.0</td>\n",
       "      <td>925.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>354.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>135.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>50.0</td>\n",
       "      <td>1404.0</td>\n",
       "      <td>790.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2987004</th>\n",
       "      <td>0</td>\n",
       "      <td>86506</td>\n",
       "      <td>50.0</td>\n",
       "      <td>H</td>\n",
       "      <td>4497</td>\n",
       "      <td>514.0</td>\n",
       "      <td>150.0</td>\n",
       "      <td>mastercard</td>\n",
       "      <td>102.0</td>\n",
       "      <td>credit</td>\n",
       "      <td>420.0</td>\n",
       "      <td>87.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>gmail.com</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 393 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "               isFraud  TransactionDT  TransactionAmt ProductCD  card1  card2  \\\n",
       "TransactionID                                                                   \n",
       "2987000              0          86400            68.5         W  13926    NaN   \n",
       "2987001              0          86401            29.0         W   2755  404.0   \n",
       "2987002              0          86469            59.0         W   4663  490.0   \n",
       "2987003              0          86499            50.0         W  18132  567.0   \n",
       "2987004              0          86506            50.0         H   4497  514.0   \n",
       "\n",
       "               card3       card4  card5   card6  addr1  addr2  dist1  dist2  \\\n",
       "TransactionID                                                                 \n",
       "2987000        150.0    discover  142.0  credit  315.0   87.0   19.0    NaN   \n",
       "2987001        150.0  mastercard  102.0  credit  325.0   87.0    NaN    NaN   \n",
       "2987002        150.0        visa  166.0   debit  330.0   87.0  287.0    NaN   \n",
       "2987003        150.0  mastercard  117.0   debit  476.0   87.0    NaN    NaN   \n",
       "2987004        150.0  mastercard  102.0  credit  420.0   87.0    NaN    NaN   \n",
       "\n",
       "              P_emaildomain R_emaildomain   C1   C2   C3   C4   C5   C6   C7  \\\n",
       "TransactionID                                                                  \n",
       "2987000                 NaN           NaN  1.0  1.0  0.0  0.0  0.0  1.0  0.0   \n",
       "2987001           gmail.com           NaN  1.0  1.0  0.0  0.0  0.0  1.0  0.0   \n",
       "2987002         outlook.com           NaN  1.0  1.0  0.0  0.0  0.0  1.0  0.0   \n",
       "2987003           yahoo.com           NaN  2.0  5.0  0.0  0.0  0.0  4.0  0.0   \n",
       "2987004           gmail.com           NaN  1.0  1.0  0.0  0.0  0.0  1.0  0.0   \n",
       "\n",
       "                C8   C9  C10  C11  C12   C13  C14     D1     D2    D3    D4  \\\n",
       "TransactionID                                                                 \n",
       "2987000        0.0  1.0  0.0  2.0  0.0   1.0  1.0   14.0    NaN  13.0   NaN   \n",
       "2987001        0.0  0.0  0.0  1.0  0.0   1.0  1.0    0.0    NaN   NaN   0.0   \n",
       "2987002        0.0  1.0  0.0  1.0  0.0   1.0  1.0    0.0    NaN   NaN   0.0   \n",
       "2987003        0.0  1.0  0.0  1.0  0.0  25.0  1.0  112.0  112.0   0.0  94.0   \n",
       "2987004        1.0  0.0  1.0  1.0  0.0   1.0  1.0    0.0    NaN   NaN   NaN   \n",
       "\n",
       "                D5  D6  D7  D8  D9   D10    D11  D12  D13  D14    D15   M1  \\\n",
       "TransactionID                                                                \n",
       "2987000        NaN NaN NaN NaN NaN  13.0   13.0  NaN  NaN  NaN    0.0    T   \n",
       "2987001        NaN NaN NaN NaN NaN   0.0    NaN  NaN  NaN  NaN    0.0  NaN   \n",
       "2987002        NaN NaN NaN NaN NaN   0.0  315.0  NaN  NaN  NaN  315.0    T   \n",
       "2987003        0.0 NaN NaN NaN NaN  84.0    NaN  NaN  NaN  NaN  111.0  NaN   \n",
       "2987004        NaN NaN NaN NaN NaN   NaN    NaN  NaN  NaN  NaN    NaN  NaN   \n",
       "\n",
       "                M2   M3   M4   M5   M6   M7   M8   M9   V1   V2   V3   V4  \\\n",
       "TransactionID                                                               \n",
       "2987000          T    T   M2    F    T  NaN  NaN  NaN  1.0  1.0  1.0  1.0   \n",
       "2987001        NaN  NaN   M0    T    T  NaN  NaN  NaN  NaN  NaN  NaN  NaN   \n",
       "2987002          T    T   M0    F    F    F    F    F  1.0  1.0  1.0  1.0   \n",
       "2987003        NaN  NaN   M0    T    F  NaN  NaN  NaN  NaN  NaN  NaN  NaN   \n",
       "2987004        NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN   \n",
       "\n",
       "                V5   V6   V7   V8   V9  V10  V11  V12  V13  V14  V15  V16  \\\n",
       "TransactionID                                                               \n",
       "2987000        1.0  1.0  1.0  1.0  1.0  0.0  0.0  1.0  1.0  1.0  0.0  0.0   \n",
       "2987001        NaN  NaN  NaN  NaN  NaN  NaN  NaN  0.0  0.0  1.0  0.0  0.0   \n",
       "2987002        1.0  1.0  1.0  1.0  1.0  0.0  0.0  1.0  1.0  1.0  0.0  0.0   \n",
       "2987003        NaN  NaN  NaN  NaN  NaN  NaN  NaN  1.0  1.0  1.0  0.0  0.0   \n",
       "2987004        NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN   \n",
       "\n",
       "               V17  V18  V19  V20  V21  V22  V23  V24  V25  V26  V27  V28  \\\n",
       "TransactionID                                                               \n",
       "2987000        0.0  0.0  1.0  1.0  0.0  0.0  1.0  1.0  1.0  1.0  0.0  0.0   \n",
       "2987001        0.0  0.0  1.0  1.0  0.0  0.0  1.0  1.0  1.0  1.0  0.0  0.0   \n",
       "2987002        0.0  0.0  1.0  1.0  0.0  0.0  1.0  1.0  1.0  1.0  0.0  0.0   \n",
       "2987003        0.0  0.0  1.0  1.0  0.0  0.0  1.0  1.0  1.0  1.0  0.0  0.0   \n",
       "2987004        NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN   \n",
       "\n",
       "               V29  V30  V31  V32  V33  V34  V35  V36  V37  V38  V39  V40  \\\n",
       "TransactionID                                                               \n",
       "2987000        0.0  0.0  0.0  0.0  0.0  0.0  NaN  NaN  NaN  NaN  NaN  NaN   \n",
       "2987001        0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0  1.0  0.0  0.0   \n",
       "2987002        0.0  0.0  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0  0.0  0.0   \n",
       "2987003        0.0  0.0  0.0  0.0  0.0  0.0  1.0  1.0  1.0  1.0  0.0  0.0   \n",
       "2987004        NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN   \n",
       "\n",
       "               V41  V42  V43  V44  V45  V46  ...  V240  V241  V242  V243  \\\n",
       "TransactionID                                ...                           \n",
       "2987000        NaN  NaN  NaN  NaN  NaN  NaN  ...   NaN   NaN   NaN   NaN   \n",
       "2987001        1.0  0.0  0.0  1.0  1.0  1.0  ...   NaN   NaN   NaN   NaN   \n",
       "2987002        1.0  0.0  0.0  1.0  1.0  1.0  ...   NaN   NaN   NaN   NaN   \n",
       "2987003        1.0  0.0  0.0  1.0  1.0  1.0  ...   NaN   NaN   NaN   NaN   \n",
       "2987004        NaN  NaN  NaN  NaN  NaN  NaN  ...   1.0   1.0   1.0   1.0   \n",
       "\n",
       "               V244  V245  V246  V247  V248  V249  V250  V251  V252  V253  \\\n",
       "TransactionID                                                               \n",
       "2987000         NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   \n",
       "2987001         NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   \n",
       "2987002         NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   \n",
       "2987003         NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   \n",
       "2987004         1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   \n",
       "\n",
       "               V254  V255  V256  V257  V258  V259  V260  V261  V262  V263  \\\n",
       "TransactionID                                                               \n",
       "2987000         NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   \n",
       "2987001         NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   \n",
       "2987002         NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   \n",
       "2987003         NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   \n",
       "2987004         1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   1.0   0.0   \n",
       "\n",
       "               V264  V265  V266  V267  V268  V269  V270  V271  V272  V273  \\\n",
       "TransactionID                                                               \n",
       "2987000         NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   \n",
       "2987001         NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   \n",
       "2987002         NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   \n",
       "2987003         NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   \n",
       "2987004         0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   \n",
       "\n",
       "               V274  V275  V276  V277  V278  V279  V280  V281  V282  V283  \\\n",
       "TransactionID                                                               \n",
       "2987000         NaN   NaN   NaN   NaN   NaN   0.0   0.0   0.0   1.0   1.0   \n",
       "2987001         NaN   NaN   NaN   NaN   NaN   0.0   0.0   0.0   1.0   1.0   \n",
       "2987002         NaN   NaN   NaN   NaN   NaN   0.0   0.0   0.0   1.0   1.0   \n",
       "2987003         NaN   NaN   NaN   NaN   NaN   1.0  28.0   0.0   0.0   0.0   \n",
       "2987004         0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   1.0   1.0   \n",
       "\n",
       "               V284  V285  V286  V287  V288  V289  V290  V291  V292  V293  \\\n",
       "TransactionID                                                               \n",
       "2987000         0.0   0.0   0.0   0.0   0.0   0.0   1.0   1.0   1.0   0.0   \n",
       "2987001         0.0   0.0   0.0   0.0   0.0   0.0   1.0   1.0   1.0   0.0   \n",
       "2987002         0.0   0.0   0.0   0.0   0.0   0.0   1.0   1.0   1.0   0.0   \n",
       "2987003         0.0  10.0   0.0   4.0   0.0   0.0   1.0   1.0   1.0   1.0   \n",
       "2987004         0.0   0.0   0.0   0.0   0.0   0.0   1.0   1.0   1.0   0.0   \n",
       "\n",
       "               V294  V295  V296  V297  V298  V299  V300  V301  V302  V303  \\\n",
       "TransactionID                                                               \n",
       "2987000         1.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   \n",
       "2987001         0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   \n",
       "2987002         0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   \n",
       "2987003        38.0  24.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   \n",
       "2987004         0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   1.0   1.0   \n",
       "\n",
       "               V304  V305  V306    V307   V308  V309   V310  V311   V312  \\\n",
       "TransactionID                                                              \n",
       "2987000         0.0   1.0   0.0   117.0    0.0   0.0    0.0   0.0    0.0   \n",
       "2987001         0.0   1.0   0.0     0.0    0.0   0.0    0.0   0.0    0.0   \n",
       "2987002         0.0   1.0   0.0     0.0    0.0   0.0    0.0   0.0    0.0   \n",
       "2987003         0.0   1.0  50.0  1758.0  925.0   0.0  354.0   0.0  135.0   \n",
       "2987004         1.0   1.0   0.0     0.0    0.0   0.0    0.0   0.0    0.0   \n",
       "\n",
       "               V313  V314  V315  V316    V317   V318  V319  V320  V321  V322  \\\n",
       "TransactionID                                                                  \n",
       "2987000         0.0   0.0   0.0   0.0   117.0    0.0   0.0   0.0   0.0   NaN   \n",
       "2987001         0.0   0.0   0.0   0.0     0.0    0.0   0.0   0.0   0.0   NaN   \n",
       "2987002         0.0   0.0   0.0   0.0     0.0    0.0   0.0   0.0   0.0   NaN   \n",
       "2987003         0.0   0.0   0.0  50.0  1404.0  790.0   0.0   0.0   0.0   NaN   \n",
       "2987004         0.0   0.0   0.0   0.0     0.0    0.0   0.0   0.0   0.0   0.0   \n",
       "\n",
       "               V323  V324  V325  V326  V327  V328  V329  V330  V331  V332  \\\n",
       "TransactionID                                                               \n",
       "2987000         NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   \n",
       "2987001         NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   \n",
       "2987002         NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   \n",
       "2987003         NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   \n",
       "2987004         0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   \n",
       "\n",
       "               V333  V334  V335  V336  V337  V338  V339  \n",
       "TransactionID                                            \n",
       "2987000         NaN   NaN   NaN   NaN   NaN   NaN   NaN  \n",
       "2987001         NaN   NaN   NaN   NaN   NaN   NaN   NaN  \n",
       "2987002         NaN   NaN   NaN   NaN   NaN   NaN   NaN  \n",
       "2987003         NaN   NaN   NaN   NaN   NaN   NaN   NaN  \n",
       "2987004         0.0   0.0   0.0   0.0   0.0   0.0   0.0  \n",
       "\n",
       "[5 rows x 393 columns]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#the labels are found in the transaction set\n",
    "\n",
    "train_transaction = pd.read_csv(train_transaction_path, on_bad_lines=\"skip\", index_col=\"TransactionID\")\n",
    "train_transaction.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "#so we need to concatenate the identity and transaction sets on TransactionID, which we set as the index column\n",
    "\n",
    "train_identity = pd.read_csv(train_identity_path, on_bad_lines=\"skip\", index_col=\"TransactionID\")\n",
    "train_transaction = pd.read_csv(train_transaction_path, on_bad_lines=\"skip\", index_col=\"TransactionID\")\n",
    "#test_identity = pd.read_csv(test_identity_path, on_bad_lines=\"skip\", index_col=\"TransactionID\")\n",
    "#test_transaction = pd.read_csv(test_transaction_path, on_bad_lines=\"skip\", index_col=\"TransactionID\")\n",
    "\n",
    "df_train = pd.concat([train_transaction, train_identity], axis=1, sort=False)\n",
    "#df_test = pd.concat([test_transaction, test_identity], axis=1, sort=False)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(590540, 433)"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>card6</th>\n",
       "      <th>C13</th>\n",
       "      <th>C1</th>\n",
       "      <th>C14</th>\n",
       "      <th>V317</th>\n",
       "      <th>V318</th>\n",
       "      <th>P_emaildomain</th>\n",
       "      <th>TransactionAmt</th>\n",
       "      <th>C11</th>\n",
       "      <th>TransactionDT</th>\n",
       "      <th>id_30</th>\n",
       "      <th>id_31</th>\n",
       "      <th>isFraud</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>TransactionID</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2987000</th>\n",
       "      <td>credit</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>117.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>68.5</td>\n",
       "      <td>2.0</td>\n",
       "      <td>86400</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2987001</th>\n",
       "      <td>credit</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>gmail.com</td>\n",
       "      <td>29.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>86401</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2987002</th>\n",
       "      <td>debit</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>outlook.com</td>\n",
       "      <td>59.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>86469</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2987003</th>\n",
       "      <td>debit</td>\n",
       "      <td>25.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1404.0</td>\n",
       "      <td>790.0</td>\n",
       "      <td>yahoo.com</td>\n",
       "      <td>50.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>86499</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2987004</th>\n",
       "      <td>credit</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>gmail.com</td>\n",
       "      <td>50.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>86506</td>\n",
       "      <td>Android 7.0</td>\n",
       "      <td>samsung browser 6.2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                card6   C13   C1  C14    V317   V318 P_emaildomain  \\\n",
       "TransactionID                                                        \n",
       "2987000        credit   1.0  1.0  1.0   117.0    0.0           NaN   \n",
       "2987001        credit   1.0  1.0  1.0     0.0    0.0     gmail.com   \n",
       "2987002         debit   1.0  1.0  1.0     0.0    0.0   outlook.com   \n",
       "2987003         debit  25.0  2.0  1.0  1404.0  790.0     yahoo.com   \n",
       "2987004        credit   1.0  1.0  1.0     0.0    0.0     gmail.com   \n",
       "\n",
       "               TransactionAmt  C11  TransactionDT        id_30  \\\n",
       "TransactionID                                                    \n",
       "2987000                  68.5  2.0          86400          NaN   \n",
       "2987001                  29.0  1.0          86401          NaN   \n",
       "2987002                  59.0  1.0          86469          NaN   \n",
       "2987003                  50.0  1.0          86499          NaN   \n",
       "2987004                  50.0  1.0          86506  Android 7.0   \n",
       "\n",
       "                             id_31  isFraud  \n",
       "TransactionID                                \n",
       "2987000                        NaN        0  \n",
       "2987001                        NaN        0  \n",
       "2987002                        NaN        0  \n",
       "2987003                        NaN        0  \n",
       "2987004        samsung browser 6.2        0  "
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#as a contrivance to make the influence of our string parse target features more prominent,\n",
    "#we'll base our evaluation only on the two target features from the identity set\n",
    "#and the top ten features from the transaction set,\n",
    "#so we drop the remaining features other than the labels\n",
    "#we derived this top ten list from a feature importance evaluation by automunge(.), not shown\n",
    "\n",
    "topten = ['card6', 'C13', 'C1', 'C14', 'V317',\n",
    "          'V318', 'P_emaildomain', 'TransactionAmt', 'C11', 'TransactionDT']\n",
    "\n",
    "targets_for_stringparse = ['id_30', 'id_31']\n",
    "\n",
    "labels = ['isFraud']\n",
    "\n",
    "retainedcolumns = topten + targets_for_stringparse + labels\n",
    "\n",
    "df_train = df_train[retainedcolumns]\n",
    "\n",
    "df_train.head()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Scenario 1\n",
    "\n",
    "## Full automation, no string parsing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "_______________\n",
      "Begin Feature Importance evaluation\n",
      "\n",
      "_______________\n",
      "Begin Automunge processing\n",
      "\n",
      "evaluating column:  card6\n",
      "processing column:  card6\n",
      "    root category:  1010\n",
      " returned columns:\n",
      "['card6_1010_0', 'card6_1010_1', 'card6_1010_2']\n",
      "\n",
      "evaluating column:  C13\n",
      "processing column:  C13\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['C13_nmbr']\n",
      "\n",
      "evaluating column:  C1\n",
      "processing column:  C1\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['C1_nmbr']\n",
      "\n",
      "evaluating column:  C14\n",
      "processing column:  C14\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['C14_nmbr']\n",
      "\n",
      "evaluating column:  V317\n",
      "processing column:  V317\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['V317_nmbr']\n",
      "\n",
      "evaluating column:  V318\n",
      "processing column:  V318\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['V318_nmbr']\n",
      "\n",
      "evaluating column:  P_emaildomain\n",
      "processing column:  P_emaildomain\n",
      "    root category:  1010\n",
      " returned columns:\n",
      "['P_emaildomain_1010_0', 'P_emaildomain_1010_1', 'P_emaildomain_1010_2', 'P_emaildomain_1010_3', 'P_emaildomain_1010_4', 'P_emaildomain_1010_5']\n",
      "\n",
      "evaluating column:  TransactionAmt\n",
      "processing column:  TransactionAmt\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['TransactionAmt_nmbr']\n",
      "\n",
      "evaluating column:  C11\n",
      "processing column:  C11\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['C11_nmbr']\n",
      "\n",
      "evaluating column:  TransactionDT\n",
      "processing column:  TransactionDT\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['TransactionDT_nmbr']\n",
      "\n",
      "evaluating column:  id_30\n",
      "processing column:  id_30\n",
      "    root category:  ord3\n",
      " returned columns:\n",
      "['id_30_ord3']\n",
      "\n",
      "evaluating column:  id_31\n",
      "processing column:  id_31\n",
      "    root category:  ord3\n",
      " returned columns:\n",
      "['id_31_ord3']\n",
      "\n",
      "______\n",
      "\n",
      "evaluating label column:  isFraud\n",
      "processing label column:  isFraud\n",
      "    root label category:  lbbn\n",
      "\n",
      " returned columns:\n",
      "['isFraud_0.0', 'isFraud_1.0']\n",
      "\n",
      "______\n",
      "\n",
      "infill to column:  C13_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  C1_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  C14_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  V317_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  V318_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  TransactionAmt_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  C11_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  TransactionDT_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_ord3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_ord3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  card6_1010_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  card6_1010_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  card6_1010_2\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_2\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_4\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_5\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "_______________\n",
      "Begin Validation set processing with Postmunge\n",
      "\n",
      "_______________\n",
      "Begin Postmunge processing\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  card6\n",
      "    root category:  1010\n",
      "\n",
      " returned columns:\n",
      "['card6_1010_0', 'card6_1010_1', 'card6_1010_2']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  C13\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['C13_nmbr']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  C1\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['C1_nmbr']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  C14\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['C14_nmbr']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  V317\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['V317_nmbr']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  V318\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['V318_nmbr']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  P_emaildomain\n",
      "    root category:  1010\n",
      "\n",
      " returned columns:\n",
      "['P_emaildomain_1010_0', 'P_emaildomain_1010_1', 'P_emaildomain_1010_2', 'P_emaildomain_1010_3', 'P_emaildomain_1010_4', 'P_emaildomain_1010_5']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  TransactionAmt\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['TransactionAmt_nmbr']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  C11\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['C11_nmbr']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  TransactionDT\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['TransactionDT_nmbr']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  id_30\n",
      "    root category:  ord3\n",
      "\n",
      " returned columns:\n",
      "['id_30_ord3']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  id_31\n",
      "    root category:  ord3\n",
      "\n",
      " returned columns:\n",
      "['id_31_ord3']\n",
      "\n",
      "______\n",
      "\n",
      "processing label column:  isFraud\n",
      "    root label category:  lbbn\n",
      "\n",
      " returned columns:\n",
      "['isFraud_0.0', 'isFraud_1.0']\n",
      "\n",
      "______\n",
      "\n",
      "infill to column:  C13_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  C1_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  C14_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  V317_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  V318_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  TransactionAmt_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  C11_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  TransactionDT_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_ord3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_ord3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  card6_1010_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  card6_1010_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  card6_1010_2\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_2\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_4\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_5\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "_______________\n",
      "Postmunge returned ID column set: \n",
      "['Automunge_index_744588193929']\n",
      "\n",
      "Postmunge returned test column set: \n",
      "['C13_nmbr', 'C1_nmbr', 'C14_nmbr', 'V317_nmbr', 'V318_nmbr', 'TransactionAmt_nmbr', 'C11_nmbr', 'TransactionDT_nmbr', 'id_30_ord3', 'id_31_ord3', 'card6_1010_0', 'card6_1010_1', 'card6_1010_2', 'P_emaildomain_1010_0', 'P_emaildomain_1010_1', 'P_emaildomain_1010_2', 'P_emaildomain_1010_3', 'P_emaildomain_1010_4', 'P_emaildomain_1010_5']\n",
      "\n",
      "Postmunge returned label column set: \n",
      "['isFraud_0.0', 'isFraud_1.0']\n",
      "\n",
      "_______________\n",
      "Postmunge Complete\n",
      "\n",
      "______\n",
      "\n",
      "versioning serial stamp:\n",
      "_5.22_744588193929_2020-11-12T19:04:29.684302\n",
      "\n",
      "Automunge returned ID column set: \n",
      "['TransactionID', 'Automunge_index_744588193929']\n",
      "\n",
      "Automunge returned train column set: \n",
      "['C13_nmbr', 'C1_nmbr', 'C14_nmbr', 'V317_nmbr', 'V318_nmbr', 'TransactionAmt_nmbr', 'C11_nmbr', 'TransactionDT_nmbr', 'id_30_ord3', 'id_31_ord3', 'card6_1010_0', 'card6_1010_1', 'card6_1010_2', 'P_emaildomain_1010_0', 'P_emaildomain_1010_1', 'P_emaildomain_1010_2', 'P_emaildomain_1010_3', 'P_emaildomain_1010_4', 'P_emaildomain_1010_5']\n",
      "\n",
      "Automunge returned label column set: \n",
      "['isFraud_0.0', 'isFraud_1.0']\n",
      "\n",
      "_______________\n",
      "Automunge Complete\n",
      "\n",
      "_______________\n",
      "Training feature importance evaluation model\n",
      "\n",
      "Base Accuracy of feature importance model:\n",
      "0.9803992955599959\n",
      "\n",
      "_______________\n",
      "Evaluating feature importances\n",
      "\n",
      "_______________\n",
      "Feature Importance results:\n",
      "\n",
      "C13_nmbr\n",
      "metric =  0.02110779964100651\n",
      "metric2 =  0.0\n",
      "\n",
      "C1_nmbr\n",
      "metric =  0.047888373353202174\n",
      "metric2 =  0.0\n",
      "\n",
      "C14_nmbr\n",
      "metric =  0.00947437938158291\n",
      "metric2 =  0.0\n",
      "\n",
      "V317_nmbr\n",
      "metric =  0.002717851457987641\n",
      "metric2 =  0.0\n",
      "\n",
      "V318_nmbr\n",
      "metric =  0.0018034341450198221\n",
      "metric2 =  0.0\n",
      "\n",
      "TransactionAmt_nmbr\n",
      "metric =  0.00438581637145663\n",
      "metric2 =  0.0\n",
      "\n",
      "C11_nmbr\n",
      "metric =  0.009161106783621809\n",
      "metric2 =  0.0\n",
      "\n",
      "TransactionDT_nmbr\n",
      "metric =  0.0050546957022385985\n",
      "metric2 =  0.0\n",
      "\n",
      "id_30_ord3\n",
      "metric =  0.0019304365495986797\n",
      "metric2 =  0.0\n",
      "\n",
      "id_31_ord3\n",
      "metric =  0.005808243302739879\n",
      "metric2 =  0.0\n",
      "\n",
      "card6_1010_0\n",
      "metric =  0.003547600501236148\n",
      "metric2 =  0.003547600501236148\n",
      "\n",
      "card6_1010_1\n",
      "metric =  0.003547600501236148\n",
      "metric2 =  0.0019558370305144956\n",
      "\n",
      "card6_1010_2\n",
      "metric =  0.003547600501236148\n",
      "metric2 =  0.0012022894300132148\n",
      "\n",
      "P_emaildomain_1010_0\n",
      "metric =  0.005808243302739879\n",
      "metric2 =  0.004826091373996699\n",
      "\n",
      "P_emaildomain_1010_1\n",
      "metric =  0.005808243302739879\n",
      "metric2 =  0.004936160124631717\n",
      "\n",
      "P_emaildomain_1010_2\n",
      "metric =  0.005808243302739879\n",
      "metric2 =  0.0050800961831544145\n",
      "\n",
      "P_emaildomain_1010_3\n",
      "metric =  0.005808243302739879\n",
      "metric2 =  0.005029295221322894\n",
      "\n",
      "P_emaildomain_1010_4\n",
      "metric =  0.005808243302739879\n",
      "metric2 =  0.004673688488502026\n",
      "\n",
      "P_emaildomain_1010_5\n",
      "metric =  0.005808243302739879\n",
      "metric2 =  0.004648288007586321\n",
      "\n",
      "\n",
      "______________________\n",
      "sorted metric results:\n",
      "\n",
      "C1\n",
      "0.047888373353202174\n",
      "\n",
      "C13\n",
      "0.02110779964100651\n",
      "\n",
      "C14\n",
      "0.00947437938158291\n",
      "\n",
      "C11\n",
      "0.009161106783621809\n",
      "\n",
      "P_emaildomain\n",
      "0.005808243302739879\n",
      "\n",
      "id_31\n",
      "0.005808243302739879\n",
      "\n",
      "TransactionDT\n",
      "0.0050546957022385985\n",
      "\n",
      "TransactionAmt\n",
      "0.00438581637145663\n",
      "\n",
      "card6\n",
      "0.003547600501236148\n",
      "\n",
      "V317\n",
      "0.002717851457987641\n",
      "\n",
      "id_30\n",
      "0.0019304365495986797\n",
      "\n",
      "V318\n",
      "0.0018034341450198221\n",
      "\n",
      "______________________\n",
      "sorted metric2 results:\n",
      "\n",
      "for source column:  card6\n",
      "card6_1010_2\n",
      "0.0012022894300132148\n",
      "\n",
      "card6_1010_1\n",
      "0.0019558370305144956\n",
      "\n",
      "card6_1010_0\n",
      "0.003547600501236148\n",
      "\n",
      "\n",
      "for source column:  C13\n",
      "C13_nmbr\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  C1\n",
      "C1_nmbr\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  C14\n",
      "C14_nmbr\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  V317\n",
      "V317_nmbr\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  V318\n",
      "V318_nmbr\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  P_emaildomain\n",
      "P_emaildomain_1010_5\n",
      "0.004648288007586321\n",
      "\n",
      "P_emaildomain_1010_4\n",
      "0.004673688488502026\n",
      "\n",
      "P_emaildomain_1010_0\n",
      "0.004826091373996699\n",
      "\n",
      "P_emaildomain_1010_1\n",
      "0.004936160124631717\n",
      "\n",
      "P_emaildomain_1010_3\n",
      "0.005029295221322894\n",
      "\n",
      "P_emaildomain_1010_2\n",
      "0.0050800961831544145\n",
      "\n",
      "\n",
      "for source column:  TransactionAmt\n",
      "TransactionAmt_nmbr\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  C11\n",
      "C11_nmbr\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  TransactionDT\n",
      "TransactionDT_nmbr\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  id_30\n",
      "id_30_ord3\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  id_31\n",
      "id_31_ord3\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  isFraud\n",
      "\n",
      "_______________\n",
      "Feature Importance evaluation complete\n",
      "\n",
      "_______________\n",
      "Feature Importance results returned\n",
      "\n",
      "_______________\n",
      "Automunge Complete\n",
      "\n"
     ]
    }
   ],
   "source": [
    "#now we'll run a base scenario under full automation\n",
    "#and use the model trained as part of feature importance \n",
    "#to measure the result\n",
    "\n",
    "#we'll set featuremethod = 'report'\n",
    "#which will only return feature importance results \n",
    "#to save data processing time\n",
    "\n",
    "#and set the eval_ratio to .0001 to speed it up a little\n",
    "\n",
    "#we'll increase the n_estimators for the Random Forest call\n",
    "\n",
    "train, trainID, labels, \\\n",
    "validation1, validationID1, validationlabels1, \\\n",
    "validation2, validationID2, validationlabels2, \\\n",
    "test, testID, testlabels, \\\n",
    "labelsencoding_dict, finalcolumns_train, finalcolumns_test, \\\n",
    "featureimportance1, postprocess_dict1 = \\\n",
    "am.automunge(df_train, df_test = False, \\\n",
    "             labels_column = label_column, \\\n",
    "             randomseed = 42, eval_ratio = .0001, \\\n",
    "             pandasoutput = True, \\\n",
    "             featureselection = True, featuremethod = 'report', \\\n",
    "             ML_cmnd = {'autoML_type':'randomforest', \\\n",
    "                        'MLinfill_cmnd':{'RandomForestClassifier':{'n_estimators':222}}}, \\\n",
    "             assigncat = {}, \\\n",
    "             processdict = {}, transformdict = {}, \\\n",
    "             printstatus = True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9803992955599959"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#the resulting accuracy is available in printouts and also here:\n",
    "postprocess_dict1['baseaccuracy']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.0019304365495986797"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#we can view the feature importance metric results \n",
    "#derived by shuffle permutation\n",
    "#associated with these features\n",
    "#(where a higher score means more important)\n",
    "postprocess_dict1['column_key']['id_30']\n",
    "\n",
    "#as a side note, in a full automunge(.) call\n",
    "#these results are returned in postprocess_dict1['FS_sorted']['column_key']['id_30']\n",
    "#and similarly for the other results demonstrated here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.005808243302739879"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "postprocess_dict1['column_key']['id_31']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'id_30_ord3': 0.0}"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#we can also view the relative importance between columns derived from \n",
    "#these features (where a lower score means more important)\n",
    "postprocess_dict1['metric2_column_key']['id_30']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'id_31_ord3': 0.0}"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "postprocess_dict1['metric2_column_key']['id_31']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Scenario 2\n",
    "\n",
    "## 'or19' string parsing applied to id_30, id_31"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "_______________\n",
      "Begin Feature Importance evaluation\n",
      "\n",
      "_______________\n",
      "Begin Automunge processing\n",
      "\n",
      "evaluating column:  card6\n",
      "processing column:  card6\n",
      "    root category:  1010\n",
      " returned columns:\n",
      "['card6_1010_0', 'card6_1010_1', 'card6_1010_2']\n",
      "\n",
      "evaluating column:  C13\n",
      "processing column:  C13\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['C13_nmbr']\n",
      "\n",
      "evaluating column:  C1\n",
      "processing column:  C1\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['C1_nmbr']\n",
      "\n",
      "evaluating column:  C14\n",
      "processing column:  C14\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['C14_nmbr']\n",
      "\n",
      "evaluating column:  V317\n",
      "processing column:  V317\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['V317_nmbr']\n",
      "\n",
      "evaluating column:  V318\n",
      "processing column:  V318\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['V318_nmbr']\n",
      "\n",
      "evaluating column:  P_emaildomain\n",
      "processing column:  P_emaildomain\n",
      "    root category:  1010\n",
      " returned columns:\n",
      "['P_emaildomain_1010_0', 'P_emaildomain_1010_1', 'P_emaildomain_1010_2', 'P_emaildomain_1010_3', 'P_emaildomain_1010_4', 'P_emaildomain_1010_5']\n",
      "\n",
      "evaluating column:  TransactionAmt\n",
      "processing column:  TransactionAmt\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['TransactionAmt_nmbr']\n",
      "\n",
      "evaluating column:  C11\n",
      "processing column:  C11\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['C11_nmbr']\n",
      "\n",
      "evaluating column:  TransactionDT\n",
      "processing column:  TransactionDT\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['TransactionDT_nmbr']\n",
      "\n",
      "evaluating column:  id_30\n",
      "processing column:  id_30\n",
      "    root category:  or19\n",
      " returned columns:\n",
      "['id_30_UPCS_spl9_ord3', 'id_30_UPCS_spl9_sp10_ord3', 'id_30_UPCS_nmc7_nmbr', 'id_30_UPCS_1010_0', 'id_30_UPCS_1010_1', 'id_30_UPCS_1010_2', 'id_30_UPCS_1010_3', 'id_30_UPCS_1010_4', 'id_30_UPCS_1010_5', 'id_30_UPCS_1010_6']\n",
      "\n",
      "evaluating column:  id_31\n",
      "processing column:  id_31\n",
      "    root category:  or19\n",
      " returned columns:\n",
      "['id_31_UPCS_spl9_ord3', 'id_31_UPCS_spl9_sp10_ord3', 'id_31_UPCS_nmc7_nmbr', 'id_31_UPCS_1010_0', 'id_31_UPCS_1010_1', 'id_31_UPCS_1010_2', 'id_31_UPCS_1010_3', 'id_31_UPCS_1010_4', 'id_31_UPCS_1010_5', 'id_31_UPCS_1010_6']\n",
      "\n",
      "______\n",
      "\n",
      "evaluating label column:  isFraud\n",
      "processing label column:  isFraud\n",
      "    root label category:  lbbn\n",
      "\n",
      " returned columns:\n",
      "['isFraud_0.0', 'isFraud_1.0']\n",
      "\n",
      "______\n",
      "\n",
      "infill to column:  C13_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  C1_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  C14_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  V317_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  V318_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  TransactionAmt_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  C11_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  TransactionDT_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  card6_1010_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  card6_1010_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  card6_1010_2\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_2\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_4\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_5\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_UPCS_spl9_ord3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_UPCS_spl9_sp10_ord3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_UPCS_nmc7_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_UPCS_1010_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_UPCS_1010_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_UPCS_1010_2\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_UPCS_1010_3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_UPCS_1010_4\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_UPCS_1010_5\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_UPCS_1010_6\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_UPCS_spl9_ord3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_UPCS_spl9_sp10_ord3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_UPCS_nmc7_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_UPCS_1010_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_UPCS_1010_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_UPCS_1010_2\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_UPCS_1010_3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_UPCS_1010_4\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_UPCS_1010_5\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_UPCS_1010_6\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "_______________\n",
      "Begin Validation set processing with Postmunge\n",
      "\n",
      "_______________\n",
      "Begin Postmunge processing\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  card6\n",
      "    root category:  1010\n",
      "\n",
      " returned columns:\n",
      "['card6_1010_0', 'card6_1010_1', 'card6_1010_2']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  C13\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['C13_nmbr']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  C1\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['C1_nmbr']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  C14\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['C14_nmbr']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  V317\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['V317_nmbr']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  V318\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['V318_nmbr']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  P_emaildomain\n",
      "    root category:  1010\n",
      "\n",
      " returned columns:\n",
      "['P_emaildomain_1010_0', 'P_emaildomain_1010_1', 'P_emaildomain_1010_2', 'P_emaildomain_1010_3', 'P_emaildomain_1010_4', 'P_emaildomain_1010_5']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  TransactionAmt\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['TransactionAmt_nmbr']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  C11\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['C11_nmbr']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  TransactionDT\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['TransactionDT_nmbr']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  id_30\n",
      "    root category:  or19\n",
      "\n",
      " returned columns:\n",
      "['id_30_UPCS_spl9_ord3', 'id_30_UPCS_spl9_sp10_ord3', 'id_30_UPCS_nmc7_nmbr', 'id_30_UPCS_1010_0', 'id_30_UPCS_1010_1', 'id_30_UPCS_1010_2', 'id_30_UPCS_1010_3', 'id_30_UPCS_1010_4', 'id_30_UPCS_1010_5', 'id_30_UPCS_1010_6']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  id_31\n",
      "    root category:  or19\n",
      "\n",
      " returned columns:\n",
      "['id_31_UPCS_spl9_ord3', 'id_31_UPCS_spl9_sp10_ord3', 'id_31_UPCS_nmc7_nmbr', 'id_31_UPCS_1010_0', 'id_31_UPCS_1010_1', 'id_31_UPCS_1010_2', 'id_31_UPCS_1010_3', 'id_31_UPCS_1010_4', 'id_31_UPCS_1010_5', 'id_31_UPCS_1010_6']\n",
      "\n",
      "______\n",
      "\n",
      "processing label column:  isFraud\n",
      "    root label category:  lbbn\n",
      "\n",
      " returned columns:\n",
      "['isFraud_0.0', 'isFraud_1.0']\n",
      "\n",
      "______\n",
      "\n",
      "infill to column:  C13_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  C1_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  C14_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  V317_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  V318_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  TransactionAmt_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  C11_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  TransactionDT_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  card6_1010_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  card6_1010_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  card6_1010_2\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_2\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_4\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_5\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_UPCS_spl9_ord3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_UPCS_spl9_sp10_ord3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_UPCS_nmc7_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_UPCS_1010_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_UPCS_1010_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_UPCS_1010_2\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_UPCS_1010_3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_UPCS_1010_4\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_UPCS_1010_5\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_UPCS_1010_6\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_UPCS_spl9_ord3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_UPCS_spl9_sp10_ord3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_UPCS_nmc7_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_UPCS_1010_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_UPCS_1010_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_UPCS_1010_2\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_UPCS_1010_3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_UPCS_1010_4\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_UPCS_1010_5\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_UPCS_1010_6\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "_______________\n",
      "Postmunge returned ID column set: \n",
      "['Automunge_index_515610163286']\n",
      "\n",
      "Postmunge returned test column set: \n",
      "['C13_nmbr', 'C1_nmbr', 'C14_nmbr', 'V317_nmbr', 'V318_nmbr', 'TransactionAmt_nmbr', 'C11_nmbr', 'TransactionDT_nmbr', 'card6_1010_0', 'card6_1010_1', 'card6_1010_2', 'P_emaildomain_1010_0', 'P_emaildomain_1010_1', 'P_emaildomain_1010_2', 'P_emaildomain_1010_3', 'P_emaildomain_1010_4', 'P_emaildomain_1010_5', 'id_30_UPCS_spl9_ord3', 'id_30_UPCS_spl9_sp10_ord3', 'id_30_UPCS_nmc7_nmbr', 'id_30_UPCS_1010_0', 'id_30_UPCS_1010_1', 'id_30_UPCS_1010_2', 'id_30_UPCS_1010_3', 'id_30_UPCS_1010_4', 'id_30_UPCS_1010_5', 'id_30_UPCS_1010_6', 'id_31_UPCS_spl9_ord3', 'id_31_UPCS_spl9_sp10_ord3', 'id_31_UPCS_nmc7_nmbr', 'id_31_UPCS_1010_0', 'id_31_UPCS_1010_1', 'id_31_UPCS_1010_2', 'id_31_UPCS_1010_3', 'id_31_UPCS_1010_4', 'id_31_UPCS_1010_5', 'id_31_UPCS_1010_6']\n",
      "\n",
      "Postmunge returned label column set: \n",
      "['isFraud_0.0', 'isFraud_1.0']\n",
      "\n",
      "_______________\n",
      "Postmunge Complete\n",
      "\n",
      "______\n",
      "\n",
      "versioning serial stamp:\n",
      "_5.22_515610163286_2020-11-12T19:18:12.674556\n",
      "\n",
      "Automunge returned ID column set: \n",
      "['TransactionID', 'Automunge_index_515610163286']\n",
      "\n",
      "Automunge returned train column set: \n",
      "['C13_nmbr', 'C1_nmbr', 'C14_nmbr', 'V317_nmbr', 'V318_nmbr', 'TransactionAmt_nmbr', 'C11_nmbr', 'TransactionDT_nmbr', 'card6_1010_0', 'card6_1010_1', 'card6_1010_2', 'P_emaildomain_1010_0', 'P_emaildomain_1010_1', 'P_emaildomain_1010_2', 'P_emaildomain_1010_3', 'P_emaildomain_1010_4', 'P_emaildomain_1010_5', 'id_30_UPCS_spl9_ord3', 'id_30_UPCS_spl9_sp10_ord3', 'id_30_UPCS_nmc7_nmbr', 'id_30_UPCS_1010_0', 'id_30_UPCS_1010_1', 'id_30_UPCS_1010_2', 'id_30_UPCS_1010_3', 'id_30_UPCS_1010_4', 'id_30_UPCS_1010_5', 'id_30_UPCS_1010_6', 'id_31_UPCS_spl9_ord3', 'id_31_UPCS_spl9_sp10_ord3', 'id_31_UPCS_nmc7_nmbr', 'id_31_UPCS_1010_0', 'id_31_UPCS_1010_1', 'id_31_UPCS_1010_2', 'id_31_UPCS_1010_3', 'id_31_UPCS_1010_4', 'id_31_UPCS_1010_5', 'id_31_UPCS_1010_6']\n",
      "\n",
      "Automunge returned label column set: \n",
      "['isFraud_0.0', 'isFraud_1.0']\n",
      "\n",
      "_______________\n",
      "Automunge Complete\n",
      "\n",
      "_______________\n",
      "Training feature importance evaluation model\n",
      "\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Base Accuracy of feature importance model:\n",
      "0.9808226369085922\n",
      "\n",
      "_______________\n",
      "Evaluating feature importances\n",
      "\n",
      "_______________\n",
      "Feature Importance results:\n",
      "\n",
      "C13_nmbr\n",
      "metric =  0.013072780844650689\n",
      "metric2 =  0.0\n",
      "\n",
      "C1_nmbr\n",
      "metric =  0.036060216073424356\n",
      "metric2 =  0.0\n",
      "\n",
      "C14_nmbr\n",
      "metric =  0.005240965895620953\n",
      "metric2 =  0.0\n",
      "\n",
      "V317_nmbr\n",
      "metric =  0.002819453381650683\n",
      "metric2 =  0.0\n",
      "\n",
      "V318_nmbr\n",
      "metric =  0.0017780336641041172\n",
      "metric2 =  0.0\n",
      "\n",
      "TransactionAmt_nmbr\n",
      "metric =  0.00389474040708504\n",
      "metric2 =  0.0\n",
      "\n",
      "C11_nmbr\n",
      "metric =  0.006256985132251924\n",
      "metric2 =  0.0\n",
      "\n",
      "TransactionDT_nmbr\n",
      "metric =  0.004944626951603692\n",
      "metric2 =  0.0\n",
      "\n",
      "card6_1010_0\n",
      "metric =  0.0037931384834219983\n",
      "metric2 =  0.0037846716564500227\n",
      "\n",
      "card6_1010_1\n",
      "metric =  0.0037931384834219983\n",
      "metric2 =  0.001854235106851343\n",
      "\n",
      "card6_1010_2\n",
      "metric =  0.0037931384834219983\n",
      "metric2 =  0.0013546923155078883\n",
      "\n",
      "P_emaildomain_1010_0\n",
      "metric =  0.005452636569919123\n",
      "metric2 =  0.004834558200968675\n",
      "\n",
      "P_emaildomain_1010_1\n",
      "metric =  0.005452636569919123\n",
      "metric2 =  0.004589020218782824\n",
      "\n",
      "P_emaildomain_1010_2\n",
      "metric =  0.005452636569919123\n",
      "metric2 =  0.004995427913435213\n",
      "\n",
      "P_emaildomain_1010_3\n",
      "metric =  0.005452636569919123\n",
      "metric2 =  0.004902292816744036\n",
      "\n",
      "P_emaildomain_1010_4\n",
      "metric =  0.005452636569919123\n",
      "metric2 =  0.004436617333288262\n",
      "\n",
      "P_emaildomain_1010_5\n",
      "metric =  0.005452636569919123\n",
      "metric2 =  0.004072543773495529\n",
      "\n",
      "id_30_UPCS_spl9_ord3\n",
      "metric =  0.0029464557862295404\n",
      "metric2 =  0.001989704338402176\n",
      "\n",
      "id_30_UPCS_spl9_sp10_ord3\n",
      "metric =  0.0029464557862295404\n",
      "metric2 =  0.002506180783689582\n",
      "\n",
      "id_30_UPCS_nmc7_nmbr\n",
      "metric =  0.0029464557862295404\n",
      "metric2 =  0.0023791783791107246\n",
      "\n",
      "id_30_UPCS_1010_0\n",
      "metric =  0.0029464557862295404\n",
      "metric2 =  0.002700917804043801\n",
      "\n",
      "id_30_UPCS_1010_1\n",
      "metric =  0.0029464557862295404\n",
      "metric2 =  0.002836387035594523\n",
      "\n",
      "id_30_UPCS_1010_2\n",
      "metric =  0.0029464557862295404\n",
      "metric2 =  0.0028787211704541793\n",
      "\n",
      "id_30_UPCS_1010_3\n",
      "metric =  0.0029464557862295404\n",
      "metric2 =  0.0025485149185491274\n",
      "\n",
      "id_30_UPCS_1010_4\n",
      "metric =  0.0029464557862295404\n",
      "metric2 =  0.002531581264605287\n",
      "\n",
      "id_30_UPCS_1010_5\n",
      "metric =  0.0029464557862295404\n",
      "metric2 =  0.0024469129948860857\n",
      "\n",
      "id_30_UPCS_1010_6\n",
      "metric =  0.0029464557862295404\n",
      "metric2 =  0.002836387035594523\n",
      "\n",
      "id_31_UPCS_spl9_ord3\n",
      "metric =  0.009144173129677968\n",
      "metric2 =  0.005621973109357525\n",
      "\n",
      "id_31_UPCS_spl9_sp10_ord3\n",
      "metric =  0.009144173129677968\n",
      "metric2 =  0.0068157957123988755\n",
      "\n",
      "id_31_UPCS_nmc7_nmbr\n",
      "metric =  0.009144173129677968\n",
      "metric2 =  0.007916483218748938\n",
      "\n",
      "id_31_UPCS_1010_0\n",
      "metric =  0.009144173129677968\n",
      "metric2 =  0.007620144274731677\n",
      "\n",
      "id_31_UPCS_1010_1\n",
      "metric =  0.009144173129677968\n",
      "metric2 =  0.008805500050801052\n",
      "\n",
      "id_31_UPCS_1010_2\n",
      "metric =  0.009144173129677968\n",
      "metric2 =  0.007950350526636618\n",
      "\n",
      "id_31_UPCS_1010_3\n",
      "metric =  0.009144173129677968\n",
      "metric2 =  0.008348291394317142\n",
      "\n",
      "id_31_UPCS_1010_4\n",
      "metric =  0.009144173129677968\n",
      "metric2 =  0.008729298608053715\n",
      "\n",
      "id_31_UPCS_1010_5\n",
      "metric =  0.009144173129677968\n",
      "metric2 =  0.008534561587699385\n",
      "\n",
      "id_31_UPCS_1010_6\n",
      "metric =  0.009144173129677968\n",
      "metric2 =  0.008644630338334403\n",
      "\n",
      "\n",
      "______________________\n",
      "sorted metric results:\n",
      "\n",
      "C1\n",
      "0.036060216073424356\n",
      "\n",
      "C13\n",
      "0.013072780844650689\n",
      "\n",
      "id_31\n",
      "0.009144173129677968\n",
      "\n",
      "C11\n",
      "0.006256985132251924\n",
      "\n",
      "P_emaildomain\n",
      "0.005452636569919123\n",
      "\n",
      "C14\n",
      "0.005240965895620953\n",
      "\n",
      "TransactionDT\n",
      "0.004944626951603692\n",
      "\n",
      "TransactionAmt\n",
      "0.00389474040708504\n",
      "\n",
      "card6\n",
      "0.0037931384834219983\n",
      "\n",
      "id_30\n",
      "0.0029464557862295404\n",
      "\n",
      "V317\n",
      "0.002819453381650683\n",
      "\n",
      "V318\n",
      "0.0017780336641041172\n",
      "\n",
      "______________________\n",
      "sorted metric2 results:\n",
      "\n",
      "for source column:  card6\n",
      "card6_1010_2\n",
      "0.0013546923155078883\n",
      "\n",
      "card6_1010_1\n",
      "0.001854235106851343\n",
      "\n",
      "card6_1010_0\n",
      "0.0037846716564500227\n",
      "\n",
      "\n",
      "for source column:  C13\n",
      "C13_nmbr\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  C1\n",
      "C1_nmbr\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  C14\n",
      "C14_nmbr\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  V317\n",
      "V317_nmbr\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  V318\n",
      "V318_nmbr\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  P_emaildomain\n",
      "P_emaildomain_1010_5\n",
      "0.004072543773495529\n",
      "\n",
      "P_emaildomain_1010_4\n",
      "0.004436617333288262\n",
      "\n",
      "P_emaildomain_1010_1\n",
      "0.004589020218782824\n",
      "\n",
      "P_emaildomain_1010_0\n",
      "0.004834558200968675\n",
      "\n",
      "P_emaildomain_1010_3\n",
      "0.004902292816744036\n",
      "\n",
      "P_emaildomain_1010_2\n",
      "0.004995427913435213\n",
      "\n",
      "\n",
      "for source column:  TransactionAmt\n",
      "TransactionAmt_nmbr\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  C11\n",
      "C11_nmbr\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  TransactionDT\n",
      "TransactionDT_nmbr\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  id_30\n",
      "id_30_UPCS_spl9_ord3\n",
      "0.001989704338402176\n",
      "\n",
      "id_30_UPCS_nmc7_nmbr\n",
      "0.0023791783791107246\n",
      "\n",
      "id_30_UPCS_1010_5\n",
      "0.0024469129948860857\n",
      "\n",
      "id_30_UPCS_spl9_sp10_ord3\n",
      "0.002506180783689582\n",
      "\n",
      "id_30_UPCS_1010_4\n",
      "0.002531581264605287\n",
      "\n",
      "id_30_UPCS_1010_3\n",
      "0.0025485149185491274\n",
      "\n",
      "id_30_UPCS_1010_0\n",
      "0.002700917804043801\n",
      "\n",
      "id_30_UPCS_1010_1\n",
      "0.002836387035594523\n",
      "\n",
      "id_30_UPCS_1010_6\n",
      "0.002836387035594523\n",
      "\n",
      "id_30_UPCS_1010_2\n",
      "0.0028787211704541793\n",
      "\n",
      "\n",
      "for source column:  id_31\n",
      "id_31_UPCS_spl9_ord3\n",
      "0.005621973109357525\n",
      "\n",
      "id_31_UPCS_spl9_sp10_ord3\n",
      "0.0068157957123988755\n",
      "\n",
      "id_31_UPCS_1010_0\n",
      "0.007620144274731677\n",
      "\n",
      "id_31_UPCS_nmc7_nmbr\n",
      "0.007916483218748938\n",
      "\n",
      "id_31_UPCS_1010_2\n",
      "0.007950350526636618\n",
      "\n",
      "id_31_UPCS_1010_3\n",
      "0.008348291394317142\n",
      "\n",
      "id_31_UPCS_1010_5\n",
      "0.008534561587699385\n",
      "\n",
      "id_31_UPCS_1010_6\n",
      "0.008644630338334403\n",
      "\n",
      "id_31_UPCS_1010_4\n",
      "0.008729298608053715\n",
      "\n",
      "id_31_UPCS_1010_1\n",
      "0.008805500050801052\n",
      "\n",
      "\n",
      "for source column:  isFraud\n",
      "\n",
      "_______________\n",
      "Feature Importance evaluation complete\n",
      "\n",
      "_______________\n",
      "Feature Importance results returned\n",
      "\n",
      "_______________\n",
      "Automunge Complete\n",
      "\n"
     ]
    }
   ],
   "source": [
    "#now let's try running again and applying 'or19'\n",
    "#to our two target features id_30, id_31\n",
    "\n",
    "#(or19 was described in detail in the paper)\n",
    "\n",
    "train, trainID, labels, \\\n",
    "validation1, validationID1, validationlabels1, \\\n",
    "validation2, validationID2, validationlabels2, \\\n",
    "test, testID, testlabels, \\\n",
    "labelsencoding_dict, finalcolumns_train, finalcolumns_test, \\\n",
    "featureimportance2, postprocess_dict2 = \\\n",
    "am.automunge(df_train, df_test = False, \\\n",
    "             labels_column = label_column, \\\n",
    "             randomseed = 42, eval_ratio = .0001, \\\n",
    "             pandasoutput = True, \\\n",
    "             featureselection = True, featuremethod = 'report', \\\n",
    "             ML_cmnd = {'autoML_type':'randomforest', \\\n",
    "                        'MLinfill_cmnd':{'RandomForestClassifier':{'n_estimators':222}}}, \\\n",
    "             assigncat = {'or19':['id_30', 'id_31']}, \\\n",
    "             processdict = {}, transformdict = {}, \\\n",
    "             printstatus = True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9808226369085922"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#the resulting accuracy is available in printouts and also here:\n",
    "postprocess_dict2['baseaccuracy']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.0029464557862295404"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#we can view the feature importance metric results\n",
    "#associated with these features as follows\n",
    "#(where a higher score indicates greater importance)\n",
    "postprocess_dict2['column_key']['id_30']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.009144173129677968"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "postprocess_dict2['column_key']['id_31']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'id_30_UPCS_spl9_ord3': 0.001989704338402176,\n",
       " 'id_30_UPCS_nmc7_nmbr': 0.0023791783791107246,\n",
       " 'id_30_UPCS_1010_5': 0.0024469129948860857,\n",
       " 'id_30_UPCS_spl9_sp10_ord3': 0.002506180783689582,\n",
       " 'id_30_UPCS_1010_4': 0.002531581264605287,\n",
       " 'id_30_UPCS_1010_3': 0.0025485149185491274,\n",
       " 'id_30_UPCS_1010_0': 0.002700917804043801,\n",
       " 'id_30_UPCS_1010_1': 0.002836387035594523,\n",
       " 'id_30_UPCS_1010_6': 0.002836387035594523,\n",
       " 'id_30_UPCS_1010_2': 0.0028787211704541793}"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#we can also view the relative importance between columns derived from\n",
    "#these features (where a lower score indicates greater importance)\n",
    "postprocess_dict2['metric2_column_key']['id_30']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'id_31_UPCS_spl9_ord3': 0.005621973109357525,\n",
       " 'id_31_UPCS_spl9_sp10_ord3': 0.0068157957123988755,\n",
       " 'id_31_UPCS_1010_0': 0.007620144274731677,\n",
       " 'id_31_UPCS_nmc7_nmbr': 0.007916483218748938,\n",
       " 'id_31_UPCS_1010_2': 0.007950350526636618,\n",
       " 'id_31_UPCS_1010_3': 0.008348291394317142,\n",
       " 'id_31_UPCS_1010_5': 0.008534561587699385,\n",
       " 'id_31_UPCS_1010_6': 0.008644630338334403,\n",
       " 'id_31_UPCS_1010_4': 0.008729298608053715,\n",
       " 'id_31_UPCS_1010_1': 0.008805500050801052}"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "postprocess_dict2['metric2_column_key']['id_31']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Scenario 3\n",
    "\n",
    "## 'sp19' string parsing applied to id_30, id_31"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "_______________\n",
      "Begin Feature Importance evaluation\n",
      "\n",
      "_______________\n",
      "Begin Automunge processing\n",
      "\n",
      "evaluating column:  card6\n",
      "processing column:  card6\n",
      "    root category:  1010\n",
      " returned columns:\n",
      "['card6_1010_0', 'card6_1010_1', 'card6_1010_2']\n",
      "\n",
      "evaluating column:  C13\n",
      "processing column:  C13\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['C13_nmbr']\n",
      "\n",
      "evaluating column:  C1\n",
      "processing column:  C1\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['C1_nmbr']\n",
      "\n",
      "evaluating column:  C14\n",
      "processing column:  C14\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['C14_nmbr']\n",
      "\n",
      "evaluating column:  V317\n",
      "processing column:  V317\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['V317_nmbr']\n",
      "\n",
      "evaluating column:  V318\n",
      "processing column:  V318\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['V318_nmbr']\n",
      "\n",
      "evaluating column:  P_emaildomain\n",
      "processing column:  P_emaildomain\n",
      "    root category:  1010\n",
      " returned columns:\n",
      "['P_emaildomain_1010_0', 'P_emaildomain_1010_1', 'P_emaildomain_1010_2', 'P_emaildomain_1010_3', 'P_emaildomain_1010_4', 'P_emaildomain_1010_5']\n",
      "\n",
      "evaluating column:  TransactionAmt\n",
      "processing column:  TransactionAmt\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['TransactionAmt_nmbr']\n",
      "\n",
      "evaluating column:  C11\n",
      "processing column:  C11\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['C11_nmbr']\n",
      "\n",
      "evaluating column:  TransactionDT\n",
      "processing column:  TransactionDT\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['TransactionDT_nmbr']\n",
      "\n",
      "evaluating column:  id_30\n",
      "processing column:  id_30\n",
      "    root category:  stringparse\n",
      " returned columns:\n",
      "['id_30_ord3', 'id_30_nmrc', 'id_30_sp19_0', 'id_30_sp19_1', 'id_30_sp19_2', 'id_30_sp19_3', 'id_30_sp19_4']\n",
      "\n",
      "evaluating column:  id_31\n",
      "processing column:  id_31\n",
      "    root category:  stringparse\n",
      " returned columns:\n",
      "['id_31_ord3', 'id_31_nmrc', 'id_31_sp19_0', 'id_31_sp19_1', 'id_31_sp19_2', 'id_31_sp19_3', 'id_31_sp19_4', 'id_31_sp19_5', 'id_31_sp19_6']\n",
      "\n",
      "______\n",
      "\n",
      "evaluating label column:  isFraud\n",
      "processing label column:  isFraud\n",
      "    root label category:  lbbn\n",
      "\n",
      " returned columns:\n",
      "['isFraud_0.0', 'isFraud_1.0']\n",
      "\n",
      "______\n",
      "\n",
      "infill to column:  C13_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  C1_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  C14_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  V317_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  V318_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  TransactionAmt_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  C11_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  TransactionDT_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_ord3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_ord3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  card6_1010_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  card6_1010_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  card6_1010_2\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_2\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_4\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_5\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_nmrc\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_sp19_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_sp19_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_sp19_2\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_sp19_3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_sp19_4\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_nmrc\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_sp19_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_sp19_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_sp19_2\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_sp19_3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_sp19_4\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_sp19_5\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_sp19_6\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "_______________\n",
      "Begin Validation set processing with Postmunge\n",
      "\n",
      "_______________\n",
      "Begin Postmunge processing\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  card6\n",
      "    root category:  1010\n",
      "\n",
      " returned columns:\n",
      "['card6_1010_0', 'card6_1010_1', 'card6_1010_2']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  C13\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['C13_nmbr']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  C1\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['C1_nmbr']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  C14\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['C14_nmbr']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  V317\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['V317_nmbr']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  V318\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['V318_nmbr']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  P_emaildomain\n",
      "    root category:  1010\n",
      "\n",
      " returned columns:\n",
      "['P_emaildomain_1010_0', 'P_emaildomain_1010_1', 'P_emaildomain_1010_2', 'P_emaildomain_1010_3', 'P_emaildomain_1010_4', 'P_emaildomain_1010_5']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  TransactionAmt\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['TransactionAmt_nmbr']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  C11\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['C11_nmbr']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  TransactionDT\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['TransactionDT_nmbr']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  id_30\n",
      "    root category:  stringparse\n",
      "\n",
      " returned columns:\n",
      "['id_30_ord3', 'id_30_nmrc', 'id_30_sp19_0', 'id_30_sp19_1', 'id_30_sp19_2', 'id_30_sp19_3', 'id_30_sp19_4']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  id_31\n",
      "    root category:  stringparse\n",
      "\n",
      " returned columns:\n",
      "['id_31_ord3', 'id_31_nmrc', 'id_31_sp19_0', 'id_31_sp19_1', 'id_31_sp19_2', 'id_31_sp19_3', 'id_31_sp19_4', 'id_31_sp19_5', 'id_31_sp19_6']\n",
      "\n",
      "______\n",
      "\n",
      "processing label column:  isFraud\n",
      "    root label category:  lbbn\n",
      "\n",
      " returned columns:\n",
      "['isFraud_0.0', 'isFraud_1.0']\n",
      "\n",
      "______\n",
      "\n",
      "infill to column:  C13_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  C1_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  C14_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  V317_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  V318_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  TransactionAmt_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  C11_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  TransactionDT_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_ord3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_ord3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  card6_1010_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  card6_1010_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  card6_1010_2\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_2\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_4\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  P_emaildomain_1010_5\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_nmrc\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_sp19_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_sp19_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_sp19_2\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_sp19_3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_30_sp19_4\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_nmrc\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_sp19_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_sp19_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_sp19_2\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_sp19_3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_sp19_4\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_sp19_5\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  id_31_sp19_6\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "_______________\n",
      "Postmunge returned ID column set: \n",
      "['Automunge_index_612467042465']\n",
      "\n",
      "Postmunge returned test column set: \n",
      "['C13_nmbr', 'C1_nmbr', 'C14_nmbr', 'V317_nmbr', 'V318_nmbr', 'TransactionAmt_nmbr', 'C11_nmbr', 'TransactionDT_nmbr', 'id_30_ord3', 'id_31_ord3', 'card6_1010_0', 'card6_1010_1', 'card6_1010_2', 'P_emaildomain_1010_0', 'P_emaildomain_1010_1', 'P_emaildomain_1010_2', 'P_emaildomain_1010_3', 'P_emaildomain_1010_4', 'P_emaildomain_1010_5', 'id_30_nmrc', 'id_30_sp19_0', 'id_30_sp19_1', 'id_30_sp19_2', 'id_30_sp19_3', 'id_30_sp19_4', 'id_31_nmrc', 'id_31_sp19_0', 'id_31_sp19_1', 'id_31_sp19_2', 'id_31_sp19_3', 'id_31_sp19_4', 'id_31_sp19_5', 'id_31_sp19_6']\n",
      "\n",
      "Postmunge returned label column set: \n",
      "['isFraud_0.0', 'isFraud_1.0']\n",
      "\n",
      "_______________\n",
      "Postmunge Complete\n",
      "\n",
      "______\n",
      "\n",
      "versioning serial stamp:\n",
      "_5.22_612467042465_2020-11-12T19:31:39.597488\n",
      "\n",
      "Automunge returned ID column set: \n",
      "['TransactionID', 'Automunge_index_612467042465']\n",
      "\n",
      "Automunge returned train column set: \n",
      "['C13_nmbr', 'C1_nmbr', 'C14_nmbr', 'V317_nmbr', 'V318_nmbr', 'TransactionAmt_nmbr', 'C11_nmbr', 'TransactionDT_nmbr', 'id_30_ord3', 'id_31_ord3', 'card6_1010_0', 'card6_1010_1', 'card6_1010_2', 'P_emaildomain_1010_0', 'P_emaildomain_1010_1', 'P_emaildomain_1010_2', 'P_emaildomain_1010_3', 'P_emaildomain_1010_4', 'P_emaildomain_1010_5', 'id_30_nmrc', 'id_30_sp19_0', 'id_30_sp19_1', 'id_30_sp19_2', 'id_30_sp19_3', 'id_30_sp19_4', 'id_31_nmrc', 'id_31_sp19_0', 'id_31_sp19_1', 'id_31_sp19_2', 'id_31_sp19_3', 'id_31_sp19_4', 'id_31_sp19_5', 'id_31_sp19_6']\n",
      "\n",
      "Automunge returned label column set: \n",
      "['isFraud_0.0', 'isFraud_1.0']\n",
      "\n",
      "_______________\n",
      "Automunge Complete\n",
      "\n",
      "_______________\n",
      "Training feature importance evaluation model\n",
      "\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Base Accuracy of feature importance model:\n",
      "0.9808141700816202\n",
      "\n",
      "_______________\n",
      "Evaluating feature importances\n",
      "\n",
      "_______________\n",
      "Feature Importance results:\n",
      "\n",
      "C13_nmbr\n",
      "metric =  0.010515799099129586\n",
      "metric2 =  0.0\n",
      "\n",
      "C1_nmbr\n",
      "metric =  0.035518339147221134\n",
      "metric2 =  0.0\n",
      "\n",
      "C14_nmbr\n",
      "metric =  0.005452636569919012\n",
      "metric2 =  0.0\n",
      "\n",
      "V317_nmbr\n",
      "metric =  0.0028448538625664987\n",
      "metric2 =  0.0\n",
      "\n",
      "V318_nmbr\n",
      "metric =  0.0018796355877671589\n",
      "metric2 =  0.0\n",
      "\n",
      "TransactionAmt_nmbr\n",
      "metric =  0.004038676465607738\n",
      "metric2 =  0.0\n",
      "\n",
      "C11_nmbr\n",
      "metric =  0.006324719748027174\n",
      "metric2 =  0.0\n",
      "\n",
      "TransactionDT_nmbr\n",
      "metric =  0.004927693297659741\n",
      "metric2 =  0.0\n",
      "\n",
      "id_30_ord3\n",
      "metric =  0.002794052900734867\n",
      "metric2 =  0.001524028854946291\n",
      "\n",
      "id_31_ord3\n",
      "metric =  0.009237308226369034\n",
      "metric2 =  0.005274833203508633\n",
      "\n",
      "card6_1010_0\n",
      "metric =  0.0037000033867308213\n",
      "metric2 =  0.0036915365597588456\n",
      "\n",
      "card6_1010_1\n",
      "metric =  0.0037000033867308213\n",
      "metric2 =  0.0018034341450198221\n",
      "\n",
      "card6_1010_2\n",
      "metric =  0.0037000033867308213\n",
      "metric2 =  0.001430893758255114\n",
      "\n",
      "P_emaildomain_1010_0\n",
      "metric =  0.006028380804009914\n",
      "metric2 =  0.005283300030480609\n",
      "\n",
      "P_emaildomain_1010_1\n",
      "metric =  0.006028380804009914\n",
      "metric2 =  0.004970027432519397\n",
      "\n",
      "P_emaildomain_1010_2\n",
      "metric =  0.006028380804009914\n",
      "metric2 =  0.005469570223862852\n",
      "\n",
      "P_emaildomain_1010_3\n",
      "metric =  0.006028380804009914\n",
      "metric2 =  0.005427236089003307\n",
      "\n",
      "P_emaildomain_1010_4\n",
      "metric =  0.006028380804009914\n",
      "metric2 =  0.004953093778575557\n",
      "\n",
      "P_emaildomain_1010_5\n",
      "metric =  0.006028380804009914\n",
      "metric2 =  0.004589020218782824\n",
      "\n",
      "id_30_nmrc\n",
      "metric =  0.002794052900734867\n",
      "metric2 =  0.002065905781149402\n",
      "\n",
      "id_30_sp19_0\n",
      "metric =  0.002794052900734867\n",
      "metric2 =  0.0026585836691841447\n",
      "\n",
      "id_30_sp19_1\n",
      "metric =  0.002794052900734867\n",
      "metric2 =  0.0025739153994648323\n",
      "\n",
      "id_30_sp19_2\n",
      "metric =  0.002794052900734867\n",
      "metric2 =  0.0022691096284755963\n",
      "\n",
      "id_30_sp19_3\n",
      "metric =  0.002794052900734867\n",
      "metric2 =  0.002345311071222933\n",
      "\n",
      "id_30_sp19_4\n",
      "metric =  0.002794052900734867\n",
      "metric2 =  0.0022606428015037316\n",
      "\n",
      "id_31_nmrc\n",
      "metric =  0.009237308226369034\n",
      "metric2 =  0.007450807735293052\n",
      "\n",
      "id_31_sp19_0\n",
      "metric =  0.009237308226369034\n",
      "metric2 =  0.00875469908896942\n",
      "\n",
      "id_31_sp19_1\n",
      "metric =  0.009237308226369034\n",
      "metric2 =  0.008830900531716757\n",
      "\n",
      "id_31_sp19_2\n",
      "metric =  0.009237308226369034\n",
      "metric2 =  0.008670030819250107\n",
      "\n",
      "id_31_sp19_3\n",
      "metric =  0.009237308226369034\n",
      "metric2 =  0.007239137060994993\n",
      "\n",
      "id_31_sp19_4\n",
      "metric =  0.009237308226369034\n",
      "metric2 =  0.008085819758187451\n",
      "\n",
      "id_31_sp19_5\n",
      "metric =  0.009237308226369034\n",
      "metric2 =  0.008797033223829076\n",
      "\n",
      "id_31_sp19_6\n",
      "metric =  0.009237308226369034\n",
      "metric2 =  0.00873776543502558\n",
      "\n",
      "\n",
      "______________________\n",
      "sorted metric results:\n",
      "\n",
      "C1\n",
      "0.035518339147221134\n",
      "\n",
      "C13\n",
      "0.010515799099129586\n",
      "\n",
      "id_31\n",
      "0.009237308226369034\n",
      "\n",
      "C11\n",
      "0.006324719748027174\n",
      "\n",
      "P_emaildomain\n",
      "0.006028380804009914\n",
      "\n",
      "C14\n",
      "0.005452636569919012\n",
      "\n",
      "TransactionDT\n",
      "0.004927693297659741\n",
      "\n",
      "TransactionAmt\n",
      "0.004038676465607738\n",
      "\n",
      "card6\n",
      "0.0037000033867308213\n",
      "\n",
      "V317\n",
      "0.0028448538625664987\n",
      "\n",
      "id_30\n",
      "0.002794052900734867\n",
      "\n",
      "V318\n",
      "0.0018796355877671589\n",
      "\n",
      "______________________\n",
      "sorted metric2 results:\n",
      "\n",
      "for source column:  card6\n",
      "card6_1010_2\n",
      "0.001430893758255114\n",
      "\n",
      "card6_1010_1\n",
      "0.0018034341450198221\n",
      "\n",
      "card6_1010_0\n",
      "0.0036915365597588456\n",
      "\n",
      "\n",
      "for source column:  C13\n",
      "C13_nmbr\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  C1\n",
      "C1_nmbr\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  C14\n",
      "C14_nmbr\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  V317\n",
      "V317_nmbr\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  V318\n",
      "V318_nmbr\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  P_emaildomain\n",
      "P_emaildomain_1010_5\n",
      "0.004589020218782824\n",
      "\n",
      "P_emaildomain_1010_4\n",
      "0.004953093778575557\n",
      "\n",
      "P_emaildomain_1010_1\n",
      "0.004970027432519397\n",
      "\n",
      "P_emaildomain_1010_0\n",
      "0.005283300030480609\n",
      "\n",
      "P_emaildomain_1010_3\n",
      "0.005427236089003307\n",
      "\n",
      "P_emaildomain_1010_2\n",
      "0.005469570223862852\n",
      "\n",
      "\n",
      "for source column:  TransactionAmt\n",
      "TransactionAmt_nmbr\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  C11\n",
      "C11_nmbr\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  TransactionDT\n",
      "TransactionDT_nmbr\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  id_30\n",
      "id_30_ord3\n",
      "0.001524028854946291\n",
      "\n",
      "id_30_nmrc\n",
      "0.002065905781149402\n",
      "\n",
      "id_30_sp19_4\n",
      "0.0022606428015037316\n",
      "\n",
      "id_30_sp19_2\n",
      "0.0022691096284755963\n",
      "\n",
      "id_30_sp19_3\n",
      "0.002345311071222933\n",
      "\n",
      "id_30_sp19_1\n",
      "0.0025739153994648323\n",
      "\n",
      "id_30_sp19_0\n",
      "0.0026585836691841447\n",
      "\n",
      "\n",
      "for source column:  id_31\n",
      "id_31_ord3\n",
      "0.005274833203508633\n",
      "\n",
      "id_31_sp19_3\n",
      "0.007239137060994993\n",
      "\n",
      "id_31_nmrc\n",
      "0.007450807735293052\n",
      "\n",
      "id_31_sp19_4\n",
      "0.008085819758187451\n",
      "\n",
      "id_31_sp19_2\n",
      "0.008670030819250107\n",
      "\n",
      "id_31_sp19_6\n",
      "0.00873776543502558\n",
      "\n",
      "id_31_sp19_0\n",
      "0.00875469908896942\n",
      "\n",
      "id_31_sp19_5\n",
      "0.008797033223829076\n",
      "\n",
      "id_31_sp19_1\n",
      "0.008830900531716757\n",
      "\n",
      "\n",
      "for source column:  isFraud\n",
      "\n",
      "_______________\n",
      "Feature Importance evaluation complete\n",
      "\n",
      "_______________\n",
      "Feature Importance results returned\n",
      "\n",
      "_______________\n",
      "Automunge Complete\n",
      "\n"
     ]
    }
   ],
   "source": [
    "#now let's try running again and applying 'sp19'\n",
    "#to our two target features id_30, id_31\n",
    "\n",
    "#here we'll also supplement sp19 with nmrc and ord3\n",
    "\n",
    "#where sp19 is string parsing with cocurrent activations\n",
    "#nmrc extracts numeric portions of entries\n",
    "#and ord3 is an ordinal encoding sorted by frequency\n",
    "\n",
    "#for this scenario we'll create a custom set of transformations\n",
    "#by populating a transformdict and processdict\n",
    "\n",
    "transformdict = {}\n",
    "transformdict.update({'stringparse' : {'parents' : [], \\\n",
    "                                       'siblings': [], \\\n",
    "                                       'auntsuncles' : ['nmrc', 'sp19', 'ord3'], \\\n",
    "                                       'cousins' : [], \\\n",
    "                                       'children' : [], \\\n",
    "                                       'niecesnephews' : [], \\\n",
    "                                       'coworkers' : [], \\\n",
    "                                       'friends' : []}})\n",
    "\n",
    "processdict = {}\n",
    "processdict.update({'stringparse' : {'functionpointer' : 'sp19', \\\n",
    "                                     'defaultparams':{}, \\\n",
    "                                     'NArowtype' : 'justNaN', \\\n",
    "                                     'MLinfilltype' : 'exclude', \\\n",
    "                                     'labelctgy' : 'ord3'}})\n",
    "\n",
    "train, trainID, labels, \\\n",
    "validation1, validationID1, validationlabels1, \\\n",
    "validation2, validationID2, validationlabels2, \\\n",
    "test, testID, testlabels, \\\n",
    "labelsencoding_dict, finalcolumns_train, finalcolumns_test, \\\n",
    "featureimportance3, postprocess_dict3 = \\\n",
    "am.automunge(df_train, df_test = False, \\\n",
    "             labels_column = label_column, \\\n",
    "             randomseed = 42, eval_ratio = .0001, \\\n",
    "             pandasoutput = True, \\\n",
    "             featureselection = True, featuremethod = 'report', \\\n",
    "             ML_cmnd = {'autoML_type':'randomforest', \\\n",
    "                        'MLinfill_cmnd':{'RandomForestClassifier':{'n_estimators':222}}}, \\\n",
    "             assigncat = {'stringparse':['id_30', 'id_31']}, \\\n",
    "             processdict = processdict, transformdict = transformdict, \\\n",
    "             printstatus = True)\n",
    "                        "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9808141700816202"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#the resulting accuracy is available in printouts and also here:\n",
    "postprocess_dict3['baseaccuracy']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.002794052900734867"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#we can view the feature importance metric results \n",
    "#associated with these features as\n",
    "#(where a higher score is more importance))\n",
    "postprocess_dict3['column_key']['id_30']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.009237308226369034"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "postprocess_dict3['column_key']['id_31']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'id_30_ord3': 0.001524028854946291,\n",
       " 'id_30_nmrc': 0.002065905781149402,\n",
       " 'id_30_sp19_4': 0.0022606428015037316,\n",
       " 'id_30_sp19_2': 0.0022691096284755963,\n",
       " 'id_30_sp19_3': 0.002345311071222933,\n",
       " 'id_30_sp19_1': 0.0025739153994648323,\n",
       " 'id_30_sp19_0': 0.0026585836691841447}"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#we cann also view the relative importance between columns derived from \n",
    "#these features (where a lower score is more importance)\n",
    "postprocess_dict3['metric2_column_key']['id_30']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'id_31_ord3': 0.005274833203508633,\n",
       " 'id_31_sp19_3': 0.007239137060994993,\n",
       " 'id_31_nmrc': 0.007450807735293052,\n",
       " 'id_31_sp19_4': 0.008085819758187451,\n",
       " 'id_31_sp19_2': 0.008670030819250107,\n",
       " 'id_31_sp19_6': 0.00873776543502558,\n",
       " 'id_31_sp19_0': 0.00875469908896942,\n",
       " 'id_31_sp19_5': 0.008797033223829076,\n",
       " 'id_31_sp19_1': 0.008830900531716757}"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "postprocess_dict3['metric2_column_key']['id_31']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The results of the evaluation are summarized here:\n",
    "\n",
    "|            |                     | base accuracy | id_30 metric | id_31 metric |\n",
    "|------------|---------------------|---------------|--------------|--------------|\n",
    "|            |                     |               |              |              |\n",
    "| scenario 1 | no string parsing   | 0.980399      | 0.001930     | 0.005808     |\n",
    "|            |                     |               |              |              |\n",
    "| scenario 2 | or19 string parsing | 0.980822      | 0.002946     | 0.009144     |\n",
    "|            |                     |               |              |              |\n",
    "| scenario 3 | sp19 string parsing | 0.980814      | 0.002794     | 0.009237     |\n",
    "|            |                     |               |              |              |\n",
    "\n",
    "\n",
    "Here the base accuracy represents the feature importance model\n",
    "trained on the entire data set, and the metrics are derived in automunge(.) by shuffle permutation, in other words by\n",
    "evaluating accuracy impact of shuffling the target feature set.\n",
    "\n",
    "Here we see that the base accuracy of the model was benefited\n",
    "by both types of string parsing, the or19 in scenario2 and the\n",
    "sp19 in scenario 3 (in comparison to scenario 1).\n",
    "\n",
    "The larger feature importance metrics in scenarios 2 & 3 for the two features\n",
    "also indicate that the string parsing operation had a positive influence.\n",
    "\n",
    "It appears the or19 version of string parsing from scenario 2 \n",
    "was more beneficial to the id_30 feature,\n",
    "and the sp19 version of string parsing from scenario 3\n",
    "was more beneficial to the id_31 feature."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
