{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Feature Importance Demonstration"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Automunge can be installed with pip:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !pip install Automunge"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or, to upgrade an existing installation (upgrades are currently rolled out frequently):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !pip install Automunge --upgrade"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once installed, run this in a local session to initialize:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from Automunge import *\n",
    "am = AutoMunge()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll demonstrate feature importance on the Titanic data set, a common benchmark. Feature importance is evaluated against a target, so the training data must include a label column and we must designate it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# load the Titanic train and test sets (CSV files assumed to be in the working directory)\n",
    "df_train = pd.read_csv('train.csv')\n",
    "df_test = pd.read_csv('test.csv')\n",
    "\n",
    "# designate the target label column and the ID column\n",
    "labels_column = 'Survived'\n",
    "trainID_column = 'PassengerId'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund, Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen, Miss. Laina</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Allen, Mr. William Henry</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   PassengerId  Survived  Pclass  \\\n",
       "0            1         0       3   \n",
       "1            2         1       1   \n",
       "2            3         1       3   \n",
       "3            4         1       1   \n",
       "4            5         0       3   \n",
       "\n",
       "                                                Name     Sex   Age  SibSp  \\\n",
       "0                            Braund, Mr. Owen Harris    male  22.0      1   \n",
       "1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   \n",
       "2                             Heikkinen, Miss. Laina  female  26.0      0   \n",
       "3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   \n",
       "4                           Allen, Mr. William Henry    male  35.0      0   \n",
       "\n",
       "   Parch            Ticket     Fare Cabin Embarked  \n",
       "0      0         A/5 21171   7.2500   NaN        S  \n",
       "1      0          PC 17599  71.2833   C85        C  \n",
       "2      0  STON/O2. 3101282   7.9250   NaN        S  \n",
       "3      0            113803  53.1000  C123        S  \n",
       "4      0            373450   8.0500   NaN        S  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_train.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can then perform feature importance in automunge(.) by setting the featureselection parameter to True. The results will be included about midway through the printouts and are also available for inspection in the returned postprocess_dict dictionary.\n",
    "\n",
    "Note that automunge(.) performs feature importance by way of shuffle permutation, and relies on the same ML architecture used for ML infill, which in the default configuration is Random Forest."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "_______________\n",
      "Begin Feature Importance evaluation\n",
      "\n",
      "_______________\n",
      "Begin Automunge processing\n",
      "\n",
      "evaluating column:  Pclass\n",
      "processing column:  Pclass\n",
      "    root category:  1010\n",
      " returned columns:\n",
      "['Pclass_NArw', 'Pclass_1010_0', 'Pclass_1010_1']\n",
      "\n",
      "evaluating column:  Name\n",
      "processing column:  Name\n",
      "    root category:  hash\n",
      " returned columns:\n",
      "['Name_NArw', 'Name_hash_0', 'Name_hash_1', 'Name_hash_2', 'Name_hash_3', 'Name_hash_4', 'Name_hash_5', 'Name_hash_6', 'Name_hash_7', 'Name_hash_8']\n",
      "\n",
      "evaluating column:  Sex\n",
      "processing column:  Sex\n",
      "    root category:  bnry\n",
      " returned columns:\n",
      "['Sex_bnry', 'Sex_NArw']\n",
      "\n",
      "evaluating column:  Age\n",
      "processing column:  Age\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['Age_nmbr', 'Age_NArw']\n",
      "\n",
      "evaluating column:  SibSp\n",
      "processing column:  SibSp\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['SibSp_nmbr', 'SibSp_NArw']\n",
      "\n",
      "evaluating column:  Parch\n",
      "processing column:  Parch\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['Parch_nmbr', 'Parch_NArw']\n",
      "\n",
      "evaluating column:  Ticket\n",
      "processing column:  Ticket\n",
      "    root category:  hash\n",
      " returned columns:\n",
      "['Ticket_NArw', 'Ticket_hash_0', 'Ticket_hash_1', 'Ticket_hash_2']\n",
      "\n",
      "evaluating column:  Fare\n",
      "processing column:  Fare\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['Fare_nmbr', 'Fare_NArw']\n",
      "\n",
      "evaluating column:  Cabin\n",
      "processing column:  Cabin\n",
      "    root category:  1010\n",
      " returned columns:\n",
      "['Cabin_NArw', 'Cabin_1010_0', 'Cabin_1010_1', 'Cabin_1010_2', 'Cabin_1010_3', 'Cabin_1010_4', 'Cabin_1010_5', 'Cabin_1010_6']\n",
      "\n",
      "evaluating column:  Embarked\n",
      "processing column:  Embarked\n",
      "    root category:  1010\n",
      " returned columns:\n",
      "['Embarked_NArw', 'Embarked_1010_0', 'Embarked_1010_1']\n",
      "\n",
      "______\n",
      "\n",
      "evaluating label column:  Survived\n",
      "processing label column:  Survived\n",
      "    root label category:  lbor\n",
      "\n",
      " returned columns:\n",
      "['Survived_ordl']\n",
      "\n",
      "______\n",
      "\n",
      "infill to column:  Sex_bnry\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Age_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  SibSp_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Parch_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Fare_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Pclass_1010_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Pclass_1010_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Cabin_1010_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Cabin_1010_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Cabin_1010_2\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Cabin_1010_3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Cabin_1010_4\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Cabin_1010_5\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Cabin_1010_6\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Embarked_1010_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Embarked_1010_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "_______________\n",
      "Begin Validation set processing with Postmunge\n",
      "\n",
      "_______________\n",
      "Begin Postmunge processing\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Pclass\n",
      "    root category:  1010\n",
      "\n",
      " returned columns:\n",
      "['Pclass_NArw', 'Pclass_1010_0', 'Pclass_1010_1']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Name\n",
      "    root category:  hash\n",
      "\n",
      " returned columns:\n",
      "['Name_NArw', 'Name_hash_0', 'Name_hash_1', 'Name_hash_2', 'Name_hash_3', 'Name_hash_4', 'Name_hash_5', 'Name_hash_6', 'Name_hash_7', 'Name_hash_8']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Sex\n",
      "    root category:  bnry\n",
      "\n",
      " returned columns:\n",
      "['Sex_bnry', 'Sex_NArw']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Age\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['Age_nmbr', 'Age_NArw']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  SibSp\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['SibSp_nmbr', 'SibSp_NArw']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Parch\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['Parch_nmbr', 'Parch_NArw']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Ticket\n",
      "    root category:  hash\n",
      "\n",
      " returned columns:\n",
      "['Ticket_NArw', 'Ticket_hash_0', 'Ticket_hash_1', 'Ticket_hash_2']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Fare\n",
      "    root category:  nmbr\n",
      "\n",
      " returned columns:\n",
      "['Fare_nmbr', 'Fare_NArw']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Cabin\n",
      "    root category:  1010\n",
      "\n",
      " returned columns:\n",
      "['Cabin_NArw', 'Cabin_1010_0', 'Cabin_1010_1', 'Cabin_1010_2', 'Cabin_1010_3', 'Cabin_1010_4', 'Cabin_1010_5', 'Cabin_1010_6']\n",
      "\n",
      "______\n",
      "\n",
      "processing column:  Embarked\n",
      "    root category:  1010\n",
      "\n",
      " returned columns:\n",
      "['Embarked_NArw', 'Embarked_1010_0', 'Embarked_1010_1']\n",
      "\n",
      "______\n",
      "\n",
      "processing label column:  Survived\n",
      "    root label category:  lbor\n",
      "\n",
      " returned columns:\n",
      "['Survived_ordl']\n",
      "\n",
      "______\n",
      "\n",
      "infill to column:  Sex_bnry\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Age_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  SibSp_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Parch_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Fare_nmbr\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Pclass_1010_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Pclass_1010_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Cabin_1010_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Cabin_1010_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Cabin_1010_2\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Cabin_1010_3\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Cabin_1010_4\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Cabin_1010_5\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Cabin_1010_6\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Embarked_1010_0\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "infill to column:  Embarked_1010_1\n",
      "     infill type: stdrdinfill\n",
      "\n",
      "_______________\n",
      "Postmunge returned ID column set: \n",
      "['Automunge_index']\n",
      "\n",
      "Postmunge returned test column set: \n",
      "['Sex_bnry', 'Age_nmbr', 'SibSp_nmbr', 'Parch_nmbr', 'Fare_nmbr', 'Pclass_NArw', 'Pclass_1010_0', 'Pclass_1010_1', 'Name_NArw', 'Name_hash_0', 'Name_hash_1', 'Name_hash_2', 'Name_hash_3', 'Name_hash_4', 'Name_hash_5', 'Name_hash_6', 'Name_hash_7', 'Name_hash_8', 'Sex_NArw', 'Age_NArw', 'SibSp_NArw', 'Parch_NArw', 'Ticket_NArw', 'Ticket_hash_0', 'Ticket_hash_1', 'Ticket_hash_2', 'Fare_NArw', 'Cabin_NArw', 'Cabin_1010_0', 'Cabin_1010_1', 'Cabin_1010_2', 'Cabin_1010_3', 'Cabin_1010_4', 'Cabin_1010_5', 'Cabin_1010_6', 'Embarked_NArw', 'Embarked_1010_0', 'Embarked_1010_1']\n",
      "\n",
      "Postmunge returned label column set: \n",
      "['Survived_ordl']\n",
      "\n",
      "_______________\n",
      "Postmunge Complete\n",
      "\n",
      "______\n",
      "\n",
      "versioning serial stamp:\n",
      "_6.02_386161199297_2021-04-23T21:34:40.997199\n",
      "\n",
      "Automunge returned ID column set: \n",
      "['PassengerId', 'Automunge_index']\n",
      "\n",
      "Automunge returned train column set: \n",
      "['Sex_bnry', 'Age_nmbr', 'SibSp_nmbr', 'Parch_nmbr', 'Fare_nmbr', 'Pclass_NArw', 'Pclass_1010_0', 'Pclass_1010_1', 'Name_NArw', 'Name_hash_0', 'Name_hash_1', 'Name_hash_2', 'Name_hash_3', 'Name_hash_4', 'Name_hash_5', 'Name_hash_6', 'Name_hash_7', 'Name_hash_8', 'Sex_NArw', 'Age_NArw', 'SibSp_NArw', 'Parch_NArw', 'Ticket_NArw', 'Ticket_hash_0', 'Ticket_hash_1', 'Ticket_hash_2', 'Fare_NArw', 'Cabin_NArw', 'Cabin_1010_0', 'Cabin_1010_1', 'Cabin_1010_2', 'Cabin_1010_3', 'Cabin_1010_4', 'Cabin_1010_5', 'Cabin_1010_6', 'Embarked_NArw', 'Embarked_1010_0', 'Embarked_1010_1']\n",
      "\n",
      "Automunge returned label column set: \n",
      "['Survived_ordl']\n",
      "\n",
      "_______________\n",
      "Automunge Complete\n",
      "\n",
      "_______________\n",
      "Training feature importance evaluation model\n",
      "\n",
      "Base Accuracy of feature importance model:\n",
      "0.7653631284916201\n",
      "\n",
      "_______________\n",
      "Evaluating feature importances\n",
      "\n",
      "_______________\n",
      "Feature Importance results:\n",
      "\n",
      "Sex_bnry\n",
      "metric =  0.06145251396648044\n",
      "metric2 =  0.0\n",
      "\n",
      "Age_nmbr\n",
      "metric =  -0.016759776536312776\n",
      "metric2 =  -0.005586592178770888\n",
      "\n",
      "SibSp_nmbr\n",
      "metric =  0.0\n",
      "metric2 =  0.0\n",
      "\n",
      "Parch_nmbr\n",
      "metric =  0.0\n",
      "metric2 =  0.0\n",
      "\n",
      "Fare_nmbr\n",
      "metric =  0.0\n",
      "metric2 =  0.0\n",
      "\n",
      "Pclass_NArw\n",
      "metric =  0.005586592178770999\n",
      "metric2 =  0.005586592178770999\n",
      "\n",
      "Pclass_1010_0\n",
      "metric =  0.005586592178770999\n",
      "metric2 =  -0.016759776536312776\n",
      "\n",
      "Pclass_1010_1\n",
      "metric =  0.005586592178770999\n",
      "metric2 =  0.0\n",
      "\n",
      "Name_NArw\n",
      "metric =  0.022346368715083775\n",
      "metric2 =  0.022346368715083775\n",
      "\n",
      "Name_hash_0\n",
      "metric =  0.022346368715083775\n",
      "metric2 =  0.011173184357541888\n",
      "\n",
      "Name_hash_1\n",
      "metric =  0.022346368715083775\n",
      "metric2 =  0.033519553072625774\n",
      "\n",
      "Name_hash_2\n",
      "metric =  0.022346368715083775\n",
      "metric2 =  0.022346368715083775\n",
      "\n",
      "Name_hash_3\n",
      "metric =  0.022346368715083775\n",
      "metric2 =  0.027932960893854775\n",
      "\n",
      "Name_hash_4\n",
      "metric =  0.022346368715083775\n",
      "metric2 =  0.005586592178770999\n",
      "\n",
      "Name_hash_5\n",
      "metric =  0.022346368715083775\n",
      "metric2 =  -0.005586592178770888\n",
      "\n",
      "Name_hash_6\n",
      "metric =  0.022346368715083775\n",
      "metric2 =  0.016759776536312887\n",
      "\n",
      "Name_hash_7\n",
      "metric =  0.022346368715083775\n",
      "metric2 =  0.016759776536312887\n",
      "\n",
      "Name_hash_8\n",
      "metric =  0.022346368715083775\n",
      "metric2 =  0.022346368715083775\n",
      "\n",
      "Sex_NArw\n",
      "metric =  0.06145251396648044\n",
      "metric2 =  0.06145251396648044\n",
      "\n",
      "Age_NArw\n",
      "metric =  -0.016759776536312776\n",
      "metric2 =  -0.005586592178770888\n",
      "\n",
      "SibSp_NArw\n",
      "metric =  0.0\n",
      "metric2 =  0.0\n",
      "\n",
      "Parch_NArw\n",
      "metric =  0.0\n",
      "metric2 =  0.0\n",
      "\n",
      "Ticket_NArw\n",
      "metric =  0.005586592178770999\n",
      "metric2 =  0.005586592178770999\n",
      "\n",
      "Ticket_hash_0\n",
      "metric =  0.005586592178770999\n",
      "metric2 =  0.0\n",
      "\n",
      "Ticket_hash_1\n",
      "metric =  0.005586592178770999\n",
      "metric2 =  0.0\n",
      "\n",
      "Ticket_hash_2\n",
      "metric =  0.005586592178770999\n",
      "metric2 =  0.005586592178770999\n",
      "\n",
      "Fare_NArw\n",
      "metric =  0.0\n",
      "metric2 =  0.0\n",
      "\n",
      "Cabin_NArw\n",
      "metric =  0.016759776536312887\n",
      "metric2 =  0.005586592178770999\n",
      "\n",
      "Cabin_1010_0\n",
      "metric =  0.016759776536312887\n",
      "metric2 =  0.011173184357541888\n",
      "\n",
      "Cabin_1010_1\n",
      "metric =  0.016759776536312887\n",
      "metric2 =  0.016759776536312887\n",
      "\n",
      "Cabin_1010_2\n",
      "metric =  0.016759776536312887\n",
      "metric2 =  0.016759776536312887\n",
      "\n",
      "Cabin_1010_3\n",
      "metric =  0.016759776536312887\n",
      "metric2 =  0.011173184357541888\n",
      "\n",
      "Cabin_1010_4\n",
      "metric =  0.016759776536312887\n",
      "metric2 =  0.016759776536312887\n",
      "\n",
      "Cabin_1010_5\n",
      "metric =  0.016759776536312887\n",
      "metric2 =  0.016759776536312887\n",
      "\n",
      "Cabin_1010_6\n",
      "metric =  0.016759776536312887\n",
      "metric2 =  0.016759776536312887\n",
      "\n",
      "Embarked_NArw\n",
      "metric =  0.005586592178770999\n",
      "metric2 =  0.005586592178770999\n",
      "\n",
      "Embarked_1010_0\n",
      "metric =  0.005586592178770999\n",
      "metric2 =  0.0\n",
      "\n",
      "Embarked_1010_1\n",
      "metric =  0.005586592178770999\n",
      "metric2 =  0.0\n",
      "\n",
      "\n",
      "______________________\n",
      "sorted metric results:\n",
      "\n",
      "Sex\n",
      "0.06145251396648044\n",
      "\n",
      "Name\n",
      "0.022346368715083775\n",
      "\n",
      "Cabin\n",
      "0.016759776536312887\n",
      "\n",
      "Pclass\n",
      "0.005586592178770999\n",
      "\n",
      "Ticket\n",
      "0.005586592178770999\n",
      "\n",
      "Embarked\n",
      "0.005586592178770999\n",
      "\n",
      "SibSp\n",
      "0.0\n",
      "\n",
      "Parch\n",
      "0.0\n",
      "\n",
      "Fare\n",
      "0.0\n",
      "\n",
      "Age\n",
      "-0.016759776536312776\n",
      "\n",
      "______________________\n",
      "sorted metric2 results:\n",
      "\n",
      "for source column:  Pclass\n",
      "Pclass_1010_0\n",
      "-0.016759776536312776\n",
      "\n",
      "Pclass_1010_1\n",
      "0.0\n",
      "\n",
      "Pclass_NArw\n",
      "0.005586592178770999\n",
      "\n",
      "\n",
      "for source column:  Name\n",
      "Name_hash_5\n",
      "-0.005586592178770888\n",
      "\n",
      "Name_hash_4\n",
      "0.005586592178770999\n",
      "\n",
      "Name_hash_0\n",
      "0.011173184357541888\n",
      "\n",
      "Name_hash_6\n",
      "0.016759776536312887\n",
      "\n",
      "Name_hash_7\n",
      "0.016759776536312887\n",
      "\n",
      "Name_NArw\n",
      "0.022346368715083775\n",
      "\n",
      "Name_hash_2\n",
      "0.022346368715083775\n",
      "\n",
      "Name_hash_8\n",
      "0.022346368715083775\n",
      "\n",
      "Name_hash_3\n",
      "0.027932960893854775\n",
      "\n",
      "Name_hash_1\n",
      "0.033519553072625774\n",
      "\n",
      "\n",
      "for source column:  Sex\n",
      "Sex_bnry\n",
      "0.0\n",
      "\n",
      "Sex_NArw\n",
      "0.06145251396648044\n",
      "\n",
      "\n",
      "for source column:  Age\n",
      "Age_nmbr\n",
      "-0.005586592178770888\n",
      "\n",
      "Age_NArw\n",
      "-0.005586592178770888\n",
      "\n",
      "\n",
      "for source column:  SibSp\n",
      "SibSp_nmbr\n",
      "0.0\n",
      "\n",
      "SibSp_NArw\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  Parch\n",
      "Parch_nmbr\n",
      "0.0\n",
      "\n",
      "Parch_NArw\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  Ticket\n",
      "Ticket_hash_0\n",
      "0.0\n",
      "\n",
      "Ticket_hash_1\n",
      "0.0\n",
      "\n",
      "Ticket_NArw\n",
      "0.005586592178770999\n",
      "\n",
      "Ticket_hash_2\n",
      "0.005586592178770999\n",
      "\n",
      "\n",
      "for source column:  Fare\n",
      "Fare_nmbr\n",
      "0.0\n",
      "\n",
      "Fare_NArw\n",
      "0.0\n",
      "\n",
      "\n",
      "for source column:  Cabin\n",
      "Cabin_NArw\n",
      "0.005586592178770999\n",
      "\n",
      "Cabin_1010_0\n",
      "0.011173184357541888\n",
      "\n",
      "Cabin_1010_3\n",
      "0.011173184357541888\n",
      "\n",
      "Cabin_1010_1\n",
      "0.016759776536312887\n",
      "\n",
      "Cabin_1010_2\n",
      "0.016759776536312887\n",
      "\n",
      "Cabin_1010_4\n",
      "0.016759776536312887\n",
      "\n",
      "Cabin_1010_5\n",
      "0.016759776536312887\n",
      "\n",
      "Cabin_1010_6\n",
      "0.016759776536312887\n",
      "\n",
      "\n",
      "for source column:  Embarked\n",
      "Embarked_1010_0\n",
      "0.0\n",
      "\n",
      "Embarked_1010_1\n",
      "0.0\n",
      "\n",
      "Embarked_NArw\n",
      "0.005586592178770999\n",
      "\n",
      "\n",
      "for source column:  Survived\n",
      "\n",
      "_______________\n",
      "Feature Importance evaluation complete\n",
      "\n",
      "_______________\n",
      "Begin Automunge processing\n",
      "\n",
      "evaluating column:  Pclass\n",
      "processing column:  Pclass\n",
      "    root category:  1010\n",
      " returned columns:\n",
      "['Pclass_NArw', 'Pclass_1010_0', 'Pclass_1010_1']\n",
      "\n",
      "evaluating column:  Name\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "processing column:  Name\n",
      "    root category:  hash\n",
      " returned columns:\n",
      "['Name_NArw', 'Name_hash_0', 'Name_hash_1', 'Name_hash_2', 'Name_hash_3', 'Name_hash_4', 'Name_hash_5', 'Name_hash_6', 'Name_hash_7', 'Name_hash_8', 'Name_hash_9', 'Name_hash_10', 'Name_hash_11', 'Name_hash_12', 'Name_hash_13']\n",
      "\n",
      "evaluating column:  Sex\n",
      "processing column:  Sex\n",
      "    root category:  bnry\n",
      " returned columns:\n",
      "['Sex_bnry', 'Sex_NArw']\n",
      "\n",
      "evaluating column:  Age\n",
      "processing column:  Age\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['Age_nmbr', 'Age_NArw']\n",
      "\n",
      "evaluating column:  SibSp\n",
      "processing column:  SibSp\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['SibSp_nmbr', 'SibSp_NArw']\n",
      "\n",
      "evaluating column:  Parch\n",
      "processing column:  Parch\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['Parch_nmbr', 'Parch_NArw']\n",
      "\n",
      "evaluating column:  Ticket\n",
      "processing column:  Ticket\n",
      "    root category:  hash\n",
      " returned columns:\n",
      "['Ticket_NArw', 'Ticket_hash_0', 'Ticket_hash_1', 'Ticket_hash_2']\n",
      "\n",
      "evaluating column:  Fare\n",
      "processing column:  Fare\n",
      "    root category:  nmbr\n",
      " returned columns:\n",
      "['Fare_nmbr', 'Fare_NArw']\n",
      "\n",
      "evaluating column:  Cabin\n",
      "processing column:  Cabin\n",
      "    root category:  1010\n",
      " returned columns:\n",
      "['Cabin_NArw', 'Cabin_1010_0', 'Cabin_1010_1', 'Cabin_1010_2', 'Cabin_1010_3', 'Cabin_1010_4', 'Cabin_1010_5', 'Cabin_1010_6', 'Cabin_1010_7']\n",
      "\n",
      "evaluating column:  Embarked\n",
      "processing column:  Embarked\n",
      "    root category:  1010\n",
      " returned columns:\n",
      "['Embarked_NArw', 'Embarked_1010_0', 'Embarked_1010_1']\n",
      "\n",
      "______\n",
      "\n",
      "evaluating label column:  Survived\n",
      "processing label column:  Survived\n",
      "    root label category:  lbor\n",
      "\n",
      " returned columns:\n",
      "['Survived_ordl']\n",
      "\n",
      "______\n",
      "\n",
      "infill to column:  Sex_bnry\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Age_nmbr\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  SibSp_nmbr\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Parch_nmbr\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Fare_nmbr\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Pclass_1010_0\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Pclass_1010_1\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_0\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_1\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_2\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_3\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_4\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_5\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_6\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Cabin_1010_7\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Embarked_1010_0\n",
      "     infill type: MLinfill\n",
      "\n",
      "infill to column:  Embarked_1010_1\n",
      "     infill type: MLinfill\n",
      "\n",
      "______\n",
      "\n",
      "versioning serial stamp:\n",
      "_6.02_699612848010_2021-04-23T21:34:40.987073\n",
      "\n",
      "Automunge returned ID column set: \n",
      "['PassengerId', 'Automunge_index']\n",
      "\n",
      "Automunge returned train column set: \n",
      "['Sex_bnry', 'Age_nmbr', 'SibSp_nmbr', 'Parch_nmbr', 'Fare_nmbr', 'Pclass_NArw', 'Pclass_1010_0', 'Pclass_1010_1', 'Name_NArw', 'Name_hash_0', 'Name_hash_1', 'Name_hash_2', 'Name_hash_3', 'Name_hash_4', 'Name_hash_5', 'Name_hash_6', 'Name_hash_7', 'Name_hash_8', 'Name_hash_9', 'Name_hash_10', 'Name_hash_11', 'Name_hash_12', 'Name_hash_13', 'Sex_NArw', 'Age_NArw', 'SibSp_NArw', 'Parch_NArw', 'Ticket_NArw', 'Ticket_hash_0', 'Ticket_hash_1', 'Ticket_hash_2', 'Fare_NArw', 'Cabin_NArw', 'Cabin_1010_0', 'Cabin_1010_1', 'Cabin_1010_2', 'Cabin_1010_3', 'Cabin_1010_4', 'Cabin_1010_5', 'Cabin_1010_6', 'Cabin_1010_7', 'Embarked_NArw', 'Embarked_1010_0', 'Embarked_1010_1']\n",
      "\n",
      "Automunge returned label column set: \n",
      "['Survived_ordl']\n",
      "\n",
      "_______________\n",
      "Automunge Complete\n",
      "\n"
     ]
    }
   ],
   "source": [
    "train, train_ID, labels, \\\n",
    "val, val_ID, val_labels, \\\n",
    "test, test_ID, test_labels, \\\n",
    "postprocess_dict \\\n",
    "= am.automunge(df_train,\n",
    "               labels_column = labels_column,\n",
    "               trainID_column = trainID_column,\n",
    "               featureselection = True,\n",
    "               printstatus = True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The results in the printouts are presented first in order of features and then again sorted by the metric from most important to least.\n",
    "\n",
    "Here \"metric\" measures the importance of the source column; a higher value implies higher importance.\n",
    "\n",
    "\"metric2\" measures the relative importance of features derived from the same source column; a lower value implies higher relative importance.\n",
    "\n",
    "Keep in mind that feature importance is as much a measure of the model as it is of the feature."
   ]
  },
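  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As noted earlier, feature importance here is evaluated by shuffle permutation. The idea can be sketched in plain Python. This is a minimal illustration and not Automunge's implementation: the helper names (`accuracy`, `shuffle_columns`, `permutation_metric`), the toy data, and the hand-written `predict` function standing in for a trained model are all assumptions made for the example. It also assumes that the metric is the base accuracy minus the accuracy after shuffling, and that columns derived from one source column are shuffled together, which would be consistent with the identical `metric` values printed above for such columns.\n",
    "\n",
    "```python\n",
    "import random\n",
    "\n",
    "# Fraction of rows whose prediction matches the label.\n",
    "def accuracy(predict, rows, labels):\n",
    "    hits = sum(predict(row) == label for row, label in zip(rows, labels))\n",
    "    return hits / len(labels)\n",
    "\n",
    "# Copy rows, applying one shared row permutation to the listed columns.\n",
    "def shuffle_columns(rows, columns, rng):\n",
    "    order = list(range(len(rows)))\n",
    "    rng.shuffle(order)\n",
    "    shuffled = [list(row) for row in rows]\n",
    "    for i, j in enumerate(order):\n",
    "        for c in columns:\n",
    "            shuffled[i][c] = rows[j][c]\n",
    "    return shuffled\n",
    "\n",
    "# Base accuracy minus accuracy after shuffling (higher = more important).\n",
    "def permutation_metric(predict, rows, labels, columns, seed=0):\n",
    "    base = accuracy(predict, rows, labels)\n",
    "    shuffled = shuffle_columns(rows, columns, random.Random(seed))\n",
    "    return base - accuracy(predict, shuffled, labels)\n",
    "\n",
    "# Toy data (assumed): column 0 determines the label, column 1 is noise.\n",
    "data_rng = random.Random(42)\n",
    "rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]\n",
    "labels = [int(row[0] > 0.5) for row in rows]\n",
    "\n",
    "# Hypothetical stand-in for a trained model: it only reads column 0.\n",
    "def predict(row):\n",
    "    return int(row[0] > 0.5)\n",
    "\n",
    "metric_informative = permutation_metric(predict, rows, labels, columns=[0])\n",
    "metric_noise = permutation_metric(predict, rows, labels, columns=[1])\n",
    "print(metric_informative, metric_noise)\n",
    "```\n",
    "\n",
    "Shuffling the noise column leaves the predictions unchanged, so its metric is zero, while shuffling the informative column lowers accuracy and yields a positive metric."
   ]
  },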
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also inspect results in the returned postprocess_dict dictionary.\n",
    "\n",
    "Here postprocess_dict['FS_sorted'] holds the results sorted by the metric, and postprocess_dict['FScolumn_dict'] holds the raw unsorted data. Each aggregation includes the data in a few different formats."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{0.06145251396648044: ['Sex'],\n",
       " 0.022346368715083775: ['Name'],\n",
       " 0.016759776536312887: ['Cabin'],\n",
       " 0.005586592178770999: ['Pclass', 'Ticket', 'Embarked'],\n",
       " 0.0: ['SibSp', 'Parch', 'Fare'],\n",
       " -0.016759776536312776: ['Age']}"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# sorted metric results, keyed by metric value\n",
    "\n",
    "postprocess_dict['FS_sorted']['metric_key']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'Sex': 0.06145251396648044,\n",
       " 'Name': 0.022346368715083775,\n",
       " 'Cabin': 0.016759776536312887,\n",
       " 'Pclass': 0.005586592178770999,\n",
       " 'Ticket': 0.005586592178770999,\n",
       " 'Embarked': 0.005586592178770999,\n",
       " 'SibSp': 0.0,\n",
       " 'Parch': 0.0,\n",
       " 'Fare': 0.0,\n",
       " 'Age': -0.016759776536312776}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# or the inverse mapping, keyed by source column\n",
    "\n",
    "postprocess_dict['FS_sorted']['column_key']"
   ]
  }
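  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible use of the column-keyed results is to shortlist source columns by their metric. The sketch below copies the values from the output above into a plain dictionary so that it runs on its own; in a live session the dictionary would come from postprocess_dict['FS_sorted']['column_key']. The `threshold` cutoff and the `kept` and `dropped` names are hypothetical choices for illustration, not Automunge defaults.\n",
    "\n",
    "```python\n",
    "# Values copied from the column_key output shown above.\n",
    "column_key = {\n",
    "    'Sex': 0.06145251396648044,\n",
    "    'Name': 0.022346368715083775,\n",
    "    'Cabin': 0.016759776536312887,\n",
    "    'Pclass': 0.005586592178770999,\n",
    "    'Ticket': 0.005586592178770999,\n",
    "    'Embarked': 0.005586592178770999,\n",
    "    'SibSp': 0.0,\n",
    "    'Parch': 0.0,\n",
    "    'Fare': 0.0,\n",
    "    'Age': -0.016759776536312776,\n",
    "}\n",
    "\n",
    "# Hypothetical cutoff: keep source columns with a strictly positive metric.\n",
    "threshold = 0.0\n",
    "kept = [column for column, metric in column_key.items() if metric > threshold]\n",
    "dropped = [column for column in column_key if column not in kept]\n",
    "print(kept)\n",
    "print(dropped)\n",
    "```\n",
    "\n",
    "Since the importance values depend on the model as much as on the features, a shortlist like this is best treated as a starting point rather than a final selection."
   ]
  }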
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
