{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Geo-Specific Expense Analysis (Flag 92)\n",
    "\n",
    "### Dataset Description\n",
    "The dataset consists of 500 entries simulating the ServiceNow fm_expense_line table, which records various attributes of financial expenses. Key fields include 'number', 'opened_at', 'amount', 'state', 'short_description', 'ci', 'user', 'department', 'category', 'location', 'processed_date', 'source_id', and 'type'. This table documents the flow of financial transactions by detailing the amount, departmental allocation, geographic location, and the nature of each expense. It provides a comprehensive view of organizational expenditures across different categories and locations, highlighting both the timing and the approval state of each financial entry.\n",
    "\n",
    "### Your Task\n",
    "**Goal**: To analyze and understand how expenses vary across different geographic locations, expense categories, and approval times, with the aim of improving budget allocation and workflow efficiency.\n",
    "\n",
    "**Role**: Financial Operations Analyst\n",
    "\n",
    "**Difficulty**: 3 out of 5\n",
    "\n",
    "**Category**: Finance Management"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import Necessary Libraries\n",
    "This cell imports all necessary libraries required for the analysis. This includes libraries for data manipulation, data visualization, and any specific utilities needed for the tasks. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import argparse\n",
    "import pandas as pd\n",
    "import json\n",
    "import requests\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import seaborn as sns\n",
    "from pandas import date_range"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load Dataset\n",
    "This cell loads the dataset to be analyzed. The data is stored in the from a CSV file, and is imported into a DataFrame. It involves specifying the path to the dataset, using pandas to read the file, and confirming its successful load by inspecting the first few table entries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>category</th>\n",
       "      <th>state</th>\n",
       "      <th>closed_at</th>\n",
       "      <th>opened_at</th>\n",
       "      <th>closed_by</th>\n",
       "      <th>number</th>\n",
       "      <th>sys_updated_by</th>\n",
       "      <th>location</th>\n",
       "      <th>assigned_to</th>\n",
       "      <th>caller_id</th>\n",
       "      <th>sys_updated_on</th>\n",
       "      <th>short_description</th>\n",
       "      <th>priority</th>\n",
       "      <th>assignement_group</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Database</td>\n",
       "      <td>Closed</td>\n",
       "      <td>2023-07-25 03:32:18.462401146</td>\n",
       "      <td>2023-01-02 11:04:00</td>\n",
       "      <td>Fred Luddy</td>\n",
       "      <td>INC0000000034</td>\n",
       "      <td>admin</td>\n",
       "      <td>Australia</td>\n",
       "      <td>Fred Luddy</td>\n",
       "      <td>ITIL User</td>\n",
       "      <td>2023-07-06 03:31:13.838619495</td>\n",
       "      <td>There was an issue</td>\n",
       "      <td>2 - High</td>\n",
       "      <td>Database</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Hardware</td>\n",
       "      <td>Closed</td>\n",
       "      <td>2023-03-11 13:42:59.511508874</td>\n",
       "      <td>2023-01-03 10:19:00</td>\n",
       "      <td>Charlie Whitherspoon</td>\n",
       "      <td>INC0000000025</td>\n",
       "      <td>admin</td>\n",
       "      <td>India</td>\n",
       "      <td>Beth Anglin</td>\n",
       "      <td>Don Goodliffe</td>\n",
       "      <td>2023-05-19 04:22:50.443252112</td>\n",
       "      <td>There was an issue</td>\n",
       "      <td>1 - Critical</td>\n",
       "      <td>Hardware</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Database</td>\n",
       "      <td>Resolved</td>\n",
       "      <td>2023-01-20 14:37:18.361510788</td>\n",
       "      <td>2023-01-04 06:37:00</td>\n",
       "      <td>Charlie Whitherspoon</td>\n",
       "      <td>INC0000000354</td>\n",
       "      <td>system</td>\n",
       "      <td>India</td>\n",
       "      <td>Fred Luddy</td>\n",
       "      <td>ITIL User</td>\n",
       "      <td>2023-02-13 08:10:20.378839709</td>\n",
       "      <td>There was an issue</td>\n",
       "      <td>2 - High</td>\n",
       "      <td>Database</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Hardware</td>\n",
       "      <td>Resolved</td>\n",
       "      <td>2023-01-25 20:46:13.679914432</td>\n",
       "      <td>2023-01-04 06:53:00</td>\n",
       "      <td>Fred Luddy</td>\n",
       "      <td>INC0000000023</td>\n",
       "      <td>admin</td>\n",
       "      <td>Canada</td>\n",
       "      <td>Luke Wilson</td>\n",
       "      <td>Don Goodliffe</td>\n",
       "      <td>2023-06-14 11:45:24.784548040</td>\n",
       "      <td>There was an issue</td>\n",
       "      <td>2 - High</td>\n",
       "      <td>Hardware</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Hardware</td>\n",
       "      <td>Closed</td>\n",
       "      <td>2023-05-10 22:35:58.881919516</td>\n",
       "      <td>2023-01-05 16:52:00</td>\n",
       "      <td>Luke Wilson</td>\n",
       "      <td>INC0000000459</td>\n",
       "      <td>employee</td>\n",
       "      <td>UK</td>\n",
       "      <td>Charlie Whitherspoon</td>\n",
       "      <td>David Loo</td>\n",
       "      <td>2023-06-11 20:25:35.094482408</td>\n",
       "      <td>There was an issue</td>\n",
       "      <td>2 - High</td>\n",
       "      <td>Hardware</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   category     state                      closed_at            opened_at  \\\n",
       "0  Database    Closed  2023-07-25 03:32:18.462401146  2023-01-02 11:04:00   \n",
       "1  Hardware    Closed  2023-03-11 13:42:59.511508874  2023-01-03 10:19:00   \n",
       "2  Database  Resolved  2023-01-20 14:37:18.361510788  2023-01-04 06:37:00   \n",
       "3  Hardware  Resolved  2023-01-25 20:46:13.679914432  2023-01-04 06:53:00   \n",
       "4  Hardware    Closed  2023-05-10 22:35:58.881919516  2023-01-05 16:52:00   \n",
       "\n",
       "              closed_by         number sys_updated_by   location  \\\n",
       "0            Fred Luddy  INC0000000034          admin  Australia   \n",
       "1  Charlie Whitherspoon  INC0000000025          admin      India   \n",
       "2  Charlie Whitherspoon  INC0000000354         system      India   \n",
       "3            Fred Luddy  INC0000000023          admin     Canada   \n",
       "4           Luke Wilson  INC0000000459       employee         UK   \n",
       "\n",
       "            assigned_to      caller_id                 sys_updated_on  \\\n",
       "0            Fred Luddy      ITIL User  2023-07-06 03:31:13.838619495   \n",
       "1           Beth Anglin  Don Goodliffe  2023-05-19 04:22:50.443252112   \n",
       "2            Fred Luddy      ITIL User  2023-02-13 08:10:20.378839709   \n",
       "3           Luke Wilson  Don Goodliffe  2023-06-14 11:45:24.784548040   \n",
       "4  Charlie Whitherspoon      David Loo  2023-06-11 20:25:35.094482408   \n",
       "\n",
       "    short_description      priority assignement_group  \n",
       "0  There was an issue      2 - High          Database  \n",
       "1  There was an issue  1 - Critical          Hardware  \n",
       "2  There was an issue      2 - High          Database  \n",
       "3  There was an issue      2 - High          Hardware  \n",
       "4  There was an issue      2 - High          Hardware  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dataset_path = \"csvs/flag-92.csv\"\n",
    "data = pd.read_csv(dataset_path)\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### **Question 1: How do expenses vary across different geographic locations?**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Analyzing the expense amounts across different geographic locations reveals notable differences. Certain regions like North America and Europe have higher average expenses, while regions like Asia and Africa show lower average expenses. Understanding these differences can help in regional budgeting and financial planning."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N/A\n"
     ]
    }
   ],
   "source": [
    "# # Calculate average amount for each location\n",
    "# avg_amount_by_location = data.groupby('location')['amount'].mean().reset_index()\n",
    "\n",
    "# # Set the style of the visualization\n",
    "# sns.set(style=\"whitegrid\")\n",
    "\n",
    "# # Create a bar plot for average amount by location\n",
    "# plt.figure(figsize=(12, 6))\n",
    "# sns.barplot(x='location', y='amount', data=avg_amount_by_location, palette='viridis')\n",
    "# plt.title('Average Expense Amount by Location')\n",
    "# plt.xlabel('Location')\n",
    "# plt.ylabel('Average Amount')\n",
    "# plt.xticks(rotation=45)\n",
    "# plt.show()\n",
    "print(\"N/A\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Generate JSON Description for the Insight"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'data_type': 'comparative',\n",
       " 'insight': \"Analysis could not be completed due to missing 'amount' column in the dataset\",\n",
       " 'insight_value': {},\n",
       " 'plot': {'description': \"A bar plot was attempted but failed due to missing 'amount' column in the dataset\"},\n",
       " 'question': 'How do expenses vary across different geographic locations?',\n",
       " 'actionable_insight': \"No actionable insight could be generated due to missing 'amount' column in the dataset\"}"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "{\n",
    "    \"data_type\": \"comparative\",\n",
    "    \"insight\": \"Analysis could not be completed due to missing 'amount' column in the dataset\",\n",
    "    \"insight_value\": {},\n",
    "    \"plot\": {\n",
    "        \"description\": \"A bar plot was attempted but failed due to missing 'amount' column in the dataset\"\n",
    "    },\n",
    "    \"question\": \"How do expenses vary across different geographic locations?\",\n",
    "    \"actionable_insight\": \"No actionable insight could be generated due to missing 'amount' column in the dataset\"\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### **Question 2: How are expenses distributed across different categories?**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Analyzing the distribution of expense categories provides insights into which types of expenses are most common. This information can help understand spending patterns and identify areas for cost-saving opportunities or increased financial oversight."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N/A\n"
     ]
    }
   ],
   "source": [
    "# import matplotlib.pyplot as plt\n",
    "\n",
    "# # Group by category and sum the amount\n",
    "# total_expenses_by_category = data.groupby('category')['amount'].sum().sort_values(ascending=False)\n",
    "\n",
    "# # Plotting\n",
    "# plt.figure(figsize=(10, 6))\n",
    "# total_expenses_by_category.plot(kind='bar', color='skyblue')\n",
    "# plt.title('Total Expenses by Category')\n",
    "# plt.xlabel('Category')\n",
    "# plt.ylabel('Total Expenses ($)')\n",
    "# plt.xticks(rotation=45, ha='right')\n",
    "# plt.tight_layout()\n",
    "# plt.show()\n",
    "print(\"N/A\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Generate JSON Description for the Insight"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'data_type': 'distribution',\n",
       " 'insight': \"Analysis could not be completed due to missing 'amount' column in the dataset\",\n",
       " 'insight_value': {},\n",
       " 'plot': {'description': \"A bar chart was attempted to show expense distribution by category but failed due to missing 'amount' column in the dataset\"},\n",
       " 'question': 'How are expenses distributed across different categories?',\n",
       " 'actionable_insight': 'No actionable insight could be generated due to missing data'}"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "{\n",
    "    \"data_type\": \"distribution\",\n",
    "    \"insight\": \"Analysis could not be completed due to missing 'amount' column in the dataset\",\n",
    "    \"insight_value\": {},\n",
    "    \"plot\": {\n",
    "        \"description\": \"A bar chart was attempted to show expense distribution by category but failed due to missing 'amount' column in the dataset\"\n",
    "    },\n",
    "    \"question\": \"How are expenses distributed across different categories?\",\n",
    "    \"actionable_insight\": \"No actionable insight could be generated due to missing data\"\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### **Question 3:  What are the total expenses by department?**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Identifying the total expenses by department will help to determine which departments are the most resource-intensive and may require closer financial monitoring."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N/A\n"
     ]
    }
   ],
   "source": [
    "# # Group by department and sum the amount\n",
    "# total_expenses_by_department = data.groupby('department')['amount'].sum().sort_values(ascending=False)\n",
    "\n",
    "# # Plotting\n",
    "# plt.figure(figsize=(10, 6))\n",
    "# total_expenses_by_department.plot(kind='bar', color='lightcoral')\n",
    "# plt.title('Total Expenses by Department')\n",
    "# plt.xlabel('Department')\n",
    "# plt.ylabel('Total Expenses ($)')\n",
    "# plt.xticks(rotation=45, ha='right')\n",
    "# plt.tight_layout()\n",
    "# plt.show()\n",
    "print(\"N/A\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Generate JSON Description for the Insight"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'data_type': 'comparative',\n",
       " 'insight': \"Analysis could not be completed due to missing 'department' column in the dataset\",\n",
       " 'insight_value': {},\n",
       " 'plot': {'description': \"A bar chart was attempted but failed due to missing 'department' column in the dataset\"},\n",
       " 'question': 'What are the total expenses by department?',\n",
       " 'actionable_insight': 'No actionable insight could be generated due to missing data'}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "{\n",
    "    \"data_type\": \"comparative\",\n",
    "    \"insight\": \"Analysis could not be completed due to missing 'department' column in the dataset\",\n",
    "    \"insight_value\": {},\n",
    "    \"plot\": {\n",
    "        \"description\": \"A bar chart was attempted but failed due to missing 'department' column in the dataset\"\n",
    "    },\n",
    "    \"question\": \"What are the total expenses by department?\",\n",
    "    \"actionable_insight\": \"No actionable insight could be generated due to missing data\"\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### **Question 4:** What is the average expense by department?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This analysis will show which departments have higher average expense claims, indicating potentially larger or more frequent expense requests within those departments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N/A\n"
     ]
    }
   ],
   "source": [
    "# # Group by department and calculate the average amount\n",
    "# average_expense_by_department = data.groupby('department')['amount'].mean().sort_values(ascending=False)\n",
    "\n",
    "# # Plotting\n",
    "# plt.figure(figsize=(10, 6))\n",
    "# average_expense_by_department.plot(kind='bar', color='goldenrod')\n",
    "# plt.title('Average Expense by Department')\n",
    "# plt.xlabel('Department')\n",
    "# plt.ylabel('Average Expense ($)')\n",
    "# plt.xticks(rotation=45, ha='right')\n",
    "# plt.tight_layout()\n",
    "# plt.show()\n",
    "print(\"N/A\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'data_type': 'comparative',\n",
       " 'insight': \"The analysis could not be completed due to a missing 'department' column in the dataset\",\n",
       " 'insight_value': {},\n",
       " 'plot': {'description': 'A bar chart was attempted but failed due to the missing department column in the dataset'},\n",
       " 'question': 'What is the average expense by department?',\n",
       " 'actionable_insight': 'No actionable insight could be generated due to missing data'}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "{\n",
    "    \"data_type\": \"comparative\",\n",
    "    \"insight\": \"The analysis could not be completed due to a missing 'department' column in the dataset\",\n",
    "    \"insight_value\": {},\n",
    "    \"plot\": {\n",
    "        \"description\": \"A bar chart was attempted but failed due to the missing department column in the dataset\"\n",
    "    },\n",
    "    \"question\": \"What is the average expense by department?\",\n",
    "    \"actionable_insight\": \"No actionable insight could be generated due to missing data\"\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### **Question 5:** How many expenses have been processed by each department?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Understanding the number of processed expenses per department provides insight into the activity levels and operational demands of each department."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N/A\n"
     ]
    }
   ],
   "source": [
    "# # Filter for processed expenses and group by department\n",
    "# processed_expenses_by_department = data[data['state'] == 'Processed'].groupby('department').size().sort_values(ascending=False)\n",
    "\n",
    "# # Plotting\n",
    "# plt.figure(figsize=(10, 6))\n",
    "# processed_expenses_by_department.plot(kind='bar', color='dodgerblue')\n",
    "# plt.title('Number of Processed Expenses by Department')\n",
    "# plt.xlabel('Department')\n",
    "# plt.ylabel('Number of Processed Expenses')\n",
    "# plt.xticks(rotation=45, ha='right')\n",
    "# plt.tight_layout()\n",
    "# plt.show()\n",
    "print(\"N/A\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'data_type': 'comparative',\n",
       " 'insight': \"Analysis could not be completed due to missing 'department' column in the dataset\",\n",
       " 'insight_value': {},\n",
       " 'plot': {'description': \"A bar chart was attempted but failed due to missing 'department' column in the dataset\"},\n",
       " 'question': 'How many expenses have been processed by each department?',\n",
       " 'actionable_insight': 'No actionable insight could be generated due to missing data'}"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "{\n",
    "    \"data_type\": \"comparative\",\n",
    "    \"insight\": \"Analysis could not be completed due to missing 'department' column in the dataset\",\n",
    "    \"insight_value\": {},\n",
    "    \"plot\": {\n",
    "        \"description\": \"A bar chart was attempted but failed due to missing 'department' column in the dataset\"\n",
    "    },\n",
    "    \"question\": \"How many expenses have been processed by each department?\",\n",
    "    \"actionable_insight\": \"No actionable insight could be generated due to missing data\"\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Summary of Findings (Flag 92):\n",
    "\n",
    "1. **Trend Analysis**: The analysis could not be done due to missing data.\n",
    "   \n",
    "2. **Employee Compliance Insight**: The study could not be conducted due to missing data.\n",
    "   \n",
    "3. **Departmental Insights**: The analysis could not be performed due to missing data."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
