{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "e465d5d6",
   "metadata": {},
   "source": [
    "## Expense Claim Patterns and Fraud Analysis (Flag 89)\n",
    "\n",
    "### Dataset Description\n",
    "The dataset consists of 500 entries simulating the ServiceNow `fm_expense_line` table, which records the attributes of financial expenses. The table is meant to capture the amount, departmental allocation, and nature of each expense, along with its timing and approval state, giving a view of organizational expenditure across categories. It is also intended to show how efficiently expenses are processed in each state, which can point to areas for workflow optimization.\n",
    "\n",
    "### Your Task\n",
    "**Goal**: To detect and investigate instances of repeated identical expense claims by individual users, determining whether these repetitions are fraudulent or due to misunderstandings of the expense policy.\n",
    "\n",
    "**Role**: Compliance and Audit Analyst\n",
    "\n",
    "**Difficulty**: 3 out of 5.\n",
    "\n",
    "**Category**: Finance Management\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f445e484",
   "metadata": {},
   "source": [
    "### Import necessary libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "59e8f867",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f3beeb6d",
   "metadata": {},
   "source": [
    "### Load the dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "2f74bf2f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the dataset\n",
    "combined_file_path = 'csvs/flag-89.csv'\n",
    "data = pd.read_csv(combined_file_path)"
   ]
  },
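  {
   "cell_type": "markdown",
   "id": "a7c1e0b2",
   "metadata": {},
   "source": [
    "### Check which expected columns are present\n",
    "\n",
    "Each question in this notebook depends on specific columns, so it may help to confirm up front which of them the loaded file actually contains. The cell that follows is a minimal sketch of such a check, assuming the column names referenced by the analysis cells (`state`, `processing_time_hours`, `short_description`, `amount`, `department`, `user`). `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names introduced here for illustration, and `data` is the DataFrame loaded in the previous cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b3d9f4c6",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Column names the analysis cells refer to (an assumption taken from those cells)\n",
    "EXPECTED_COLUMNS = {\n",
    "    'Question 1': ['state', 'processing_time_hours'],\n",
    "    'Question 2': ['short_description', 'amount'],\n",
    "    'Question 3': ['department', 'amount'],\n",
    "    'Question 4': ['user'],\n",
    "}\n",
    "\n",
    "def missing_columns(df: pd.DataFrame, required: list) -> list:\n",
    "    # Return the required column names that are absent from df\n",
    "    return [col for col in required if col not in df.columns]\n",
    "\n",
    "# Show what the file contains, then report per question\n",
    "print('Available columns:', list(data.columns))\n",
    "for question, required in EXPECTED_COLUMNS.items():\n",
    "    missing = missing_columns(data, required)\n",
    "    status = 'ok' if not missing else f'missing {missing}'\n",
    "    print(f'{question}: {status}')"
   ]
  },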
  {
   "cell_type": "markdown",
   "id": "c270a6f2",
   "metadata": {},
   "source": [
    "### **Question 1:** What are the differences in processing times for expenses in various states such as Processed, Declined, Submitted, and Pending?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26fea6d0",
   "metadata": {},
   "source": [
    "Comparing processing times across expense states would show whether, for example, Processed expenses clear faster than Declined ones, which helps identify bottlenecks and areas for efficiency improvements in the expense processing workflow. In this dataset the comparison cannot be run because the `processing_time_hours` column is missing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "e23224e2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N/A\n"
     ]
    }
   ],
   "source": [
    "# Columns this analysis depends on\n",
    "required_columns = {'state', 'processing_time_hours'}\n",
    "\n",
    "if required_columns.issubset(data.columns):\n",
    "    # Calculate average processing time for each state\n",
    "    avg_processing_time_by_state = data.groupby('state')['processing_time_hours'].mean().reset_index()\n",
    "\n",
    "    # Set the style of the visualization\n",
    "    sns.set(style=\"whitegrid\")\n",
    "\n",
    "    # Create a bar plot for average processing time by state\n",
    "    plt.figure(figsize=(12, 6))\n",
    "    sns.barplot(x='state', y='processing_time_hours', data=avg_processing_time_by_state)\n",
    "    plt.title('Average Processing Time by State')\n",
    "    plt.xlabel('State')\n",
    "    plt.ylabel('Average Processing Time (hours)')\n",
    "    plt.xticks(rotation=45)\n",
    "    plt.show()\n",
    "else:\n",
    "    # At least one required column is missing, so the analysis is skipped\n",
    "    print(\"N/A\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "fd899662",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'data_type': 'comparative',\n",
       " 'insight': \"The analysis could not be completed because the 'processing_time_hours' column was not found in the dataset, indicating either missing or incorrectly named data\",\n",
       " 'insight_value': {},\n",
       " 'plot': {'description': \"A bar plot was attempted to compare average processing times across different states, but failed due to missing column 'processing_time_hours'\"},\n",
       " 'question': 'What are the differences in processing times for expenses in various states such as Processed, Declined, Submitted, and Pending?',\n",
       " 'actionable_insight': 'No actionable insight can be provided as the analysis could not be completed due to missing data'}"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "{\n",
    "    \"data_type\": \"comparative\",\n",
    "    \"insight\": \"The analysis could not be completed because the 'processing_time_hours' column was not found in the dataset, indicating either missing or incorrectly named data\",\n",
    "    \"insight_value\": {},\n",
    "    \"plot\": {\n",
    "        \"description\": \"A bar plot was attempted to compare average processing times across different states, but failed due to missing column 'processing_time_hours'\"\n",
    "    },\n",
    "    \"question\": \"What are the differences in processing times for expenses in various states such as Processed, Declined, Submitted, and Pending?\",\n",
    "    \"actionable_insight\": \"No actionable insight can be provided as the analysis could not be completed due to missing data\"\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f3ff2a3a",
   "metadata": {},
   "source": [
    "### **Question 2:** How do specific keywords in the short descriptions of expense reports influence the amount of these expenses?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e2467c3e",
   "metadata": {},
   "source": [
    "This question asks whether keywords in the short descriptions, such as 'Travel', 'Service', 'Cloud', 'Asset', and 'Equipment', are associated with different expense amounts. Such a relationship would be useful for budgeting, financial oversight, and resource allocation. In this dataset the analysis cannot be run because the `amount` column is missing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "8a044a62",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N/A\n"
     ]
    }
   ],
   "source": [
    "# Keywords to look for in the short description (first match wins)\n",
    "KEYWORDS = (\"Travel\", \"Service\", \"Cloud\", \"Asset\", \"Equipment\")\n",
    "\n",
    "# Function to categorize descriptions based on keywords\n",
    "def categorize_description(description):\n",
    "    # Non-string values (including NaN) fall through to 'Other'\n",
    "    if isinstance(description, str):\n",
    "        for keyword in KEYWORDS:\n",
    "            if keyword in description:\n",
    "                return keyword\n",
    "    return 'Other'\n",
    "\n",
    "if {'short_description', 'amount'}.issubset(data.columns):\n",
    "    # Apply the function to create a new column for categories\n",
    "    data['description_category'] = data['short_description'].apply(categorize_description)\n",
    "\n",
    "    # Set the style of the visualization\n",
    "    sns.set(style=\"whitegrid\")\n",
    "\n",
    "    # Create a boxplot for amount by description category\n",
    "    plt.figure(figsize=(12, 6))\n",
    "    sns.boxplot(x='description_category', y='amount', data=data)\n",
    "    plt.title('Amount Distribution by Short Description Category')\n",
    "    plt.xlabel('Short Description Category')\n",
    "    plt.ylabel('Amount')\n",
    "    plt.xticks(rotation=45)\n",
    "    plt.show()\n",
    "else:\n",
    "    # At least one required column is missing, so the analysis is skipped\n",
    "    print(\"N/A\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "67829b09",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'data_type': 'comparative',\n",
       " 'insight': \"The analysis could not be completed due to a ValueError indicating that the 'amount' column is not present in the dataset\",\n",
       " 'insight_value': {},\n",
       " 'plot': {'description': \"A boxplot was attempted to show the distribution of expense amounts across different description categories, but failed due to missing 'amount' column in the data\"},\n",
       " 'question': 'How do specific keywords in the short descriptions of expense reports influence the amount of these expenses?',\n",
       " 'actionable_insight': 'No actionable insight can be provided as the analysis could not be completed due to missing data'}"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "{\n",
    "    \"data_type\": \"comparative\",\n",
    "    \"insight\": \"The analysis could not be completed due to a ValueError indicating that the 'amount' column is not present in the dataset\",\n",
    "    \"insight_value\": {},\n",
    "    \"plot\": {\n",
    "        \"description\": \"A boxplot was attempted to show the distribution of expense amounts across different description categories, but failed due to missing 'amount' column in the data\"\n",
    "    },\n",
    "    \"question\": \"How do specific keywords in the short descriptions of expense reports influence the amount of these expenses?\",\n",
    "    \"actionable_insight\": \"No actionable insight can be provided as the analysis could not be completed due to missing data\"\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "00c60df0",
   "metadata": {},
   "source": [
    "### **Question 3:** What are the expense patterns for different departments in terms of average amounts?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5f3df593",
   "metadata": {},
   "source": [
    "By examining the average expense amounts across different departments, we can uncover departmental spending patterns. This can help in understanding which departments have higher or lower average expenses, providing insights for budgeting and resource allocation decisions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "614fa908",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N/A\n"
     ]
    }
   ],
   "source": [
    "# Columns this analysis depends on\n",
    "required_columns = {'department', 'amount'}\n",
    "\n",
    "if required_columns.issubset(data.columns):\n",
    "    # Calculate average amount for each department\n",
    "    avg_amount_by_department = data.groupby('department')['amount'].mean().reset_index()\n",
    "\n",
    "    # Set the style of the visualization\n",
    "    sns.set(style=\"whitegrid\")\n",
    "\n",
    "    # Create a bar plot for average amount by department\n",
    "    plt.figure(figsize=(12, 6))\n",
    "    sns.barplot(x='department', y='amount', data=avg_amount_by_department)\n",
    "    plt.title('Average Amount by Department')\n",
    "    plt.xlabel('Department')\n",
    "    plt.ylabel('Average Amount')\n",
    "    plt.xticks(rotation=45)\n",
    "    plt.show()\n",
    "else:\n",
    "    # At least one required column is missing, so the analysis is skipped\n",
    "    print(\"N/A\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "8ffab682",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'data_type': 'comparative',\n",
       " 'insight': \"Analysis could not be performed due to missing 'department' column in the dataset\",\n",
       " 'insight_value': {},\n",
       " 'plot': {'description': \"Bar plot could not be generated due to KeyError indicating missing 'department' column\"},\n",
       " 'question': 'What are the expense patterns for different departments in terms of average amounts?',\n",
       " 'actionable_insight': 'No actionable insight can be provided as the analysis could not be completed due to missing data'}"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "{\n",
    "    \"data_type\": \"comparative\",\n",
    "    \"insight\": \"Analysis could not be performed due to missing 'department' column in the dataset\",\n",
    "    \"insight_value\": {},\n",
    "    \"plot\": {\n",
    "        \"description\": \"Bar plot could not be generated due to KeyError indicating missing 'department' column\"\n",
    "    },\n",
    "    \"question\": \"What are the expense patterns for different departments in terms of average amounts?\",\n",
    "    \"actionable_insight\": \"No actionable insight can be provided as the analysis could not be completed due to missing data\"\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3eec8fe7",
   "metadata": {},
   "source": [
    "### **Question 4:** How does the number of expense reports submitted vary by user?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c08e9ad6",
   "metadata": {},
   "source": [
    "Analyzing the number of expense reports submitted by different users can help identify the most active users in terms of expense submissions. This insight can aid in understanding user behavior and identifying potential areas for fraud detection or efficiency improvements."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "9164e512",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N/A\n"
     ]
    }
   ],
   "source": [
    "if 'user' in data.columns:\n",
    "    # Calculate the number of expense reports submitted by each user\n",
    "    expense_reports_by_user = data['user'].value_counts().reset_index()\n",
    "    expense_reports_by_user.columns = ['user', 'number_of_reports']\n",
    "\n",
    "    # Set the style of the visualization\n",
    "    sns.set(style=\"whitegrid\")\n",
    "\n",
    "    # Create a bar plot for the number of expense reports by user\n",
    "    plt.figure(figsize=(12, 6))\n",
    "    sns.barplot(x='user', y='number_of_reports', data=expense_reports_by_user)\n",
    "    plt.title('Number of Expense Reports by User')\n",
    "    plt.xlabel('User')\n",
    "    plt.ylabel('Number of Expense Reports')\n",
    "    plt.xticks(rotation=90)\n",
    "    plt.show()\n",
    "else:\n",
    "    # The 'user' column is missing, so the analysis is skipped\n",
    "    print(\"N/A\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "9c3290a5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'data_type': 'comparative',\n",
       " 'insight': \"Analysis could not be performed because the 'user' column is not present in the dataset\",\n",
       " 'insight_value': {},\n",
       " 'plot': {'description': \"A bar plot was attempted but failed due to missing 'user' column in the dataset\"},\n",
       " 'question': 'How does the number of expense reports submitted vary by user?',\n",
       " 'actionable_insight': 'No actionable insight can be provided as the analysis could not be completed due to missing data'}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "{\n",
    "    \"data_type\": \"comparative\",\n",
    "    \"insight\": \"Analysis could not be performed because the 'user' column is not present in the dataset\",\n",
    "    \"insight_value\": {},\n",
    "    \"plot\": {\n",
    "        \"description\": \"A bar plot was attempted but failed due to missing 'user' column in the dataset\"\n",
    "    },\n",
    "    \"question\": \"How does the number of expense reports submitted vary by user?\",\n",
    "    \"actionable_insight\": \"No actionable insight can be provided as the analysis could not be completed due to missing data\"\n",
    "}"
   ]
  },
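  {
   "cell_type": "markdown",
   "id": "c5e2a8d1",
   "metadata": {},
   "source": [
    "### Sketch: flagging repeated identical claims\n",
    "\n",
    "The stated goal of this notebook is to find repeated identical expense claims by the same user, which none of the four questions tests directly. The cell that follows is a minimal sketch of one way to do so with `DataFrame.duplicated`, not a definitive method. The columns in `CLAIM_KEY` (`user`, `amount`, `category`, `short_description`) are assumptions about what defines an identical claim, and `find_repeated_claims` is a hypothetical helper name. If any of those columns is absent, the cell prints `N/A` like the cells above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d6f3b9e2",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical set of columns that define an identical claim;\n",
    "# adjust to the columns the dataset actually provides\n",
    "CLAIM_KEY = ['user', 'amount', 'category', 'short_description']\n",
    "\n",
    "def find_repeated_claims(df: pd.DataFrame, key_columns: list) -> pd.DataFrame:\n",
    "    # keep=False marks every member of a duplicate group, not only the later copies\n",
    "    repeated = df[df.duplicated(subset=key_columns, keep=False)]\n",
    "    return repeated.sort_values(by=key_columns)\n",
    "\n",
    "if set(CLAIM_KEY).issubset(data.columns):\n",
    "    repeated_claims = find_repeated_claims(data, CLAIM_KEY)\n",
    "    # Count how often each identical claim occurs, most frequent first\n",
    "    print(repeated_claims.groupby(CLAIM_KEY).size().sort_values(ascending=False))\n",
    "else:\n",
    "    # At least one assumed column is missing, so the check is skipped\n",
    "    print('N/A')"
   ]
  },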
  {
   "cell_type": "markdown",
   "id": "7ef1a095",
   "metadata": {},
   "source": [
    "### Summary of Findings (Flag 89):\n",
    "\n",
    "1. **State-Based Processing Time Analysis**: The analysis could not be completed because the `processing_time_hours` column was not found in the dataset; the column is either missing or named differently.\n",
    "\n",
    "2. **Impact of Keywords in Short Descriptions on Expense Amounts**: The analysis failed with a ValueError because the `amount` column is not present in the dataset. Exploring the relationship between description keywords and expense amounts would still be worthwhile once the column is available, since it can inform expense reporting practices and financial decision-making.\n",
    "\n",
    "3. **Departmental Expense Patterns**: The analysis could not be performed because the `department` column is missing from the dataset. Understanding departmental spending patterns would help with budgeting and resource allocation decisions.\n",
    "\n",
    "4. **User Expense Report Submission Patterns**: The analysis could not be performed because the `user` column is not present in the dataset. Counting expense reports per user would identify the most active submitters, which helps in understanding user behavior and in targeting fraud detection or efficiency improvements."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
