{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cost Efficiency Goals Analysis (Flag 81)\n",
    "\n",
    "### Dataset Overview\n",
    "This dataset includes 500 simulated entries from the ServiceNow `sn_gf_goal` table, detailing various aspects of organizational goals. Key attributes include goal status, responsible parties, associated departments, start and end dates, and detailed descriptions. The dataset also contains metrics such as priority ranking, percentage completion, and target achievement rate. It primarily focuses on tracking the progress and management of both departmental and individual goals, offering insights into the effectiveness and alignment of these goals with broader organizational strategies. Additionally, the dataset logs updates to each goal, providing a record of modifications and the identities of those making these changes.\n",
    "\n",
    "### Your Objective\n",
    "**Objective**: Analyze the factors (whether time-based or categorical) that influence goal completion rates and evaluate their impact on overall departmental performance.\n",
    "\n",
    "**Role**: Enterprise Performance Analyst\n",
    "\n",
    "**Challenge Level**: 4 out of 5. The task requires a deep understanding of complex data patterns and advanced analytical skills to interpret the information and extract actionable insights.\n",
    "\n",
    "**Category**: Strategic Goal Management"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import Necessary Libraries\n",
    "This cell imports all necessary libraries required for the analysis. This includes libraries for data manipulation, data visualization, and any specific utilities needed for the tasks. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import argparse\n",
    "import pandas as pd\n",
    "import json\n",
    "import requests\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import seaborn as sns\n",
    "from pandas import date_range"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load Dataset\n",
    "This cell loads the goals dataset used in the analysis. The dataset is assumed to be stored in a CSV file and is loaded into a DataFrame. This step includes reading the data from a file path and possibly performing initial observations such as viewing the first few rows to ensure it has loaded correctly.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>category</th>\n",
       "      <th>state</th>\n",
       "      <th>closed_at</th>\n",
       "      <th>opened_at</th>\n",
       "      <th>closed_by</th>\n",
       "      <th>number</th>\n",
       "      <th>sys_updated_by</th>\n",
       "      <th>location</th>\n",
       "      <th>assigned_to</th>\n",
       "      <th>caller_id</th>\n",
       "      <th>sys_updated_on</th>\n",
       "      <th>short_description</th>\n",
       "      <th>priority</th>\n",
       "      <th>assignement_group</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Database</td>\n",
       "      <td>Closed</td>\n",
       "      <td>2023-07-25 03:32:18.462401146</td>\n",
       "      <td>2023-01-02 11:04:00</td>\n",
       "      <td>Fred Luddy</td>\n",
       "      <td>INC0000000034</td>\n",
       "      <td>admin</td>\n",
       "      <td>Australia</td>\n",
       "      <td>Fred Luddy</td>\n",
       "      <td>ITIL User</td>\n",
       "      <td>2023-07-06 03:31:13.838619495</td>\n",
       "      <td>There was an issue</td>\n",
       "      <td>2 - High</td>\n",
       "      <td>Database</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Hardware</td>\n",
       "      <td>Closed</td>\n",
       "      <td>2023-03-11 13:42:59.511508874</td>\n",
       "      <td>2023-01-03 10:19:00</td>\n",
       "      <td>Charlie Whitherspoon</td>\n",
       "      <td>INC0000000025</td>\n",
       "      <td>admin</td>\n",
       "      <td>India</td>\n",
       "      <td>Beth Anglin</td>\n",
       "      <td>Don Goodliffe</td>\n",
       "      <td>2023-05-19 04:22:50.443252112</td>\n",
       "      <td>There was an issue</td>\n",
       "      <td>1 - Critical</td>\n",
       "      <td>Hardware</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Database</td>\n",
       "      <td>Resolved</td>\n",
       "      <td>2023-01-20 14:37:18.361510788</td>\n",
       "      <td>2023-01-04 06:37:00</td>\n",
       "      <td>Charlie Whitherspoon</td>\n",
       "      <td>INC0000000354</td>\n",
       "      <td>system</td>\n",
       "      <td>India</td>\n",
       "      <td>Fred Luddy</td>\n",
       "      <td>ITIL User</td>\n",
       "      <td>2023-02-13 08:10:20.378839709</td>\n",
       "      <td>There was an issue</td>\n",
       "      <td>2 - High</td>\n",
       "      <td>Database</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Hardware</td>\n",
       "      <td>Resolved</td>\n",
       "      <td>2023-01-25 20:46:13.679914432</td>\n",
       "      <td>2023-01-04 06:53:00</td>\n",
       "      <td>Fred Luddy</td>\n",
       "      <td>INC0000000023</td>\n",
       "      <td>admin</td>\n",
       "      <td>Canada</td>\n",
       "      <td>Luke Wilson</td>\n",
       "      <td>Don Goodliffe</td>\n",
       "      <td>2023-06-14 11:45:24.784548040</td>\n",
       "      <td>There was an issue</td>\n",
       "      <td>2 - High</td>\n",
       "      <td>Hardware</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Hardware</td>\n",
       "      <td>Closed</td>\n",
       "      <td>2023-05-10 22:35:58.881919516</td>\n",
       "      <td>2023-01-05 16:52:00</td>\n",
       "      <td>Luke Wilson</td>\n",
       "      <td>INC0000000459</td>\n",
       "      <td>employee</td>\n",
       "      <td>UK</td>\n",
       "      <td>Charlie Whitherspoon</td>\n",
       "      <td>David Loo</td>\n",
       "      <td>2023-06-11 20:25:35.094482408</td>\n",
       "      <td>There was an issue</td>\n",
       "      <td>2 - High</td>\n",
       "      <td>Hardware</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   category     state                      closed_at            opened_at  \\\n",
       "0  Database    Closed  2023-07-25 03:32:18.462401146  2023-01-02 11:04:00   \n",
       "1  Hardware    Closed  2023-03-11 13:42:59.511508874  2023-01-03 10:19:00   \n",
       "2  Database  Resolved  2023-01-20 14:37:18.361510788  2023-01-04 06:37:00   \n",
       "3  Hardware  Resolved  2023-01-25 20:46:13.679914432  2023-01-04 06:53:00   \n",
       "4  Hardware    Closed  2023-05-10 22:35:58.881919516  2023-01-05 16:52:00   \n",
       "\n",
       "              closed_by         number sys_updated_by   location  \\\n",
       "0            Fred Luddy  INC0000000034          admin  Australia   \n",
       "1  Charlie Whitherspoon  INC0000000025          admin      India   \n",
       "2  Charlie Whitherspoon  INC0000000354         system      India   \n",
       "3            Fred Luddy  INC0000000023          admin     Canada   \n",
       "4           Luke Wilson  INC0000000459       employee         UK   \n",
       "\n",
       "            assigned_to      caller_id                 sys_updated_on  \\\n",
       "0            Fred Luddy      ITIL User  2023-07-06 03:31:13.838619495   \n",
       "1           Beth Anglin  Don Goodliffe  2023-05-19 04:22:50.443252112   \n",
       "2            Fred Luddy      ITIL User  2023-02-13 08:10:20.378839709   \n",
       "3           Luke Wilson  Don Goodliffe  2023-06-14 11:45:24.784548040   \n",
       "4  Charlie Whitherspoon      David Loo  2023-06-11 20:25:35.094482408   \n",
       "\n",
       "    short_description      priority assignement_group  \n",
       "0  There was an issue      2 - High          Database  \n",
       "1  There was an issue  1 - Critical          Hardware  \n",
       "2  There was an issue      2 - High          Database  \n",
       "3  There was an issue      2 - High          Hardware  \n",
       "4  There was an issue      2 - High          Hardware  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dataset_path = \"csvs/flag-81.csv\"\n",
    "goal_data = pd.read_csv(dataset_path)\n",
    "df = pd.read_csv(dataset_path)\n",
    "goal_data.head()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "### **Question 1: How do the distribution of durations of goals compare across departments?**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Plot comparison of goal durations across departments\n",
    "\n",
    "This cell plots a comprehensive look at how the durations of goals differ across various departments within the organization. Using a box plot, we compare the spread and central tendency of goal durations, marked by the median and mean durations for each department. This analysis is for identifying departments that may require process adjustments to align more closely with organizational goal completion times. Each department's performance is annotated directly on the plot, displaying both median and mean durations, which helps in quickly assessing and comparing departmental efficiency in achieving goals."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N/A\n"
     ]
    }
   ],
   "source": [
    "# import matplotlib.pyplot as plt\n",
    "# import seaborn as sns\n",
    "# import pandas as pd\n",
    "# import numpy as np\n",
    "\n",
    "# # Assuming 'goal_data' is preloaded and contains 'Cost Reduction' category\n",
    "# goal_data['end_date'] = pd.to_datetime(goal_data['end_date'])\n",
    "# goal_data[\"start_date\"] = pd.to_datetime(goal_data[\"start_date\"])\n",
    "# # Calculate goal durations\n",
    "# goal_data['duration'] = (goal_data['end_date'] - goal_data['start_date']).dt.days\n",
    "\n",
    "# # Plotting\n",
    "# plt.figure(figsize=(12, 8))\n",
    "# box_plot = sns.boxplot(x='department', y='duration', data=goal_data, palette=\"Set3\")\n",
    "# plt.title('Comparison of Goal Durations by Department')\n",
    "# plt.xlabel('Department')\n",
    "# plt.ylabel('Goal Duration (days)')\n",
    "# plt.grid(True)\n",
    "\n",
    "# # Calculate median and mean for annotations\n",
    "# medians = goal_data.groupby(['department'])['duration'].median()\n",
    "# means = goal_data.groupby(['department'])['duration'].mean()\n",
    "\n",
    "# # Iterate over the departments to place the text annotations for median and mean\n",
    "# for xtick in box_plot.get_xticks():\n",
    "#     box_plot.text(xtick, medians[xtick] + 1, 'Median: {:.1f}'.format(medians[xtick]), \n",
    "#                   horizontalalignment='center', size='x-small', color='black', weight='semibold')\n",
    "#     box_plot.text(xtick, means[xtick] + 1, 'Mean: {:.1f}'.format(means[xtick]), \n",
    "#                   horizontalalignment='center', size='x-small', color='red', weight='semibold')\n",
    "\n",
    "# plt.show()\n",
    "\n",
    "print(\"N/A\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Generate JSON Description for the Insight"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'data_type': 'analytical',\n",
       " 'insight': 'There was no column end_date to conduct any analysis',\n",
       " 'insight_value': {},\n",
       " 'plot': {'description': 'The graph could not be generated due to missing data'},\n",
       " 'question': \"How do the durations of 'Cost Reduction' goals in the Finance department compare to those in other departments?\",\n",
       " 'actionable_insight': 'No actionable insight could be generated due to missing data'}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "{\n",
    "\t\"data_type\": \"analytical\",\n",
    "\t\"insight\": \"There was no column end_date to conduct any analysis\",\n",
    "\t\"insight_value\": {\n",
    "\t},\n",
    "\t\"plot\": {\n",
    "    \t\"description\": \"The graph could not be generated due to missing data\",\n",
    "\t},\n",
    "\t\"question\": \"How do the durations of 'Cost Reduction' goals in the Finance department compare to those in other departments?\",\n",
    "\t\"actionable_insight\": \"No actionable insight could be generated due to missing data\"\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### **Question 2:** What is completion rates by different quarters?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Plot goal completion rates by quarter\n",
    "\n",
    "This box plot visualizes the completion rates of goals across different quarters, providing insights into the seasonal trends in goal achievement. By segmenting the data into quarters, we can identify any patterns or variations in goal completion rates over the year. This analysis helps in understanding the impact of seasonality on goal management and can guide resource allocation and goal setting strategies based on historical trends."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N/A\n"
     ]
    }
   ],
   "source": [
    "# # Convert start_date to datetime format\n",
    "# df['start_date'] = pd.to_datetime(df['start_date'])\n",
    "\n",
    "# # Extract the month and quarter from the start_date\n",
    "# df['month'] = df['start_date'].dt.month\n",
    "# df['quarter'] = df['start_date'].dt.quarter\n",
    "\n",
    "# # Visualize the trend of percent_complete by quarter\n",
    "# plt.figure(figsize=(12, 6))\n",
    "# sns.boxplot(x='quarter', y='percent_complete', data=df)\n",
    "# plt.title('Percent Complete by Quarter')\n",
    "# plt.xlabel('Quarter')\n",
    "# plt.ylabel('Percent Complete')\n",
    "# plt.grid(True, axis='y', linestyle='--', alpha=0.7)\n",
    "# plt.show()\n",
    "\n",
    "print(\"N/A\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'data_type': 'Diagnostic',\n",
       " 'insight': 'There was no column start_date to conduct any analysis',\n",
       " 'insight_value': {},\n",
       " 'plot': {'description': 'The graph could not be generated due to missing data'},\n",
       " 'question': 'How does the time of year (quarter) impact the completion rate of tasks?',\n",
       " 'actionable_insight': 'No actionable insight could be generated due to missing data'}"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "{\n",
    "\t\"data_type\": \"Diagnostic\",\n",
    "\t\"insight\": \"There was no column start_date to conduct any analysis\",\n",
    "\t\"insight_value\": {\n",
    "\t},\n",
    "\t\"plot\": {\n",
    "    \t\"description\": \"The graph could not be generated due to missing data\",\n",
    "\t},\n",
    "\t\"question\": \"How does the time of year (quarter) impact the completion rate of tasks?\",\n",
    "\t\"actionable_insight\": \"No actionable insight could be generated due to missing data\"\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### **Question 3:** What is distribution of Goal types in Finance department?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Plot goal category distribution within the Finance department\n",
    "\n",
    "This pie chart illustrates the proportional distribution of different goal categories within the Finance department, highlighting the dominant categories and their respective shares. This visualization is crucial for understanding the strategic focus areas within the department, based on the types of goals they prioritize."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N/A\n"
     ]
    }
   ],
   "source": [
    "# import matplotlib.pyplot as plt\n",
    "\n",
    "# # Filter data for the Finance department\n",
    "# finance_goals = goal_data[goal_data['department'] == 'Finance']\n",
    "\n",
    "# # Count the occurrence of each category in the Finance department\n",
    "# category_counts = finance_goals['category'].value_counts()\n",
    "\n",
    "# # Create a pie chart\n",
    "# plt.figure(figsize=(10, 7))\n",
    "# plt.pie(category_counts, labels=category_counts.index, autopct='%1.1f%%', startangle=140)\n",
    "# plt.title('Distribution of Goal Categories in Finance Department')\n",
    "# plt.show()\n",
    "\n",
    "print(\"N/A\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Generate JSON Description for the Insight"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'data_type': 'diagnostic',\n",
       " 'insight': 'There was no column department to conduct any analysis',\n",
       " 'insight_value': {},\n",
       " 'plot': {'description': 'The graph could not be generated due to missing data'},\n",
       " 'question': 'What is the distribution of Goal types and categories in the Finance department?',\n",
       " 'actionable_insight': 'No actionable insight could be generated due to missing data'}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "{\n",
    "\t\"data_type\": \"diagnostic\",\n",
    "\t\"insight\": \"There was no column department to conduct any analysis\",\n",
    "\t\"insight_value\": {\n",
    "\t},\n",
    "\t\"plot\": {\n",
    "    \t\"description\": \"The graph could not be generated due to missing data\",\n",
    "\t},\n",
    "\t\"question\": \"What is the distribution of Goal types and categories in the Finance department?\",\n",
    "\t\"actionable_insight\": \"No actionable insight could be generated due to missing data\"\n",
    "}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### **Question 4:** What is distribution of Goal priorities in Finance department?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Plot priority distribution within the Finance department\n",
    "\n",
    "This pie chart provides a clear visual representation of the distribution of goal priorities within the Finance department, emphasizing the prevalence of each priority level and aiding in the assessment of focus and urgency applied to different objectives."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N/A\n"
     ]
    }
   ],
   "source": [
    "# import matplotlib.pyplot as plt\n",
    "\n",
    "# # Filter data for the Finance department\n",
    "# finance_goals = goal_data[goal_data['department'] == 'Finance']\n",
    "\n",
    "# # Count the occurrence of each category in the Finance department\n",
    "# category_counts = finance_goals['priority'].value_counts()\n",
    "\n",
    "# # Create a pie chart\n",
    "# plt.figure(figsize=(10, 7))\n",
    "# plt.pie(category_counts, labels=category_counts.index, autopct='%1.1f%%', startangle=140)\n",
    "# plt.title('Distribution of Goal priorities in Finance Department')\n",
    "# plt.show()\n",
    "\n",
    "print(\"N/A\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Generate JSON Description for the Insight"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'data_type': 'diagnostic',\n",
       " 'insight': 'There was no column department to conduct any analysis',\n",
       " 'insight_value': {},\n",
       " 'plot': {'description': 'The graph could not be generated due to missing data'},\n",
       " 'question': 'What is the distribution of Goal priorities in the Finance department?',\n",
       " 'actionable_insight': 'No actionable insight could be generated due to missing data'}"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "{\n",
    "\t\"data_type\": \"diagnostic\",\n",
    "\t\"insight\": \"There was no column department to conduct any analysis\",\n",
    "\t\"insight_value\": {\n",
    "\t},\n",
    "\t\"plot\": {\n",
    "    \t\"description\": \"The graph could not be generated due to missing data\",\n",
    "\t},\n",
    "\t\"question\": \"What is the distribution of Goal priorities in the Finance department?\",\n",
    "\t\"actionable_insight\": \"No actionable insight could be generated due to missing data\"\n",
    "}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Summary of Findings (Flag 81)\n",
    "\n",
    "1. **Comparison of Goal Durations in Finance vs. Other Departments**: The absence of the `end_date` column prevents the analysis of how the durations of 'Cost Reduction' goals in the Finance department compare to those in other departments, limiting insights into departmental efficiency.\n",
    "\n",
    "2. **Impact of Time of Year on Task Completion Rates**: Without the `start_date` column, it is impossible to analyze how the completion rates of tasks vary by quarter, preventing an understanding of any seasonal trends in task performance.\n",
    "\n",
    "3. **Distribution of Goal Types and Categories in Finance**: The lack of the `department` column restricts the ability to analyze the distribution of goal types and categories specifically within the Finance department, hindering targeted performance analysis.\n",
    "\n",
    "4. **Distribution of Goal Priorities in Finance**: The absence of the `department` column also limits the ability to assess the distribution of goal priorities in the Finance department, making it difficult to identify focus areas and potential improvement strategies."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "base",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
