{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Departmental Objective Monitoring and Evaluation (Flag 80)\n",
    "\n",
    "### Dataset Overview\n",
    "This dataset includes 500 entries simulating the ServiceNow `sn_gf_goal` table, which encompasses various attributes related to organizational objectives. These attributes cover goal status, responsible individuals, departmental association, start and end periods, and detailed descriptions. The dataset also includes metrics such as priority level, completion percentage, and target achievement rate. The primary focus is on monitoring and managing both departmental and individual objectives, providing insights into the effectiveness of these goals and their alignment with overall organizational strategies. Additionally, the dataset records updates to each objective, offering a historical log of changes and the identities of those responsible for these updates.\n",
    "\n",
    "### Your Objective\n",
    "**Objective**: Examine the extended timelines of 'Cost Efficiency' objectives within the Finance department to identify root causes and propose strategies for improving the management of these objectives.\n",
    "\n",
    "**Role**: Strategic Objectives Analyst\n",
    "\n",
    "**Challenge Level**: 4 out of 5. This task requires a strong ability to analyze complex data patterns.\n",
    "\n",
    "**Category**: Goal Management"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Import Necessary Libraries\n",
    "This cell imports all necessary libraries required for the analysis. This includes libraries for data manipulation, data visualization, and any specific utilities needed for the tasks. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import argparse\n",
    "import pandas as pd\n",
    "import json\n",
    "import requests\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import seaborn as sns\n",
    "from pandas import date_range"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load Dataset\n",
    "This cell loads the goals dataset used in the analysis. The dataset is assumed to be stored in a CSV file and is loaded into a DataFrame. This step includes reading the data from a file path and possibly performing initial observations such as viewing the first few rows to ensure it has loaded correctly.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>category</th>\n",
       "      <th>state</th>\n",
       "      <th>closed_at</th>\n",
       "      <th>opened_at</th>\n",
       "      <th>closed_by</th>\n",
       "      <th>number</th>\n",
       "      <th>sys_updated_by</th>\n",
       "      <th>location</th>\n",
       "      <th>assigned_to</th>\n",
       "      <th>caller_id</th>\n",
       "      <th>sys_updated_on</th>\n",
       "      <th>short_description</th>\n",
       "      <th>priority</th>\n",
       "      <th>assignement_group</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Database</td>\n",
       "      <td>Closed</td>\n",
       "      <td>2023-07-25 03:32:18.462401146</td>\n",
       "      <td>2023-01-02 11:04:00</td>\n",
       "      <td>Fred Luddy</td>\n",
       "      <td>INC0000000034</td>\n",
       "      <td>admin</td>\n",
       "      <td>Australia</td>\n",
       "      <td>Fred Luddy</td>\n",
       "      <td>ITIL User</td>\n",
       "      <td>2023-07-06 03:31:13.838619495</td>\n",
       "      <td>There was an issue</td>\n",
       "      <td>2 - High</td>\n",
       "      <td>Database</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Hardware</td>\n",
       "      <td>Closed</td>\n",
       "      <td>2023-03-11 13:42:59.511508874</td>\n",
       "      <td>2023-01-03 10:19:00</td>\n",
       "      <td>Charlie Whitherspoon</td>\n",
       "      <td>INC0000000025</td>\n",
       "      <td>admin</td>\n",
       "      <td>India</td>\n",
       "      <td>Beth Anglin</td>\n",
       "      <td>Don Goodliffe</td>\n",
       "      <td>2023-05-19 04:22:50.443252112</td>\n",
       "      <td>There was an issue</td>\n",
       "      <td>1 - Critical</td>\n",
       "      <td>Hardware</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Database</td>\n",
       "      <td>Resolved</td>\n",
       "      <td>2023-01-20 14:37:18.361510788</td>\n",
       "      <td>2023-01-04 06:37:00</td>\n",
       "      <td>Charlie Whitherspoon</td>\n",
       "      <td>INC0000000354</td>\n",
       "      <td>system</td>\n",
       "      <td>India</td>\n",
       "      <td>Fred Luddy</td>\n",
       "      <td>ITIL User</td>\n",
       "      <td>2023-02-13 08:10:20.378839709</td>\n",
       "      <td>There was an issue</td>\n",
       "      <td>2 - High</td>\n",
       "      <td>Database</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Hardware</td>\n",
       "      <td>Resolved</td>\n",
       "      <td>2023-01-25 20:46:13.679914432</td>\n",
       "      <td>2023-01-04 06:53:00</td>\n",
       "      <td>Fred Luddy</td>\n",
       "      <td>INC0000000023</td>\n",
       "      <td>admin</td>\n",
       "      <td>Canada</td>\n",
       "      <td>Luke Wilson</td>\n",
       "      <td>Don Goodliffe</td>\n",
       "      <td>2023-06-14 11:45:24.784548040</td>\n",
       "      <td>There was an issue</td>\n",
       "      <td>2 - High</td>\n",
       "      <td>Hardware</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Hardware</td>\n",
       "      <td>Closed</td>\n",
       "      <td>2023-05-10 22:35:58.881919516</td>\n",
       "      <td>2023-01-05 16:52:00</td>\n",
       "      <td>Luke Wilson</td>\n",
       "      <td>INC0000000459</td>\n",
       "      <td>employee</td>\n",
       "      <td>UK</td>\n",
       "      <td>Charlie Whitherspoon</td>\n",
       "      <td>David Loo</td>\n",
       "      <td>2023-06-11 20:25:35.094482408</td>\n",
       "      <td>There was an issue</td>\n",
       "      <td>2 - High</td>\n",
       "      <td>Hardware</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   category     state                      closed_at            opened_at  \\\n",
       "0  Database    Closed  2023-07-25 03:32:18.462401146  2023-01-02 11:04:00   \n",
       "1  Hardware    Closed  2023-03-11 13:42:59.511508874  2023-01-03 10:19:00   \n",
       "2  Database  Resolved  2023-01-20 14:37:18.361510788  2023-01-04 06:37:00   \n",
       "3  Hardware  Resolved  2023-01-25 20:46:13.679914432  2023-01-04 06:53:00   \n",
       "4  Hardware    Closed  2023-05-10 22:35:58.881919516  2023-01-05 16:52:00   \n",
       "\n",
       "              closed_by         number sys_updated_by   location  \\\n",
       "0            Fred Luddy  INC0000000034          admin  Australia   \n",
       "1  Charlie Whitherspoon  INC0000000025          admin      India   \n",
       "2  Charlie Whitherspoon  INC0000000354         system      India   \n",
       "3            Fred Luddy  INC0000000023          admin     Canada   \n",
       "4           Luke Wilson  INC0000000459       employee         UK   \n",
       "\n",
       "            assigned_to      caller_id                 sys_updated_on  \\\n",
       "0            Fred Luddy      ITIL User  2023-07-06 03:31:13.838619495   \n",
       "1           Beth Anglin  Don Goodliffe  2023-05-19 04:22:50.443252112   \n",
       "2            Fred Luddy      ITIL User  2023-02-13 08:10:20.378839709   \n",
       "3           Luke Wilson  Don Goodliffe  2023-06-14 11:45:24.784548040   \n",
       "4  Charlie Whitherspoon      David Loo  2023-06-11 20:25:35.094482408   \n",
       "\n",
       "    short_description      priority assignement_group  \n",
       "0  There was an issue      2 - High          Database  \n",
       "1  There was an issue  1 - Critical          Hardware  \n",
       "2  There was an issue      2 - High          Database  \n",
       "3  There was an issue      2 - High          Hardware  \n",
       "4  There was an issue      2 - High          Hardware  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dataset_path = \"csvs/flag-80.csv\"\n",
    "goal_data = pd.read_csv(dataset_path)\n",
    "df = pd.read_csv(dataset_path)\n",
    "goal_data.head()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### **Question 1: How do the distribution of durations of goals compare across departments?**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Plot goal durations across departments\n",
    "\n",
    "This visualization shows distribution of goal durations across various departments, highlighting median and mean durations to compare departmental efficiency. It emphasizes the variances and typical goal completion timelines, providing a strategic overview of departmental performance in goal management."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N/A\n"
     ]
    }
   ],
   "source": [
    "# import matplotlib.pyplot as plt\n",
    "# import seaborn as sns\n",
    "# import pandas as pd\n",
    "# import numpy as np\n",
    "\n",
    "# # Assuming 'goal_data' is preloaded and contains 'Cost Reduction' category\n",
    "# goal_data['end_date'] = pd.to_datetime(goal_data['end_date'])\n",
    "# goal_data[\"start_date\"] = pd.to_datetime(goal_data[\"start_date\"])\n",
    "# # Calculate goal durations\n",
    "# goal_data['duration'] = (goal_data['end_date'] - goal_data['start_date']).dt.days\n",
    "\n",
    "# # Plotting\n",
    "# plt.figure(figsize=(12, 8))\n",
    "# box_plot = sns.boxplot(x='department', y='duration', data=goal_data, palette=\"Set3\")\n",
    "# plt.title('Comparison of Goal Durations by Department')\n",
    "# plt.xlabel('Department')\n",
    "# plt.ylabel('Goal Duration (days)')\n",
    "# plt.grid(True)\n",
    "\n",
    "# # Calculate median and mean for annotations\n",
    "# medians = goal_data.groupby(['department'])['duration'].median()\n",
    "# means = goal_data.groupby(['department'])['duration'].mean()\n",
    "\n",
    "# # Iterate over the departments to place the text annotations for median and mean\n",
    "# for xtick in box_plot.get_xticks():\n",
    "#     box_plot.text(xtick, medians[xtick] + 1, 'Median: {:.1f}'.format(medians[xtick]), \n",
    "#                   horizontalalignment='center', size='x-small', color='black', weight='semibold')\n",
    "#     box_plot.text(xtick, means[xtick] + 1, 'Mean: {:.1f}'.format(means[xtick]), \n",
    "#                   horizontalalignment='center', size='x-small', color='red', weight='semibold')\n",
    "\n",
    "# plt.show()\n",
    "\n",
    "print(\"N/A\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Generate JSON Description for the Insight"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'data_type': 'descriptive',\n",
       " 'insight': 'There was no column end_date to conduct any analysis',\n",
       " 'insight_value': {},\n",
       " 'plot': {'description': 'The graph could not be generated due to missing data'},\n",
       " 'question': 'How do the distribution of durations of goals compare across departments?',\n",
       " 'actionable_insight': 'No actionable insight could be generated due to missing data'}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "{\n",
    "\t\"data_type\": \"descriptive\",\n",
    "\t\"insight\": \"There was no column end_date to conduct any analysis\",\n",
    "\t\"insight_value\": {\n",
    "\t},\n",
    "\t\"plot\": {\n",
    "    \t\"description\": \"The graph could not be generated due to missing data\",\n",
    "\t},\n",
    "\t\"question\": \"How do the distribution of durations of goals compare across departments?\",\n",
    "\t\"actionable_insight\": \"No actionable insight could be generated due to missing data\"\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### **Question 2:** What is distribution of Goal categories in Finance department?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Plot distribution of goal categories within the Finance department\n",
    "\n",
    "This pie chart illustrates the proportion of different goal categories within the Finance department, revealing the predominance of specific goals and highlighting departmental focus areas."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N/A\n"
     ]
    }
   ],
   "source": [
    "# import matplotlib.pyplot as plt\n",
    "\n",
    "# # Filter data for the Finance department\n",
    "# finance_goals = goal_data[goal_data['department'] == 'Finance']\n",
    "\n",
    "# # Count the occurrence of each category in the Finance department\n",
    "# category_counts = finance_goals['category'].value_counts()\n",
    "\n",
    "# # Create a pie chart\n",
    "# plt.figure(figsize=(10, 7))\n",
    "# plt.pie(category_counts, labels=category_counts.index, autopct='%1.1f%%', startangle=140)\n",
    "# plt.title('Distribution of Goal Categories in Finance Department')\n",
    "# plt.show()\n",
    "\n",
    "print(\"N/A\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Generate JSON Description for the Insight"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'data_type': 'descriptive',\n",
       " 'insight': 'There was no column department to conduct any analysis',\n",
       " 'insight_value': {},\n",
       " 'plot': {'description': 'The graph could not be generated due to missing data'},\n",
       " 'question': 'What is the distribution of Goal categories in the Finance department?',\n",
       " 'actionable_insight': 'No actionable insight could be generated due to missing data'}"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "{\n",
    "\t\"data_type\": \"descriptive\",\n",
    "\t\"insight\": \"There was no column department to conduct any analysis\",\n",
    "\t\"insight_value\": {\n",
    "\t},\n",
    "\t\"plot\": {\n",
    "    \t\"description\": \"The graph could not be generated due to missing data\",\n",
    "\t},\n",
    "\t\"question\": \"What is the distribution of Goal categories in the Finance department?\",\n",
    "\t\"actionable_insight\": \"No actionable insight could be generated due to missing data\"\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### **Question 3:** What is the distribution of Goal distribution by category across all departments?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Plot the goal duration comparison by category across departments\n",
    "\n",
    "This box plot visually compares goal durations across different categories for all departments, annotated with mean durations to highlight trends and outliers in goal completion times."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N/A\n"
     ]
    }
   ],
   "source": [
    "# import matplotlib.pyplot as plt\n",
    "# import seaborn as sns\n",
    "\n",
    "# # Calculate goal durations in days\n",
    "# goal_data['duration'] = (goal_data['end_date'] - goal_data['start_date']).dt.days\n",
    "\n",
    "\n",
    "# # Plotting\n",
    "# plt.figure(figsize=(14, 8))\n",
    "# box_plot = sns.boxplot(x='category', y='duration', data=goal_data)\n",
    "# plt.title('Comparison of Goal Duration by Category Across All Departments')\n",
    "# plt.xlabel('Goal Category')\n",
    "# plt.ylabel('Duration (days)')\n",
    "# plt.xticks(rotation=45)  # Rotate category names for better readability\n",
    "# plt.grid(True)\n",
    "\n",
    "# # Calculate median and mean for annotations\n",
    "# medians = goal_data.groupby(['category'])['duration'].median()\n",
    "# means = goal_data.groupby(['category'])['duration'].mean()\n",
    "\n",
    "# # Iterate over the departments to place the text annotations for median and mean\n",
    "# for xtick in box_plot.get_xticks():\n",
    "#     box_plot.text(xtick, means[xtick] + 1, 'Mean: {:.1f}'.format(means[xtick]), \n",
    "#                   horizontalalignment='center', size='x-small', color='red', weight='semibold')\n",
    "\n",
    "\n",
    "# plt.show()\n",
    "\n",
    "print(\"N/A\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Generate JSON Description for the Insight"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'data_type': 'diagnostic',\n",
       " 'insight': 'There was no column end_date to conduct any analysis',\n",
       " 'insight_value': {},\n",
       " 'plot': {'description': 'The graph could not be generated due to missing data'},\n",
       " 'question': 'What is the distribution of Goal durations by category across all departments?',\n",
       " 'actionable_insight': 'No actionable insight could be generated due to missing data'}"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "{\n",
    "\t\"data_type\": \"diagnostic\",\n",
    "\t\"insight\": \"There was no column end_date to conduct any analysis\",\n",
    "\t\"insight_value\": {\n",
    "\t},\n",
    "\t\"plot\": {\n",
    "    \t\"description\": \"The graph could not be generated due to missing data\",\n",
    "\t},\n",
    "\t\"question\": \"What is the distribution of Goal durations by category across all departments?\",\n",
    "\t\"actionable_insight\": \"No actionable insight could be generated due to missing data\"\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### **Question 4:** How do specific keywords in task descriptions affect their target percentages and completion rates?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Analysis of Target Percentage by Presence of Keywords in Description\n",
    "\n",
    "This box plot displays the distribution of target percentages for items with and without keywords in their descriptions. The plot compares two groups: \"No Keywords\" and \"Has Keywords.\" The median target percentage is slightly higher in the group with keywords (approximately 78.5%) compared to the group without keywords (around 75%). This visualization highlights potential differences in target achievement linked to the presence of descriptive keywords, which may indicate the impact of keyword usage on performance or goal alignment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N/A\n"
     ]
    }
   ],
   "source": [
    "# plt.figure(figsize=(10, 6))\n",
    "# boxplot = sns.boxplot(x='contains_keywords', y='target_percentage', data=df, showmeans=True,\n",
    "#                       meanprops={\"marker\":\"o\", \"markerfacecolor\":\"red\", \"markersize\":\"10\"},\n",
    "#                     #   medianprops={\"color\": \"blue\", \"linewidth\": 2},\n",
    "#                       whiskerprops={\"linewidth\": 2},\n",
    "#                       capprops={\"linewidth\": 2})\n",
    "\n",
    "# # Annotate the boxplot with the mean and median values\n",
    "# for i in range(2):\n",
    "#     group_data = df[df['contains_keywords'] == i]['target_percentage']\n",
    "#     mean_val = group_data.mean()\n",
    "#     median_val = group_data.median()\n",
    "    \n",
    "#     plt.text(i, mean_val, f'{mean_val:.2f}', color='red', ha='center', va='bottom')\n",
    "#     # plt.text(i, median_val, f'{median_val:.2f}', color='blue', ha='center', va='bottom')\n",
    "\n",
    "# plt.title('Target Percentage by Presence of Keywords in Description')\n",
    "# plt.xlabel('Contains Keywords')\n",
    "# plt.ylabel('Target Percentage')\n",
    "# plt.xticks([0, 1], ['No Keywords', 'Has Keywords'])\n",
    "# plt.grid(True, axis='y', linestyle='--', alpha=0.7)\n",
    "# plt.show()\n",
    "\n",
    "print(\"N/A\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Generate JSON Description for the Insight"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'data_type': 'keyword impact analysis',\n",
       " 'insight': 'There was no column contains_keywords to conduct any analysis',\n",
       " 'insight_value': {},\n",
       " 'plot': {'description': 'The graph could not be generated due to missing data'},\n",
       " 'question': 'How do specific keywords in task descriptions affect their target percentages and completion rates?',\n",
       " 'actionable_insight': 'No actionable insight could be generated due to missing data'}"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "{\n",
    "\t\"data_type\": \"keyword impact analysis\",\n",
    "\t\"insight\": \"There was no column contains_keywords to conduct any analysis\",\n",
    "\t\"insight_value\": {\n",
    "\t},\n",
    "\t\"plot\": {\n",
    "    \t\"description\": \"The graph could not be generated due to missing data\",\n",
    "\t},\n",
    "\t\"question\": \"How do specific keywords in task descriptions affect their target percentages and completion rates?\",\n",
    "\t\"actionable_insight\": \"No actionable insight could be generated due to missing data\"\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### **Question 5:** What are the potential future trends in the duration of 'Cost Reduction' goals across all departments if current operational and strategic practices remain unchanged?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Plot future trend predictions of Cost Reduction goal durations\n",
    "\n",
    "This plot projects future trends in the durations of 'Cost Reduction' goals across all departments, assuming no change in current operational practices. The scatter plot provides historical data points, while the green dashed line forecasts potential future durations based on linear regression analysis. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "N/A\n"
     ]
    }
   ],
   "source": [
    "# import matplotlib.pyplot as plt\n",
    "# import seaborn as sns\n",
    "# import pandas as pd\n",
    "# import numpy as np\n",
    "# from sklearn.linear_model import LinearRegression\n",
    "\n",
    "# # Assuming 'goal_data' is preloaded and contains the relevant data for 'Cost Reduction' category\n",
    "# cost_reduction_goals = goal_data[goal_data['category'] == 'Cost Reduction']\n",
    "\n",
    "# # Convert start_date to a numeric value for regression (number of days since the first date)\n",
    "# cost_reduction_goals['start_date_numeric'] = (cost_reduction_goals['start_date'] - cost_reduction_goals['start_date'].min()).dt.days\n",
    "\n",
    "# # Calculate durations\n",
    "# cost_reduction_goals['duration'] = (cost_reduction_goals['end_date'] - cost_reduction_goals['start_date']).dt.days\n",
    "\n",
    "# # Prepare data for regression model\n",
    "# X = cost_reduction_goals[['start_date_numeric']]  # Features\n",
    "# y = cost_reduction_goals['duration']  # Target\n",
    "\n",
    "# # Fit the regression model\n",
    "# model = LinearRegression()\n",
    "# model.fit(X, y)\n",
    "\n",
    "# # Predict future durations\n",
    "# # Extend the date range by, say, 20% more time into the future for forecasting\n",
    "# future_dates = np.arange(X['start_date_numeric'].max() + 1, X['start_date_numeric'].max() * 1.2, dtype=int).reshape(-1, 1)\n",
    "# future_predictions = model.predict(future_dates)\n",
    "\n",
    "# # Plotting\n",
    "# plt.figure(figsize=(12, 8))\n",
    "# # Scatter plot for existing data\n",
    "# sns.scatterplot(x='start_date_numeric', y='duration', data=cost_reduction_goals, color='blue', label='Actual Durations')\n",
    "# # Regression line for existing data\n",
    "# sns.regplot(x='start_date_numeric', y='duration', data=cost_reduction_goals, scatter=False, color='red', label='Trend Line')\n",
    "# # Plot for future predictions\n",
    "# plt.plot(future_dates.flatten(), future_predictions, 'g--', label='Future Trend')\n",
    "# # Convert numeric dates back to actual dates for labeling on x-axis\n",
    "# actual_dates = pd.date_range(start=cost_reduction_goals['start_date'].min(), periods=int(1.2 * X['start_date_numeric'].max()), freq='D')\n",
    "# plt.xticks(ticks=range(0, int(1.2 * X['start_date_numeric'].max()), 50), labels=[date.strftime('%Y-%m-%d') for date in actual_dates[::50]], rotation=45)\n",
    "# plt.title('Future Trends in the Duration of \\'Cost Reduction\\' Goals')\n",
    "# plt.xlabel('Start Date')\n",
    "# plt.ylabel('Duration (days)')\n",
    "# plt.legend()\n",
    "# plt.grid(True)\n",
    "# plt.show()\n",
    "\n",
    "print(\"N/A\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Generate JSON Description for the Insight"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'data_type': 'predictive',\n",
       " 'insight': 'There was no column start_date to conduct any analysis',\n",
       " 'insight_value': {},\n",
       " 'plot': {'description': 'The graph could not be generated due to missing data'},\n",
       " 'question': \"What are the potential future trends in the duration of 'Cost Reduction' goals across all departments if current operational and strategic practices remain unchanged?\",\n",
       " 'actionable_insight': 'No actionable insight could be generated due to missing data'}"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "{\n",
    "\t\"data_type\": \"predictive\",\n",
    "\t\"insight\": \"There was no column start_date to conduct any analysis\",\n",
    "\t\"insight_value\": {\n",
    "\t},\n",
    "\t\"plot\": {\n",
    "    \t\"description\": \"The graph could not be generated due to missing data\",\n",
    "\t},\n",
    "\t\"question\": \"What are the potential future trends in the duration of 'Cost Reduction' goals across all departments if current operational and strategic practices remain unchanged?\",\n",
    "\t\"actionable_insight\": \"No actionable insight could be generated due to missing data\"\n",
    "}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Summary of Findings (Flag 80)\n",
    "\n",
    "1. **Lack of Data on Goal Durations**: The absence of the `end_date` column prevents analysis of how goal durations are distributed across departments, limiting insights into departmental efficiency in meeting their goals.\n",
    "\n",
    "2. **Distribution of Goal Categories in Finance**: Without the `department` column, it is impossible to assess the distribution of goal categories specifically within the Finance department, hindering targeted analysis of departmental performance.\n",
    "\n",
    "3. **Goal Durations by Category**: The missing `end_date` column also restricts the ability to analyze the distribution of goal durations by category across all departments, preventing an understanding of how different categories perform over time.\n",
    "\n",
    "4. **Impact of Keywords on Task Performance**: The lack of the `contains_keywords` column inhibits analysis on how specific keywords in task descriptions might influence target percentages and completion rates, making it difficult to determine effective practices in task management.\n",
    "\n",
    "5. **Predicting Future Trends in Goal Duration**: The absence of the `start_date` column limits the ability to predict future trends in the duration of 'Cost Reduction' goals across departments, making it challenging to strategize based on past performance."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "base",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
