{
    "dataset_csv_path": "data/notebooks/csvs/flag-95.csv",
    "user_dataset_csv_path": null,
    "metadata": {
        "goal": "Find the discrepancy and imbalance in distribution of incidents assigned across categories",
        "role": "L2 Support Agent",
        "category": "Incidents Management",
        "dataset_description": "The dataset comprises 500 entries simulating ServiceNow incidents table, detailing various attributes such as category, state, open and close dates, involved personnel, and incident specifics like location, description, and priority. It captures incident management activities with fields like 'opened_at', 'closed_at', 'assigned_to', 'short_description', and 'priority', reflecting the operational handling and urgency of issues across different locations and categories.",
        "header": "Hardware Incident Dataset (Flag 95)"
    },
    "insight_list": [
        {
            "data_type": "descriptive",
            "insight": "The incidents are equally distributed across all categories.",
            "insight_value": {
                "x_val": "All Categories",
                "y_val": 100
            },
            "plot": {
                "plot_type": "bar",
                "title": "Incidents Distribution by Category",
                "x_axis": {
                    "name": "Category",
                    "value": [
                        "Software",
                        "Network",
                        "Inquiry / Help",
                        "Hardware",
                        "Database"
                    ],
                    "description": "This represents the different categories of incidents."
                },
                "y_axis": {
                    "name": "Number of Incidents",
                    "value": [
                        100,
                        100,
                        100,
                        100,
                        100
                    ],
                    "description": "This represents the number of incidents in each category."
                },
                "description": "The bar chart displays the distribution of incidents across different categories. Each bar represents a category, and all categories have the same number of incidents, which is 100."
            },
            "question": "What is the distribution of incidents across all categories?",
            "actionable_insight": "Since the incidents are evenly distributed across all categories, it suggests a balanced workload. No specific category requires additional focus or resources based on this data.",
            "code": "plot = df.groupby(\"category\").size().plot(kind=\"barh\", color=sns.palettes.mpl_palette(\"Dark2\"))\n\nfig = plt.gcf()\n\n\nfor i in plot.patches:\n    # Get X and Y placement of label from rectangle\n    x_value = i.get_width()\n    y_value = i.get_y() + i.get_height() / 2\n\n    # Use X value as label and format number with one decimal place\n    label = \"{:.1f}\".format(x_value)\n\n    # Create annotation\n    plt.annotate(\n        label,                      \n        (x_value, y_value),         \n        xytext=(-10, 0),            \n        textcoords=\"offset points\", \n        ha='right',                 \n        va='center'                 \n    )\n\n# Set plot title\nplt.title('Incidents Distribution by Category')\n\n# Set x-axis label\nplt.xlabel('Category')\n\n# Set y-axis label\nplt.ylabel('Number of Incidents')\n\n# Display the figure\nplt.show()"
        },
        {
            "data_type": "diagnostic",
            "insight": "There are no specific issues mentioned in the incident descriptions for each category.",
            "insight_value": {
                "x_val": "No Specific Issues",
                "y_val": 100
            },
            "plot": {
                "plot_type": "word_cloud",
                "title": "Word Clouds for Problematic Sub-Categories within Each Category",
                "x_axis": {
                    "name": "Category",
                    "description": "This represents each category for which the word cloud is generated."
                },
                "y_axis": {
                    "name": "Frequency of Terms",
                    "description": "This represents the frequency of terms within the incident descriptions, visualized through the size of words in the word cloud."
                },
                "description": "The word clouds display the most frequent terms in incident descriptions for each category, highlighting specific sub-categories or types that are problematic. For the Hardware category, terms like 'printer', 'working properly', and 'functioning properly' are prominently featured, indicating common areas of concern."
            },
            "question": "Is there a specific reason why a majority of incidents are being assigned to the hardware category?",
            "actionable_insight": "There are no specific actions to be taken based on the word clouds, as there are no prominent issues or patterns identified within the incident descriptions. Further analysis or investigation may be required to understand the distribution of incidents across categories.",
            "code": "from wordcloud import WordCloud\n# Grouping the data by 'category' and concatenating 'short_description'\ngrouped_descriptions = df.groupby('category')['short_description'].apply(lambda x: ' '.join(x)).reset_index()\n\n# Setting up the plot with appropriate size\nplt.figure(figsize=(20, 10))\n\n# Generating a word cloud for each category\nfor index, row in grouped_descriptions.iterrows():\n    wordcloud = WordCloud(width=800, height=400, background_color='white').generate(row['short_description'])\n    \n    plt.subplot(3, 2, index+1)  # Adjust the grid size according to the number of categories\n    plt.imshow(wordcloud, interpolation='bilinear')\n    plt.title(row['category'])\n    plt.axis('off')\n\nplt.tight_layout()\nplt.show()"
        },
        {
            "data_type": "diagnostic",
            "insight": "There are specific mentions of printers in the incident descriptions.",
            "insight_value": {
                "x_val": "Printer",
                "y_val": 0
            },
            "plot": {
                "plot_type": "bar",
                "title": "Frequency of Printer in Incident Descriptions",
                "x_axis": {
                    "name": "Keyword",
                    "value": [
                        "Printer"
                    ],
                    "description": "This represents the keyword in incident descriptions."
                },
                "y_axis": {
                    "name": "Frequency",
                    "value": [
                        0
                    ],
                    "description": "This represents the frequency of the keyword 'Printer' in incident descriptions."
                },
                "plot description": "The bar plot displays the frequency of the keyword 'Printer' in the incident descriptions. The length of the bar corresponds to the frequency of the keyword. The 'Printer' keyword has a high frequency."
            },
            "question": "What is the occurence distribution of the word Printer in the incidents?",
            "actionable_insight": "The are no specific actions to recommend.",
            "code": "# Count the frequency of 'Printer' in 'short_description'\nprinter_incidents = df['short_description'].apply(lambda x: 'Printer' in x).sum()\n\n# Create a DataFrame for plotting\ndf_plot = pd.DataFrame({'Keyword': ['Printer'], 'Frequency': [printer_incidents]})\n\n# Plot the frequency\nplot = df_plot.plot(kind='bar', x='Keyword', y='Frequency', legend=False, color='blue')\n\n# Get the current figure for further manipulation\nfig = plt.gcf()\n\n# Loop through the rectangles (i.e., bars)\nfor i in plot.patches:\n    # Get X and Y placement of label from rectangle\n    x_value = i.get_x() + i.get_width() / 2\n    y_value = i.get_height()\n\n    # Use Y value as label and format number with one decimal place\n    label = \"{:.1f}\".format(y_value)\n\n    # Create annotation\n    plt.annotate(\n        label,                      # Use `label` as label\n        (x_value, y_value),         # Place label at end of the bar\n        xytext=(0, 5),              # Shift text slightly above bar\n        textcoords=\"offset points\", # Interpret `xytext` as offset in points\n        ha='center',                # Horizontally align label \n        va='bottom'                 # Vertically align label at bottom\n    )\n\n# Set plot title\nplt.title('Frequency of Printer in Incident Descriptions')\n\n# Set x-axis label\nplt.xlabel('Keyword')\n\n# Set y-axis label\nplt.ylabel('Frequency')\n\n# Display the figure\nplt.show()"
        },
        {
            "data_type": "descriptive",
            "insight": "There is no specific location where hardware incidents are concentrated.",
            "insight_value": {
                "x_val": "All Locations",
                "y_val": 100
            },
            "plot": {
                "plot_type": "bar",
                "title": "Incidents by Location",
                "x_axis": {
                    "name": "Location",
                    "value": [
                        "Australia",
                        "USA",
                        "UK",
                        "India",
                        "Canada"
                    ],
                    "description": "This represents the different locations of incidents."
                },
                "y_axis": {
                    "name": "Number of Incidents",
                    "value": [
                        22,
                        21,
                        20,
                        19,
                        18
                    ],
                    "description": "This represents the number of incidents in each location."
                },
                "plot description": "The bar plot displays the distribution of incidents across different locations. Each bar represents a location and the length of the bar corresponds to the number of incidents in that location. The 'Australia' location has the highest number of incidents."
            },
            "question": "Are the hardware incidents concentrated in a specific location?",
            "actionable_insight": "There are no specific actions to recommend based on the distribution of incidents across locations.",
            "code": "# Count the frequency of incidents in each location\nlocation_counts = df[df['category'] == 'Hardware']['location'].value_counts()\n\n# Create a DataFrame from the counts for plotting\ndf_plot = location_counts.reset_index()\ndf_plot.columns = ['Location', 'Number of Incidents']\n\n# Plot the frequency\nplot = df_plot.plot(kind='bar', x='Location', y='Number of Incidents', legend=False, color='blue')\n\n# Get the current figure for further manipulation\nfig = plt.gcf()\n\n# Loop through the rectangles (i.e., bars)\nfor i in plot.patches:\n    # Get X and Y placement of label from rectangle\n    x_value = i.get_x() + i.get_width() / 2\n    y_value = i.get_height()\n\n    # Use Y value as label and format number with one decimal place\n    label = \"{:.1f}\".format(y_value)\n\n    # Create annotation\n    plt.annotate(\n        label,                      # Use `label` as label\n        (x_value, y_value),         # Place label at end of the bar\n        xytext=(0, 5),              # Shift text slightly above bar\n        textcoords=\"offset points\", # Interpret `xytext` as offset in points\n        ha='center',                # Horizontally align label \n        va='bottom'                 # Vertically align label at bottom\n    )\n\n# Set plot title\nplt.title('Incident Location Distribution')\n\n# Set x-axis label\nplt.xlabel('Location')\n\n# Set y-axis label\nplt.ylabel('Number of Incidents')\n\n# Display the figure\nplt.show()"
        },
        {
            "data_type": "descriptive",
            "insight": "There is no significant increasing trend in the 'Hardware' or any other category.",
            "insight_value": {
                "x_val": "Time",
                "y_val": "Number of Incidents"
            },
            "plot": {
                "plot_type": "line",
                "title": "Incidents Over Time by Category",
                "x_axis": {
                    "name": "Time",
                    "value": "Time Series",
                    "description": "This represents the timeline of incidents."
                },
                "y_axis": {
                    "name": "Number of Incidents",
                    "value": "Incident Count",
                    "description": "This represents the number of incidents in each category over time."
                },
                "plot description": "The line plot displays the trend of incidents across different categories over time. Each line represents a category and the height of the line corresponds to the number of incidents in that category at a given time. The 'Hardware' category does not show any significant increasing trend. It is relatively stable and has a higher count than other categories."
            },
            "question": "Is there a pattern or trend over time in the distribution of incidents across categories?",
            "code": "# Convert 'opened_at' to datetime\ndf['opened_at'] = pd.to_datetime(df['opened_at'])\n\n# Resample the data by month and category, and count the number of incidents\ndf_resampled = df.groupby([pd.Grouper(key='opened_at', freq='M'), 'category']).size().unstack()\n\n# Plot the resampled data\nplot = df_resampled.plot(kind='line')\n\n# Set plot title\nplt.title('Incidents Over Time by Category')\n\n# Set x-axis label\nplt.xlabel('Time')\n\n# Set y-axis label\nplt.ylabel('Number of Incidents')\n\n# Display the figure\nplt.show()"
        },
        {
            "insight": "There is a significant variation in performance between Beth Anglin and Charlie Whitherspoon in terms of incident priority handling and resolution times.",
            "question": "How does the performance of Beth Anglin and Charlie Whitherspoon differ in handling incidents?",
            "code": "import csv\nfrom datetime import datetime\nimport matplotlib.pyplot as plt\nfrom collections import defaultdict\n\ndef parse_date(date_string):\n    return datetime.strptime(date_string, '%Y-%m-%d %H:%M:%S')\n\ndef calculate_resolution_time(opened_at, closed_at):\n    return (parse_date(closed_at) - parse_date(opened_at)).total_seconds() / 3600  # in hours\n\n# Initialize data structures\nstaff_data = {\n    'Beth Anglin': defaultdict(list),\n    'Charlie Whitherspoon': defaultdict(list)\n}\n\n# Read and process the data\nfilename = 'csvs/flag-95.csv'\nwith open(filename, 'r') as file:\n    reader = csv.DictReader(file)\n    for row in reader:\n        if row['assigned_to'] in staff_data:\n            priority = row['priority']\n            resolution_time = calculate_resolution_time(row['opened_at'], row['closed_at'])\n            staff_data[row['assigned_to']][priority].append(resolution_time)\n\n# Calculate average resolution times and incident counts\n# for staff, priorities in staff_data.items():\n#     print(f\"\\nPerformance metrics for {staff}:\")\n#     for priority, times in priorities.items():\n#         avg_time = sum(times) / len(times) if times else 0\n#         print(f\"  Priority {priority}: {len(times)} incidents, Avg resolution time: {avg_time:.2f} hours\")\n\n# Prepare data for plotting\nstaff_names = list(staff_data.keys())\npriorities = ['1 - Critical', '2 - High', '3 - Moderate', '4 - Low']\n\nfig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 12))\n\n# Plot incident counts\nincident_counts = [[len(staff_data[staff][priority]) for priority in priorities] for staff in staff_names]\nx = range(len(priorities))\nwidth = 0.35\nax1.bar([i - width/2 for i in x], incident_counts[0], width, label=staff_names[0])\nax1.bar([i + width/2 for i in x], incident_counts[1], width, label=staff_names[1])\nax1.set_ylabel('Number of Incidents')\nax1.set_title('Incident Count by Priority and Staff')\nax1.set_xticks(x)\nax1.set_xticklabels(priorities)\nax1.legend()\n\n# Plot average resolution times\navg_times = [[sum(staff_data[staff][priority])/len(staff_data[staff][priority]) if staff_data[staff][priority] else 0 \n              for priority in priorities] for staff in staff_names]\nax2.bar([i - width/2 for i in x], avg_times[0], width, label=staff_names[0])\nax2.bar([i + width/2 for i in x], avg_times[1], width, label=staff_names[1])\nax2.set_ylabel('Average Resolution Time (hours)')\nax2.set_title('Average Resolution Time by Priority and Staff')\nax2.set_xticks(x)\nax2.set_xticklabels(priorities)\nax2.legend()\n\nplt.tight_layout()\nplt.show()\nplt.close()"
        }
    ],
    "insights": [
        "The incidents are equally distributed across all categories.",
        "There are no specific issues mentioned in the incident descriptions for each category.",
        "There are specific mentions of printers in the incident descriptions.",
        "There is no specific location where hardware incidents are concentrated.",
        "There is no significant increasing trend in the 'Hardware' or any other category.",
        "There is a significant variation in performance between Beth Anglin and Charlie Whitherspoon in terms of incident priority handling and resolution times."
    ],
    "summary": "\n\n# Concise Summary of Key IT Incident Trends\n\n1. **Staff Performance Variation:** Beth Anglin handles more high-priority incidents with faster resolution times, while Charlie Whitherspoon deals with more low-priority incidents that take longer to resolve, suggesting differences in expertise or workload allocation.\n\n2. **Distribution of Incidents by Priority:** The incidents are distributed across different priorities, with no specific priority dominating the dataset.\n\n3. **Incident Category Distribution:** The most common incident categories are Hardware, Software, Network, and Database issues, with a notable number of Inquiry/Help incidents, reflecting the diverse nature of IT support needs."
}