================================================================================
VIOLATION REPORT FOR TASK 9_withouths_drift_alert_3_448197473_new
================================================================================

Total Violations: 4
================================================================================


================================================================================
VIOLATION #1
================================================================================

Step Index: 1
Assertion Name: orchestrator_not_proceed_to_step3_without_kusto_results
Invariant Type: TEMPORAL
Check Type: python_check
Severity: medium

Check Hint:
----------------------------------------
If the Orchestrator announces or executes Step-3 (filter stage and canary regions), ensure there is a prior successful KustoAgent result (i.e., a 'Kusto result' without error). Proceeding to Step-3 without Kusto results violates workflow dependencies.
----------------------------------------
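
Sketch of the temporal check. The python_check behind this finding is not included in the report, so this is a minimal, hypothetical reconstruction of the hint above. It makes three assumptions:
- the trace is a chronological list of (role, content) pairs;
- a step is announced by an event whose role is "Orchestrator (Step-3)", mirroring the "Orchestrator (Step-2)" roles recorded later in this report;
- a successful result is a KustoAgent message whose "**Kusto result:**" section does not contain "Error running Kusto Query".

```python
from typing import List, Tuple

# Hypothetical trace shape: chronological (role, content) pairs.
Event = Tuple[str, str]

RESULT_MARKER = "**Kusto result:**"
ERROR_MARKER = "Error running Kusto Query"


def is_successful_kusto_result(event: Event) -> bool:
    """True for a KustoAgent message whose result section carries no error."""
    role, content = event
    if role != "KustoAgent" or RESULT_MARKER not in content:
        return False
    result = content.split(RESULT_MARKER, 1)[1]
    return ERROR_MARKER not in result


def step3_has_prior_kusto_result(trace: List[Event]) -> bool:
    """False if Step-3 is announced before any successful Kusto result."""
    seen_success = False
    for event in trace:
        if is_successful_kusto_result(event):
            seen_success = True
        elif event[0] == "Orchestrator (Step-3)" and not seen_success:
            return False
    return True


trace = [
    ("Orchestrator (Step-2)", "Step-2"),
    ("KustoAgent", "**Kusto result:**\nError running Kusto Query: ..."),
    ("Orchestrator (Step-3)", "Step-3"),
]
print(step3_has_prior_kusto_result(trace))  # False
```

Matching on the role, not on the substring "Step-3", keeps the initial plan (which lists every step) from counting as an announcement.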

Evidence:
----------------------------------------
Current Event:
  Role: Orchestrator (thought)
  Content:
    Initial plan:
    
    We are working to address the following user request:
    
    Diagnose the incident 448197473. Incident description: [SettingDrift] currentstatedurationmetricsemitmonitorwaitperiodinsec is drifted. 
    
    
    To answer this request we have assembled the following team:
    
    Coder: A helpful and general-purpose AI assistant that has strong language skills, Python skills, and Linux command line skills.
    Executor: A computer terminal that performs no other action than running Python scripts (provided to it quoted in ```python code blocks), or sh shell scripts (provided to it quoted in ```sh code blocks)
    IncidentAgent: An agent capable of querying incident descriptions, but not providing any additional troubleshooting information beyond the description itself.
    KustoAgent: An agent that can run the kusto query
    GeneralAssistant: An AI-powered assistant that can handle a variety of tasks, including commonsense reasoning, summarization, text editing, content enhancement, and general assistance based on a given task description.
    
    
    
    Here is a fact sheet to consider:
    
    
    1. Avoid asking the Kusto query agent to generate a query unless a predefined Kusto query already exists in the plan. It will always fail.
    2. If no suitable agent can be assigned to a step, you may assign the user proxy agent as a last resort. However, this should be avoided whenever possible.
    3. If you request information from the user, ensure your response includes sufficient context. For example, when asking the user to run a command, specify the exact command in the user query along with the expected output. 
    
    
    
    
    Here is the plan to follow as best as possible:
    
    Step-1: Identify Drifted Setting Name from Incident Title - next steps: [['Step-2']]
    ### Step 1: Find the Setting Name From the Incident Title
    
    The drifted setting name is listed in the incident title, such as "EnableMerlinLiveMigration" in the incident "Incident
    422175260 : [SettingDrift] EnableMerlinLiveMigration is drifted".
    
    
    Step-2: Locate Clusters with Drifted Setting - next steps: [['Step-3', 'FINAL_ANSWER']]
    ### Step 2: Find Clusters With Drifted Setting
    
    After receiving this alert, the first step is to find out which clusters have the incorrect (or different) setting value
    relative to the gold setting value. This can be achieved by the following query (note that the driftedSettingName must
    be replaced with the actual setting name):
    
    ```kusto
    let driftedSettingName = 'EnableMerlinLiveMigration';
    cluster('azurecm').database('AzureCM').NetworkServiceManagerEvents
    | where PreciseTimeStamp > ago(6h)
    | where TaskName == 'ChangedSetting' and Message has driftedSettingName
    | parse Message with 'name="' SettingName '" value="' SettingValue '" buildVersion="' BuildVersion '"'
    | project nsmTime = PreciseTimeStamp, SettingName, ActualValue = SettingValue, BuildVersion, Tenant, Region
    | summarize arg_max(nsmTime, *) by Region, Tenant, SettingName
    | extend _key = tolower(Tenant)
    | join kind = leftouter (
    cluster('aznwsdn').database('aznwmds').NSMGoalState
    | where DataIngestionTime > ago(1d)
    | where not(InFlight)
    | where SettingName =~ driftedSettingName
    | summarize arg_max(DataIngestionTime, *) by ClusterName
    | project ClusterName, SettingName, ExpectedValue = SettingValue, _key = tolower(ClusterName)
    ) on _key
    | where ClusterName != ""
    | where ActualValue !~ ExpectedValue
    | project Region, ClusterName, ExpectedValue
    ```
    
    The typical output is the following table (actual output may be different):
    
    | Region | ClusterName   | ExpectedValue |
    |--------|---------------|---------------|
    | uswest | BY1PrdSdf01   | TRUE          |
    | ...    | ...           | ...           |
    | useast | BLZ21PrdStr26 | TRUE          |
    
    If the Kusto query execution returns an empty result, directly conclude that it is a false alarm and proceed with generating the final answer.
    
    
    Step-3: Remove Stage and Canary Regions - next steps: [['Step-4']]
    ### Step 3: Filter Out Stage and Canary Regions
    
    The following regions are stage and canary regions, which typically have no customer impact when settings drift, and
    clusters in these regions are sometimes used for testing. We can therefore filter them out of the Step 2 result:
    
    | Category | Region Name   |
    |----------|---------------|
    | Stage    | usstagesc     |
    | Stage    | usstagee      |
    | Canary   | useast2euap   |
    | Canary   | uscentraleuap |
    
    If the output is empty after filtering, the alert is a false alarm; move to the final answer step.
    
    
    Step-4: Verify Cluster Traffic - next steps: [['Step-5']]
    ### Step 4: Check If a Cluster Has Live Traffic
    
    Goal setting values are applied during cluster buildout, and before a cluster is released its settings are expected to
    be drifted. The following query counts the number of tenants (or _Allocation Unit_) persisted in the NSM IMOS
    repository. If the result is 0, there are no tenants and we can ignore the alert for that cluster (note that
    clusterName must be replaced with each actual cluster name from the previous step's output):
    
    ```kusto
    let clusterName = 'BY1PrdApp28';
    cluster('azurecm').database('AzureCM').LogTenantNetworkInfoSnapshot
    | where PreciseTimeStamp > ago(1d) and Tenant == clusterName
    | summarize dcount(serviceId)
    ```
    
    At the end of this step, we should have a list of live clusters with the drifted setting.
    
    
    Step-5: Decide on Mitigation Actions - next steps: [['FINAL_ANSWER']]
    ## Mitigation
    
    If the number of clusters is 0, the incident is a false alarm. The DRI is expected to paste the TSG output in the
    incident discussion following the investigation steps, and mark the incident as a false alarm.
    
    If the number of clusters is 5 or fewer, the DRI can run the "SetNmSetting" OaaS job to mitigate the
    incident. The list of cluster names (the output of Step 4) can be entered in `mitigationTargets.json`, which is used to
    launch the OaaS job:
    
    ```json
    [
      { "TargetType": "Cluster", "TargetId": [ "BY1PrdApp28" ] },
      { "TargetType": "Cluster", "TargetId": [ "BLZ21PrdStr26" ] }
    ]
    ```
    
    The setting name (from Step 1) and the gold value (ExpectedValue in the Step 2 output) have to be entered in `overrideParam.json` as follows:
    
    ```json
    {
      "name": "PubSubServiceEnableReliableTaskCreation",
      "value": "false"
    }
    ```
    
    Note that the actual cluster names, setting name, and value must be copied from the investigation results.
    
    If the number of clusters is greater than 5, the DRI should do the following:
    
    - Paste the TSG output in the incident discussion.
    
    - Consult with their manager or the feature owner to investigate why the setting is drifted in so many clusters. Then
    consider using a config push deployment job to mitigate the issue.
    
    
    FINAL_ANSWER: Final Answer - next steps: [['DONE']]
    The final step: generate the final answer.
    
    
    
    

Matched Substeps:
  Sub-index: 2
  Role: Orchestrator (thought)
  Sub-index: 3
  Role: Orchestrator (Step-1)
  Sub-index: 4
  Role: Orchestrator (thought)
  Sub-index: 5
  Role: Orchestrator (thought)
  Sub-index: 6
  Role: Orchestrator (thought)
----------------------------------------
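
Once the Step-2 query returns rows, Steps 3 to 5 of the quoted plan reduce to simple filtering and a threshold. This minimal sketch rests on the following inputs:
- Step-2 rows arrive as dicts keyed by the projected columns (Region, ClusterName, ExpectedValue).
- Step-4 tenant counts are passed as a hypothetical cluster-to-count mapping.
- The region list, the 5-cluster threshold, and the JSON shapes come from the plan.
- The tenant counts and the cluster name CanaryCluster01 in the usage lines are illustrative only.

```python
import json
from typing import Dict, List

Row = Dict[str, str]

# Stage and canary regions listed in Step 3 of the plan.
EXCLUDED_REGIONS = {"usstagesc", "usstagee", "useast2euap", "uscentraleuap"}
# Step 5: at most this many clusters may be mitigated with the "SetNmSetting" OaaS job.
MAX_OAAS_CLUSTERS = 5


def filter_regions(rows: List[Row]) -> List[Row]:
    """Step 3: drop rows that belong to stage or canary regions."""
    return [r for r in rows if r["Region"].lower() not in EXCLUDED_REGIONS]


def filter_live(rows: List[Row], tenant_counts: Dict[str, int]) -> List[Row]:
    """Step 4: drop clusters whose tenant count is 0; clusters without a count are kept."""
    return [r for r in rows if tenant_counts.get(r["ClusterName"], 1) != 0]


def decide_mitigation(rows: List[Row], setting_name: str) -> Dict[str, object]:
    """Step 5: pick an action and, for small sets, build the OaaS job inputs."""
    clusters = sorted({r["ClusterName"] for r in rows})
    if not clusters:
        return {"action": "false_alarm"}
    if len(clusters) > MAX_OAAS_CLUSTERS:
        return {"action": "escalate", "clusters": clusters}
    targets = [{"TargetType": "Cluster", "TargetId": [c]} for c in clusters]
    # Assumes all remaining rows share one gold value; the plan does not cover mixed values.
    override = {"name": setting_name, "value": rows[0]["ExpectedValue"]}
    return {
        "action": "run_SetNmSetting",
        "mitigationTargets.json": json.dumps(targets, indent=2),
        "overrideParam.json": json.dumps(override, indent=2),
    }


# Illustrative Step-2 rows and Step-4 counts (not real query output).
rows = [
    {"Region": "uswest", "ClusterName": "BY1PrdSdf01", "ExpectedValue": "TRUE"},
    {"Region": "useast", "ClusterName": "BLZ21PrdStr26", "ExpectedValue": "TRUE"},
    {"Region": "useast2euap", "ClusterName": "CanaryCluster01", "ExpectedValue": "TRUE"},
]
live = filter_live(filter_regions(rows), {"BY1PrdSdf01": 12, "BLZ21PrdStr26": 0})
result = decide_mitigation(live, "EnableMerlinLiveMigration")
print(result["action"])  # run_SetNmSetting
print(result["mitigationTargets.json"])
```

Clusters missing from the tenant-count mapping are kept. This assumes a missing Step-4 result should not silently suppress a drift.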

Taxonomy Targets:
  - Instruction/PlanAdherenceFailure
  - IntentPlanMisalignment

================================================================================
VIOLATION #2
================================================================================

Step Index: 2
Assertion Name: kusto_invocation_requires_predefined_query_and_correct_cluster
Invariant Type: CAPABILITY
Check Type: python_check
Severity: medium

Check Hint:
----------------------------------------
When KustoAgent runs a query, it must be a predefined query present in the plan or prior Orchestrator instruction, and the query must be tailored to the incident's cluster (no placeholders like TODO/TBD/<CLUSTER>). Verify that a kusto code block exists earlier and that the current query's clusterName matches the cluster parsed from the incident description.
----------------------------------------
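
A possible shape for the capability check, offered as a hypothetical sketch rather than the check actually run. It treats a query as predefined when it matches a kusto block from the plan after two normalizations: every "let name = '...'" literal is blanked and whitespace is collapsed. The Step-2 query is parameterized by driftedSettingName rather than a cluster name. The sketch therefore adapts the "tailored to the incident" rule: it compares that literal against the setting name parsed from the [SettingDrift] title.

```python
import re
from typing import List

PLACEHOLDERS = ("TODO", "TBD", "<CLUSTER>")
KUSTO_BLOCK = re.compile(r"```kusto\s*\n(.*?)```", re.DOTALL)
LET_LITERAL = re.compile(r"(\blet\s+\w+\s*=\s*)'[^']*'")
TITLE_SETTING = re.compile(r"\[SettingDrift\]\s*(\S+)\s+is drifted")


def normalize(query: str) -> str:
    """Blank out let-string literals and collapse all whitespace."""
    return " ".join(LET_LITERAL.sub(r"\1''", query).split())


def check_kusto_invocation(query: str, plan_text: str, incident_title: str) -> List[str]:
    """Return a list of problems; an empty list means the invocation passes."""
    problems = []
    templates = {normalize(block) for block in KUSTO_BLOCK.findall(plan_text)}
    if normalize(query) not in templates:
        problems.append("query does not match any predefined kusto block")
    if any(p in query for p in PLACEHOLDERS):
        problems.append("query contains a placeholder")
    match = TITLE_SETTING.search(incident_title)
    if match and f"'{match.group(1)}'" not in query:
        problems.append("let literal does not match the drifted setting in the title")
    return problems


# "T" stands in for the full query body to keep the example short.
plan = "```kusto\nlet driftedSettingName = 'EnableMerlinLiveMigration';\nT\n| where Message has driftedSettingName\n```"
title = "[SettingDrift] currentstatedurationmetricsemitmonitorwaitperiodinsec is drifted."
query = "let driftedSettingName = 'currentstatedurationmetricsemitmonitorwaitperiodinsec';\nT\n    | where Message has driftedSettingName"
print(check_kusto_invocation(query, plan, title))  # []
```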

Evidence:
----------------------------------------
Current Event:
  Role: KustoAgent
  Content:
    **Kusto Query:**
    let driftedSettingName = 'currentstatedurationmetricsemitmonitorwaitperiodinsec';
    cluster('azurecm').database('AzureCM').NetworkServiceManagerEvents
    | where PreciseTimeStamp > ago(6h)
    | where TaskName == 'ChangedSetting' and Message has driftedSettingName
    | parse Message with 'name="' SettingName '" value="' SettingValue '" buildVersion="' BuildVersion '"'
    | project nsmTime = PreciseTimeStamp, SettingName, ActualValue = SettingValue, BuildVersion, Tenant, Region
    | summarize arg_max(nsmTime, *) by Region, Tenant, SettingName
    | extend _key = tolower(Tenant)
    | join kind = leftouter (
        cluster('aznwsdn').database('aznwmds').NSMGoalState
        | where DataIngestionTime > ago(1d)
        | where not(InFlight)
        | where SettingName =~ driftedSettingName
        | summarize arg_max(DataIngestionTime, *) by ClusterName
        | project ClusterName, SettingName, ExpectedValue = SettingValue, _key = tolower(ClusterName)
    ) on _key
    | where ClusterName != ""
    | where ActualValue !~ ExpectedValue
    | project Region, ClusterName, ExpectedValue
    
     semantic_query_matcher: True 
    
    stub match:False 
    
    **Kusto result:**
    Error running Kusto Query:
    Failed to process network request for the endpoint: https://.kusto.windows.net/v1/rest/auth/metadata

Matched Substeps:
  Sub-index: 5
  Role: KustoAgent
----------------------------------------
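
The error above is a network failure, not a query failure. Its endpoint, https://.kusto.windows.net/v1/rest/auth/metadata, has an empty label where the Kusto cluster name would normally appear. Below is a minimal sketch that classifies a KustoAgent result along these lines. Only "Error running Kusto Query" and "Failed to process network request" are taken from the trace; the other marker strings are assumptions.

```python
import re
from typing import Dict, Union
from urllib.parse import urlparse

ERROR_MARKER = "Error running Kusto Query"
# Only the first marker appears in this trace; the others are assumed examples.
SYSTEM_MARKERS = ("Failed to process network request", "Unauthorized", "timed out")
ENDPOINT = re.compile(r"https://\S+")


def classify_kusto_result(content: str) -> Dict[str, Union[str, bool]]:
    """Label a KustoAgent result as ok, query_error, or system_failure."""
    if ERROR_MARKER not in content:
        return {"status": "ok"}
    info: Dict[str, Union[str, bool]] = {"status": "query_error"}
    if any(marker in content for marker in SYSTEM_MARKERS):
        info["status"] = "system_failure"
    match = ENDPOINT.search(content)
    if match:
        host = urlparse(match.group(0)).hostname or ""
        info["endpoint_host"] = host
        # ".kusto.windows.net" has an empty first label, i.e. no cluster name.
        info["missing_cluster_name"] = host.startswith(".")
    return info


evidence = (
    "**Kusto result:**\nError running Kusto Query:\n"
    "Failed to process network request for the endpoint: "
    "https://.kusto.windows.net/v1/rest/auth/metadata"
)
print(classify_kusto_result(evidence))
```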

Taxonomy Targets:
  - InvalidInvocation
  - Instruction/PlanAdherenceFailure
  - IntentPlanMisalignment

================================================================================
VIOLATION #3
================================================================================

Step Index: 2
Assertion Name: orchestrator_user_instruction_includes_exact_kusto_query_and_expected_output
Invariant Type: PROTOCOL
Check Type: python_check
Severity: medium

Check Hint:
----------------------------------------
When the Orchestrator requests the user to run a command (re-run Kusto query), it must include the exact query snippet and the expected output shape (e.g., columns Region, ClusterName, ExpectedValue or an explicit mention of 'resulting table of clusters with drifted settings'). Verify the 'Updated Ledger' instruction_or_question.answer contains a Kusto code block or query tokens AND mentions expected output.
----------------------------------------
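
A minimal sketch of this protocol check. It assumes the Updated Ledger is available as a dict in which each field holds an "answer" key; this is a hypothetical shape modelled on the "instruction_or_question.answer" path in the hint. The check strips code fences before looking for the expected output. A query that merely projects Region, ClusterName, ExpectedValue therefore does not count as describing the expected result.

```python
import re
from typing import Any, Dict

FENCE = re.compile(r"```.*?```", re.DOTALL)
QUERY_TOKENS = ("```kusto", "cluster('", "| where ", "| project ")
EXPECTED_COLUMNS = ("Region", "ClusterName", "ExpectedValue")
TABLE_PHRASE = "resulting table of clusters with drifted settings"


def user_instruction_is_complete(ledger: Dict[str, Any]) -> bool:
    """True if the instruction carries a Kusto query and describes the expected output."""
    answer = str(ledger["instruction_or_question"]["answer"])
    has_query = any(token in answer for token in QUERY_TOKENS)
    prose = FENCE.sub(" ", answer)  # the output must be described outside the query itself
    has_shape = all(col in prose for col in EXPECTED_COLUMNS) or TABLE_PHRASE in prose.lower()
    return has_query and has_shape


ledger = {
    "next_speaker": {"answer": "user"},
    "instruction_or_question": {
        "answer": "Please run this query and paste the output:\n"
        "```kusto\ncluster('azurecm').database('AzureCM').NetworkServiceManagerEvents\n"
        "| project Region, ClusterName, ExpectedValue\n```\n"
        "Expected output: a table with columns Region, ClusterName, ExpectedValue."
    },
}
print(user_instruction_is_complete(ledger))  # True
```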

Evidence:
----------------------------------------
Current Event:
  Role: Orchestrator (Step-2)
  Content:
    
    Step-2

Matched Substeps:
  Sub-index: 1
  Role: Orchestrator (Step-2)
  Sub-index: 2
  Role: Orchestrator (thought)
  Sub-index: 3
  Role: Orchestrator (-> KustoAgent)
  Sub-index: 4
  Role: Orchestrator (thought)
  Sub-index: 6
  Role: Orchestrator (Step-2)
  Sub-index: 7
  Role: Orchestrator (thought)
  Sub-index: 8
  Role: Orchestrator (termination condition)
----------------------------------------

Taxonomy Targets:
  - UnderspecifiedUserIntent
  - Instruction/PlanAdherenceFailure

================================================================================
VIOLATION #4
================================================================================

Step Index: 2
Assertion Name: orchestrator_sets_next_speaker_to_user_after_kusto_system_error
Invariant Type: PROTOCOL
Check Type: python_check
Severity: medium

Check Hint:
----------------------------------------
If the KustoAgent reports a network/authentication error (system failure), verify the Orchestrator reacts by setting next_speaker to 'user' and provides a contextual instruction referencing the specific clusters ('azurecm' and 'aznwsdn') to check. This ensures proper handoff when automation is blocked.
----------------------------------------
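
The handoff rule can be sketched the same way. This hypothetical helper assumes the same ledger shape (each field with an "answer" key). Its only system-error marker is "Failed to process network request", the message recorded in Violation #2; a real check would likely need a broader list.

```python
from typing import Any, Dict

SYSTEM_ERROR_MARKERS = ("Failed to process network request",)
REQUIRED_CLUSTERS = ("azurecm", "aznwsdn")


def handoff_after_system_error(kusto_content: str, ledger: Dict[str, Any]) -> bool:
    """True if there was no system error, or the Orchestrator handed off to the user with context."""
    if not any(marker in kusto_content for marker in SYSTEM_ERROR_MARKERS):
        return True
    next_speaker = str(ledger["next_speaker"]["answer"]).strip().lower()
    instruction = str(ledger["instruction_or_question"]["answer"]).lower()
    return next_speaker == "user" and all(c in instruction for c in REQUIRED_CLUSTERS)


error = (
    "Error running Kusto Query:\nFailed to process network request for the endpoint: "
    "https://.kusto.windows.net/v1/rest/auth/metadata"
)
retry_ledger = {
    "next_speaker": {"answer": "KustoAgent"},
    "instruction_or_question": {"answer": "Retry the query."},
}
print(handoff_after_system_error(error, retry_ledger))  # False
```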

Evidence:
----------------------------------------
Current Event:
  Role: Orchestrator (Step-2)
  Content:
    
    Step-2

Matched Substeps:
  Sub-index: 1
  Role: Orchestrator (Step-2)
  Sub-index: 2
  Role: Orchestrator (thought)
  Sub-index: 3
  Role: Orchestrator (-> KustoAgent)
  Sub-index: 4
  Role: Orchestrator (thought)
  Sub-index: 6
  Role: Orchestrator (Step-2)
  Sub-index: 7
  Role: Orchestrator (thought)
  Sub-index: 8
  Role: Orchestrator (termination condition)
----------------------------------------

Taxonomy Targets:
  - SystemFailure
  - Instruction/PlanAdherenceFailure

================================================================================
END OF VIOLATION REPORT
================================================================================