# A02 Case Study: Analysis Report

**ID:** CHAC-AN-A02-20250718-01
**Title:** Cross-Model Analysis of Initialization Behavior (The "AI CT Scanner" Finding)
**Version:** 1.0
**Status:** Final

---

## 1.0 Objective

This analysis documents the core findings from the "Initialization Experiment" phase of the A02 Case Study. The experiment's objective was to validate a standardized, high-fidelity, concatenated `standalone_prompt` by observing how different commercial AI models executed its initial instructions.

The unexpected but highly valuable outcome was that the prompt acted as a "CT Scanner," revealing the distinct, intrinsic, and often non-compliant reasoning and execution strategies of different AI models.

## 2.0 Methodology

A single, deterministically built `standalone_prompt` (generated by `build_prompt.sh`) was loaded into three different AI models, each in a clean, context-free session. Each model's first output was recorded and analyzed for compliance with the prompt's `Initialization` protocol.
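To make "deterministically built" concrete, here is a minimal Python sketch of the idea. It is not the contents of `build_prompt.sh`, which this report does not reproduce. The function names `build_prompt` and `fingerprint`, the sorted-filename ordering, and the blank-line separator are all assumptions made for illustration, and the file bodies are placeholders rather than the real toolkit text.

```python
import hashlib
import tempfile
from pathlib import Path


def build_prompt(source_dir: Path, separator: str = "\n\n") -> str:
    """Concatenate every .md file in source_dir in sorted filename order.

    Assumption: numeric filename prefixes (01_, 02_, ...) define the boot order.
    """
    files = sorted(source_dir.glob("*.md"), key=lambda p: p.name)
    parts = [f.read_text(encoding="utf-8").strip() for f in files]
    return separator.join(parts) + "\n"


def fingerprint(prompt: str) -> str:
    """SHA-256 of the prompt, so each session can be tied to an identical input."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


# Usage with placeholder file bodies (hypothetical, not the real toolkit content).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp)
    (src / "02_CHAC_System_Prompt.md").write_text("system prompt body", encoding="utf-8")
    (src / "01_CHAC_Safety_Check.md").write_text("safety check body", encoding="utf-8")
    first = build_prompt(src)
    second = build_prompt(src)

# Two builds from the same sources should be byte-identical.
same_build = fingerprint(first) == fingerprint(second)
```

Recording the fingerprint alongside each model's first output is one way to show that any behavioral difference comes from the model and not from a difference in the probe.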

## 3.0 Comparative Analysis of Model Behavior

The prompt's initialization instructions contain a subtle but real **instruction conflict** that originates in two source files:
*   `01_CHAC_Safety_Check.md`: Instructs the AI to output a METADATA LOG, and then **confirm understanding** of three priority protocols.
*   `02_CHAC_System_Prompt.md`: Instructs the AI to output a METADATA LOG, and then **only** the confirmation line, before ceasing all output.

The manner in which each AI model attempted to resolve this conflict revealed its underlying operational strategy.

### **Model 1: Gemini 2.5 Pro (Strategy: "Final Instruction Priority")**

*   **Behavior:** Produced a clean, compliant output, containing only the METADATA LOG and the final confirmation line.
*   **Inferred Strategy:** The model appeared to correctly assign a higher priority to the instructions in `02_CHAC_System_Prompt.md`, recognizing it as the "master" or "final" set of instructions in the boot sequence. It resolved the conflict by **selectively ignoring** the subordinate, conflicting instruction from the earlier `Safety Check` file.
*   **Evaluation:** **Goal-Compliant, but Opaque.** While the final output was correct, the reasoning process (as observed in its "thoughts") was inefficient and the decision to ignore a "MUST" command was not explicitly justified.

### **Model 2: GPT-4o (Strategy: "Intent Approximation")**

*   **Behavior:** Produced an output that was correct in intent (a log followed by a confirmation) but deviated from the required METADATA LOG format.
*   **Inferred Strategy:** The model seemed to grasp the overall goal ("output a log and a confirmation"), but did not adhere to the exact formatting requirements for the METADATA LOG. Its `rationale` also cited a non-existent principle.
*   **Evaluation:** **Partially Failed.** This reveals a tendency to prioritize semantic intent over syntactic and structural fidelity, which is a significant risk for a protocol-driven system.

### **Model 3: DeepSeek-R1 (Strategy: "Literal Compliance Maximization")**

*   **Behavior:** Produced an overly verbose output that included the METADATA LOG, an additional confirmation of three protocols, and then the final confirmation line.
*   **Inferred Strategy:** The model attempted to comply with *every single* "MUST" command it encountered, even when doing so violated a negative constraint ("only the following actions"). It resolved the conflict by **merging** the requirements of both source files.
*   **Evaluation:** **Logically Rigorous, but Behaviorally Non-Compliant.** This reveals a "legalistic" or "literalist" reasoning style that, while transparent, fails to correctly prioritize conflicting instructions, leading to an incorrect final state.
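The three outcomes above can be told apart mechanically. The sketch below is illustrative only: this report does not specify the METADATA LOG format or the exact confirmation line, so `LOG_HEADER`, `LOG_FIELD_PREFIX`, and `CONFIRMATION_LINE` are hypothetical stand-ins, and the sample outputs are invented for the example rather than taken from the recorded sessions.

```python
from enum import Enum

# Hypothetical markers; the real protocol strings are not given in this report.
LOG_HEADER = "METADATA LOG"
LOG_FIELD_PREFIX = "- "
CONFIRMATION_LINE = "INITIALIZED"


class Outcome(Enum):
    COMPLIANT = "log and confirmation only"
    MALFORMED_LOG = "intent correct, log format wrong"
    EXTRA_OUTPUT = "log, additional content, then confirmation"
    UNRECOGNIZED = "no final confirmation line"


def classify(output: str) -> Outcome:
    """Classify a model's first output against the assumed initialization format."""
    lines = [ln.strip() for ln in output.strip().splitlines() if ln.strip()]
    if not lines or lines[-1] != CONFIRMATION_LINE:
        return Outcome.UNRECOGNIZED
    body = lines[:-1]
    if not body or body[0] != LOG_HEADER:
        return Outcome.MALFORMED_LOG
    # Anything between the header and the confirmation that is not a log field
    # counts as extra output (for example, a separate protocol acknowledgement).
    extra = [ln for ln in body[1:] if not ln.startswith(LOG_FIELD_PREFIX)]
    return Outcome.EXTRA_OUTPUT if extra else Outcome.COMPLIANT


# Invented sample outputs, one per observed pattern.
compliant = "METADATA LOG\n- event: init\n- rationale: boot\nINITIALIZED"
malformed = "Metadata: event=init\nINITIALIZED"
verbose = "METADATA LOG\n- event: init\nI confirm the three priority protocols.\nINITIALIZED"

results = [classify(s) for s in (compliant, malformed, verbose)]
```

A check like this only captures surface structure. It would not detect a `rationale` that cites a non-existent principle, which still needs a content-level review.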

## 4.0 Core Conclusion

The experiment shows that the same complex prompt did not produce identical behavior across the three models tested. This indicates that **a single, complex prompt cannot be relied on to guarantee identical, deterministic behavior across different AI models.**

The `standalone_prompt` did not fail; it succeeded as a **high-fidelity diagnostic tool**. It revealed that:
1.  The source files of the `chac_toolkit` contain real, latent instruction conflicts.
2.  Different AI models have fundamentally different, intrinsic strategies for resolving such conflicts.

This finding is the most critical outcome of the A02 Case Study. It supports the approach defined in `CHAC_SOP_Modular_Prompt_Builder.md` (the primary artifact of this case study, located in the parent directory) as the only robust path forward: use a deterministic **build script** to create a standardized "test probe," then use an **external validation process** to observe and account for the specific behavior of the target AI model. We cannot engineer the AI's behavior directly; we can only engineer a reliable system to observe and work with that behavior.
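As a rough illustration of what an external validation process could look like, the Python sketch below sends one probe to several models and records what each returned. It is an assumption-laden outline, not the procedure defined in `CHAC_SOP_Modular_Prompt_Builder.md`: the `Observation` record, the `run_probe` function, the stub models, and the compliance rule are all hypothetical, and no real model API is called.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Observation:
    model: str
    prompt_sha256: str   # ties the result to the exact probe that was sent
    first_output: str
    compliant: bool


def run_probe(
    prompt: str,
    models: Dict[str, Callable[[str], str]],
    is_compliant: Callable[[str], bool],
) -> List[Observation]:
    """Send the same probe to every model and record its first output."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    results = []
    for name, send in sorted(models.items()):
        output = send(prompt)
        results.append(Observation(name, digest, output, is_compliant(output)))
    return results


# Stub callables stand in for real model sessions (hypothetical names and outputs).
stub_models = {
    "model_a": lambda prompt: "LOG\nOK",
    "model_b": lambda prompt: "LOG\nextra text\nOK",
}
observations = run_probe("probe text", stub_models, lambda out: out == "LOG\nOK")

# Models whose observed behavior must be accounted for before deployment.
non_compliant = [o.model for o in observations if not o.compliant]
```

The design choice here is to keep the validator outside the prompt: the probe stays fixed, and per-model differences are captured as data that downstream steps can act on.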
