import json

PER_STEP_SYSTEM = """
You are a performance diagnostics expert for distributed LLM inference traces.

You will receive a compact feature object for ONE step:
{
  "kind": "vertical" | "horizontal",
  "ops_count": <int>,
  "ops_prob": [
    { "name": <string>, "prob_center": <float in [0,1]>, "parent": <string or null> },
    ...
  ]
}

Semantics:
- prob_center is the probability that an operator instance belongs to the set of NORMAL cluster centers (there may be one or several).
  Smaller prob_center => more abnormal (farther from every center); larger => more normal.
- parent (if present) is the most frequent direct parent for that operator name in THIS STEP.
- Errors propagate UPWARD ONLY (child -> parent), not downward.

Definitions:
- Strong anomaly: prob_center < 1e-6. Such values are extremely small and typically appear in scientific notation
  (e.g., 1e-7, 3.2e-12). Use the numeric threshold, not the notation, to decide.
- Weak anomaly: 1e-6 <= prob_center < 0.5 (i.e., below 0.5 but NOT strong).

Your tasks:
1) Decide if this step is anomalous (true/false).
2) List operator NAMES that are anomalous (subset of names that appear in ops_prob).

Decision policy:
1. Compute S = number of operators classified as STRONG anomalies in this step.
2. Step-level rule (different by kind):
   - If kind == "horizontal": if S <= 1, set step_anomaly=false and op_error_names=[].
   - If kind == "vertical"  : if S / ops_count < 0.10, set step_anomaly=false and op_error_names=[].
3. Otherwise (i.e., enough STRONG anomalies):
   - Set step_anomaly=true.
   - Include ALL STRONG anomalies in op_error_names.
   - Optionally include WEAK anomalies that appear meaningfully related (e.g., share a parent with strong anomalies or form a coherent group of low probabilities).
4. UPWARD PROPAGATION (MANDATORY): For every operator you include, if it has a parent (non-null) and that parent name exists in this step, ALSO include that parent (unconditionally). Do not propagate downward.
5. De-duplicate names in the final list. Only include operator names that actually appear in ops_prob.

Constraints:
- If step_anomaly=false, you MUST return an empty operator list: "op_error_names": [].
- Output must be JSON only, with this exact schema:
{
  "step_anomaly": true|false,
  "op_error_names": ["op_name_1", "op_name_2", ...]
}
"""

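def build_per_step_user(features: dict) -> str:
    """Build the user message for one step as pretty-printed JSON.

    The instruction string restates the rules from PER_STEP_SYSTEM; keep the two in sync.
    """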
    user = {
        "instruction": (
            "Apply the strong/weak anomaly rules. "
            "Strong = prob_center shown in scientific notation or <1e-6; "
            "Weak = prob_center <0.5 but not strong. "
            "Horizontal: if strong count <= 2 then step=false and names=[]. "
            "Vertical: if strong ratio < 10% then step=false and names=[]. "
            "If step=true include all strong and optionally related weak; "
            "for any included operator, also include its parent if present in this step; "
            "deduplicate names; only include names present in ops_prob."
        ),
        "features": features
    }
    return json.dumps(user, ensure_ascii=False, indent=2)
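

# Validating the model's reply.
# A minimal sketch, assuming the model returns the JSON object described in PER_STEP_SYSTEM,
# possibly with surrounding whitespace. It enforces the stated constraints: exactly the two
# keys, a boolean step_anomaly, an empty list when step_anomaly is false, and only names that
# appear in ops_prob; duplicates are dropped while preserving order.
# parse_per_step_reply is a hypothetical helper, not part of any library. It relies on the
# module-level `import json` at the top of this file.


def parse_per_step_reply(reply: str, features: dict) -> dict:
    """Parse and validate one per-step reply; raise ValueError on any violation."""
    try:
        data = json.loads(reply.strip())
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or set(data) != {"step_anomaly", "op_error_names"}:
        raise ValueError("reply must be an object with exactly step_anomaly and op_error_names")
    step_anomaly = data["step_anomaly"]
    names = data["op_error_names"]
    if not isinstance(step_anomaly, bool):
        raise ValueError("step_anomaly must be a boolean")
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        raise ValueError("op_error_names must be a list of strings")
    if not step_anomaly and names:
        raise ValueError("op_error_names must be empty when step_anomaly is false")
    known = {op["name"] for op in features.get("ops_prob", [])}
    unknown = [n for n in names if n not in known]
    if unknown:
        raise ValueError(f"names not present in ops_prob: {unknown}")
    # dict.fromkeys keeps the first occurrence of each name, in order.
    return {"step_anomaly": step_anomaly, "op_error_names": list(dict.fromkeys(names))}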
