"""LLM-judge validation of the heuristic primitive classifier.

Stages:
    audit_drift      - preflight: check that re-segmentation reproduces the parquet labels
    sample_spans     - stratified, primitive-first sample of 250 spans
    judge_runner     - DeepSeek-Reasoner judges each span; runs are resumable
    agreement_report - per-primitive precision/recall, confusion matrix, kappa
"""
