variant,n_reviewed,behavior_correct,complexity_correct_in_prose,fully_correct_prose,prose_contradicts_oracle,notes,time_claim_value_correct,space_claim_value_correct,correctness_claim_value_correct,all_claim_values_correct,final_bleu1_20_sample,final_rouge_l_20_sample,final_emits_all_claim_types_rate
claims_from_explanation_only,20,0,2,0,20,"Final prose mostly recites other dataset templates. Some constant-time examples have matching complexity by coincidence, but behavior remains wrong.",13,11,20,7,0.183,0.1976,0.6
no_claim_to_claim_attention,20,0,4,0,20,"Structured claims are perfect in the reviewed sample, but prose remains mismatched template text copied from unrelated functions.",20,20,20,20,0.1792,0.2015,1.0
surface_bottleneck_consistency,20,0,2,0,20,Surface-bottleneck consistency preserved claim-token accuracy but did not make prose semantically correct.,20,20,20,20,0.148,0.1674,1.0
surface_bottleneck_no_expl_lm,20,0,0,0,20,"Removing explanation LM loss caused near-total generation collapse into punctuation, separators, or code-like fragments.",0,3,3,0,0.0014,0.0012,0.0
