# Manual Qualitative Review — Rich-Claim Full Run

This review covers the final checkpoint outputs for all eight rich-claim variants on 20 fixed validation examples. The scoring is intentionally conservative: a generated explanation is marked usable only if it is readable, refers to the correct function behavior, and does not substitute another algorithm template.
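The usability rule above can be written down as a small predicate. This is only a sketch of the rubric as stated: the `ManualJudgment` class and its field names are hypothetical and are not taken from the review tooling.

```python
from dataclasses import dataclass


@dataclass
class ManualJudgment:
    """One reviewer judgment for one generated explanation (hypothetical fields)."""
    readable: bool                    # a human can parse the prose
    describes_correct_behavior: bool  # it refers to what the target function does
    substitutes_other_template: bool  # it describes a different algorithm template


def is_usable(judgment: ManualJudgment) -> bool:
    # Conservative rule: all three conditions from the rubric must hold.
    return (
        judgment.readable
        and judgment.describes_correct_behavior
        and not judgment.substitutes_other_template
    )


def count_usable(judgments) -> int:
    # Number of explanations that pass the rubric for one variant.
    return sum(is_usable(j) for j in judgments)
```

Under this rule an explanation that is readable but describes another algorithm is counted as unusable, which matches the conservative scoring described above.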

## Bottom line

The richer 12-claim ontology produced very strong hidden-state coupling for the three main consistency variants (Consistency Loss, No Claim-to-Claim Attention, and Claims from Explanation Only), but it did not produce usable natural-language explanations in this run. These variants reached about 0.986 mean coupling and 0.978 macro swap influence, yet manual review found 0/20 usable explanations and 0/20 fully correct explanations for every variant.

Compared with epoch 5, the generated prose became more explanation-shaped, but most outputs were still garbled, stitched together from fragments of unrelated templates, or contradicted the target code. The surface-bottleneck/no-explanation-LM variant collapsed into repeated `<sep>` tokens, which is the expected failure mode when the mismatched explanation LM objective is removed without a stronger language objective.

## Manual score summary

| variant_label | n_examples_reviewed | manual_usable_explanations | manual_fully_correct_prose | sep_collapse_examples | garbled_or_high_noise_examples | mean_target_value_hits_in_prose | mean_emitted_claims_qual_subset | qual_bleu1 | qual_rouge_l | full_val_mean_coupling | full_val_swap_influence | full_val_claim_accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Consistency Loss (V1) | 20 | 0 | 0 | 0 | 1 | 0.4500 | 0.4000 | 0.1351 | 0.1343 | 0.9860 | 0.9778 | 0.0222 |
| No Consistency Loss (V1) | 20 | 0 | 0 | 0 | 1 | 0.3500 | 0.0000 | 0.1199 | 0.1121 | 0.2863 | -0.0333 | 0.0000 |
| Claim-Only Pooling (V1) | 20 | 0 | 0 | 0 | 2 | 0.2000 | 0.1000 | 0.1143 | 0.1023 | 0.5947 | 0.1222 | 0.0000 |
| Random Label (V1) | 20 | 0 | 0 | 0 | 1 | 0.4000 | 0.0500 | 0.1262 | 0.1187 | 0.5620 | -0.0222 | 0.0000 |
| No Claim-to-Claim Attention (V2) | 20 | 0 | 0 | 0 | 2 | 0.5000 | 0.1000 | 0.1183 | 0.1086 | 0.9858 | 0.9778 | 0.0111 |
| Claims from Explanation Only (V2) | 20 | 0 | 0 | 0 | 1 | 0.4000 | 0.0000 | 0.1302 | 0.1239 | 0.9860 | 0.9778 | 0.0000 |
| Surface Bottleneck (V2) | 20 | 0 | 0 | 0 | 1 | 0.5000 | 0.0000 | 0.1216 | 0.1141 | 0.5502 | -0.0333 | 0.0000 |
| Surface Bottleneck + No Expl LM (V2) | 20 | 0 | 0 | 20 | 20 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.5470 | -0.0222 | 0.0000 |
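
For readers unfamiliar with the two overlap columns, the sketch below shows one common way to compute BLEU-1 and ROUGE-L on whitespace tokens. It is an assumption that `qual_bleu1` and `qual_rouge_l` were computed along these lines; the actual tokenization and ROUGE-L weighting used in the run are not stated in this review.

```python
import math
from collections import Counter


def bleu1(candidate, reference):
    """Clipped unigram precision times the brevity penalty (single reference)."""
    if not candidate:
        return 0.0
    ref_counts = Counter(reference)
    # Each candidate token is credited at most as often as it occurs in the reference.
    clipped = sum(min(n, ref_counts[tok]) for tok, n in Counter(candidate).items())
    precision = clipped / len(candidate)
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * precision


def lcs_length(a, b):
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """F1 form of ROUGE-L; other implementations weight recall more heavily."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(candidate), lcs / len(reference)
    return 2 * p * r / (p + r)
```

With these definitions, an output made only of `<sep>` tokens shares no tokens with a reference that contains none, so both scores are zero, consistent with the 0.0000 entries in the last row.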


## Interpretation by variant

### Consistency Loss (V1)

This variant achieved high full-validation coupling (0.9860) and high macro swap influence (0.9778), indicating the supervised consistency heads learned to recover the 12 claims from the model representations. Manual review still found no usable prose; outputs were mostly garbled explanations or unrelated algorithm templates with occasional matching complexity tokens.

### No Consistency Loss (V1)

This control variant did not produce usable prose. Its coupling (0.2863) is far below the roughly 0.986 of the main consistency variants, and its swap influence (-0.0333) is near zero, so the coupling it does show is not causally meaningful.

### Claim-Only Pooling (V1)

This control variant did not produce usable prose. Its coupling (0.5947) is well below the roughly 0.986 of the main consistency variants, and its swap influence (0.1222) is weak compared with their 0.9778.

### Random Label (V1)

This control variant did not produce usable prose. Its coupling (0.5620) is well below the roughly 0.986 of the main consistency variants, and its swap influence (-0.0222) is near zero, so the coupling it does show is not causally meaningful.

### No Claim-to-Claim Attention (V2)

This variant achieved high full-validation coupling (0.9858) and high macro swap influence (0.9778), indicating the supervised consistency heads learned to recover the 12 claims from the model representations. Manual review still found no usable prose; outputs were mostly garbled explanations or unrelated algorithm templates with occasional matching complexity tokens.

### Claims from Explanation Only (V2)

This variant achieved high full-validation coupling (0.9860) and high macro swap influence (0.9778), indicating the supervised consistency heads learned to recover the 12 claims from the model representations. Manual review still found no usable prose; outputs were mostly garbled explanations or unrelated algorithm templates with occasional matching complexity tokens.

### Surface Bottleneck (V2)

This variant did not show the desired surface-form coupling: full-validation coupling was 0.5502 and swap influence was -0.0333, both in the range of the control variants, and manual prose quality stayed unusable.

### Surface Bottleneck + No Expl LM (V2)

This variant removed the LM objective on mismatched explanations and collapsed into repeated `<sep>` tokens for all 20 reviewed examples. This shows that the surface bottleneck alone did not supply a sufficient language-learning signal.
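
A collapse of this kind is easy to flag automatically. The following is a minimal sketch, assuming whitespace-separated tokens; the function names and the 0.9 threshold are hypothetical and were not used to produce the `sep_collapse_examples` column, which comes from manual review.

```python
def sep_fraction(text: str, sep_token: str = "<sep>") -> float:
    """Fraction of whitespace-separated tokens equal to the separator token."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(tok == sep_token for tok in tokens) / len(tokens)


def is_sep_collapse(text: str, threshold: float = 0.9) -> bool:
    # Hypothetical threshold: flag outputs that are almost entirely separator tokens.
    return sep_fraction(text) >= threshold
```

For example, `is_sep_collapse("<sep> <sep> <sep>")` is true, while an output such as `"Time O_ n; space O_1."` contains no separator tokens and is not flagged.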

## Representative failure modes

- **Consistency Loss (V1)**: garbled/mismatched template prose. Example excerpt: `tatit _ p er nation ch esap er sarsandcondites in gparsiteandsitev in g in gleandicantion in s. Time O_ n; space O_1. Best-case O_1 when stelement. Key_operation is ss. Two - po in t er a cce sss. Input not mutated. Empty str in g`
- **No Consistency Loss (V1)**: garbled/mismatched template prose. Example excerpt: `max _ productttracksbo th maximum in imsubarrayproducts to kendip er sits. Whe O_ n 2 ( nelelements ); space O_ n. Best-case O_ n 2. Key_operation is compar is on ( maxselenticomp ). Nes`
- **Claim-Only Pooling (V1)**: garbled/mismatched template prose. Example excerpt: `max _ producttesbasextriceandmulents the m is icon in imumesponest in galicates using amul re at ed silnemults. Time O_ n 2 ( njus O_ nnn 2 ); space O_ n for the outp. Best-case is has h_ loo kup.`
- **Random Label (V1)**: garbled/mismatched template prose. Example excerpt: `f in d _ m in _ rotatf in ds the m in imumnumb er of cuts to partitions in - pal in dromes using a 2 Dpal in drometableand 1 D DP. Time O_ n 2; space O_ n 2. Best-case O_ n 2. Key_operation is compar is on. Nest ed _ 2 _ leve`
- **No Claim-to-Claim Attention (V2)**: garbled/mismatched template prose. Example excerpt: `tatit _ p er ationation ch esarancesallemuts in g in O_ ntimeand O_ n space. Best-case is O_ n. Thekey _ op er ation is s the nconmptionmut. Input is not mutated. Anemptyl is t return sanemptyl is t. Duplicate val`
- **Claims from Explanation Only (V2)**: garbled/mismatched template prose. Example excerpt: `tatit _ p er nation ch eckswhap er ations in tears using apass. Time O_ n; space O_ n for the st. Best-case O_ n. Key_operation is compar is on ( sp ). Seque ntiala cce s. Input not mutated. Empty str in g return sTrum is s. Duplicate ...`
- **Surface Bottleneck (V2)**: garbled/mismatched template prose. Example excerpt: `max _ productttracksbo th maximum in imsubarrayproducts to kendip er sits. Whe O_ n 2 ( nelelements ); space O_ n. Best-case O_ n 2. Key_operation is arithmetic. Nest ed _ 2 _ level. Input not mutated. Empty l is t return s [ ]. Duplicate v`
- **Surface Bottleneck + No Expl LM (V2)**: special-token collapse. Example excerpt: `<sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> <sep> ...`

## What this says about the hypothesis

The experiment supports a narrower claim: richer claims make it easier for hidden-state consistency heads to encode oracle-verifiable properties, and strict claim-attention masks do not prevent that representation-level coupling. It does not support the stronger claim that claim ontology richness alone forces natural-language explanations to become correct. In this setup, the model can satisfy the auxiliary objective through latent representations or weak surface statistics while the decoded prose remains low quality.

A better next test would separate prose generation from claim supervision more sharply:

- use pretrained language-model initialization;
- increase sequence/token quality;
- parse generated claims under teacher-forced decoding;
- add a text-level verifier or contrastive objective over generated explanation sentences, rather than only pooled hidden states or averaged token distributions.
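
As an illustration of what a contrastive objective over explanation sentences could look like, here is a minimal InfoNCE-style sketch in plain Python. It assumes each generated sentence and each claim already has an embedding vector, and that row `i` of one list is the positive pair for row `i` of the other. The function name, the pairing scheme, and the temperature are assumptions for illustration, not part of this run.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(sentence_vecs, claim_vecs, temperature=0.1):
    """Mean InfoNCE loss; sentence i is the positive for claim i (assumed pairing)."""
    sents = [_normalize(v) for v in sentence_vecs]
    claims = [_normalize(v) for v in claim_vecs]
    total = 0.0
    for i, s in enumerate(sents):
        # Cosine similarity of sentence i to every claim, scaled by temperature.
        logits = [_dot(s, c) / temperature for c in claims]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(sents)
```

The loss is small when each sentence embedding is closest to its own claim and large when it is closer to another claim, so the signal is applied at the sentence level rather than to pooled hidden states.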
