# Stronger Architecture Qualitative Review

Generated: 2026-04-30 21:32:18. This review scores the completed V2 run `full_gpt2_small_stronger_20260430_200556` using 20 held-out examples per variant, comparing epoch-5 initial generations to epoch-20 final generations and manually judging the final prose.

## Bottom line

The stronger architecture did not produce semantically correct natural-language explanations. The strict hidden-state variants still learned strong claim coupling and, in several cases, perfect structured claim values, but their prose remained mismatched template text from unrelated functions. The strongest variant, `surface_bottleneck_no_expl_lm`, mostly collapsed once the mismatched-explanation LM loss was removed.

## Final validation metrics

| Variant | Val coupling | BLEU-1 | ROUGE-L | Swap influence | Claim accuracy |
|---|---:|---:|---:|---:|---:|
| `claims_from_explanation_only` | 1.0000 | 0.0708 | 0.0810 | 1.0000 | 0.8000 |
| `no_claim_to_claim_attention` | 1.0000 | 0.0758 | 0.0998 | 1.0000 | 1.0000 |
| `surface_bottleneck_consistency` | 0.6967 | 0.0677 | 0.0889 | -0.1500 | 1.0000 |
| `surface_bottleneck_no_expl_lm` | 0.8080 | 0.0033 | 0.0014 | -0.0500 | 0.1000 |
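For orientation, BLEU-1 can be read as clipped unigram precision times a brevity penalty, and ROUGE-L as an F-score over the longest common subsequence. The sketch below is a minimal illustration that assumes lowercase whitespace tokenization and no smoothing. This review does not state the run's actual tokenizer or scoring library, so the sketch is not expected to reproduce the table values. The function names `bleu1` and `rouge_l` are hypothetical.

```python
import math
from collections import Counter


def bleu1(reference: str, candidate: str) -> float:
    """Clipped unigram precision times the BLEU brevity penalty (no smoothing)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    ref_counts = Counter(ref)
    cand_counts = Counter(cand)
    # Each candidate token is credited at most as often as it occurs in the reference.
    overlap = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    precision = overlap / len(cand)
    # Candidates shorter than the reference are penalized.
    penalty = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return penalty * precision


def rouge_l(reference: str, candidate: str) -> float:
    """F1 over the longest common subsequence of whitespace tokens."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    # Row-by-row dynamic program for LCS length.
    prev = [0] * (len(cand) + 1)
    for ref_tok in ref:
        cur = [0]
        for j, cand_tok in enumerate(cand, start=1):
            if ref_tok == cand_tok:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    lcs = prev[-1]
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # One reference/generation pair taken from the failure modes in this review.
    reference = "Generates all ordered pairs. O(n^2) time and O(n^2) space."
    generated = "Computesprefixsumarray. O(n) time and O(n) space."
    print(f"BLEU-1  = {bleu1(reference, generated):.4f}")
    print(f"ROUGE-L = {rouge_l(reference, generated):.4f}")
```

Both scores reward surface overlap only. A mismatched template that happens to share words such as "time", "and", and "space" with the reference still earns partial credit, which is one reason the manual scorecard is needed alongside these numbers.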

## Manual final-prose scorecard

| Variant | Behavior-correct prose | Complexity-correct prose | Fully correct prose | Prose contradicts oracle | All claim values correct | Note |
|---|---:|---:|---:|---:|---:|---|
| `claims_from_explanation_only` | 0/20 | 2/20 | 0/20 | 20/20 | 7/20 | Final prose mostly recites other dataset templates. Some constant-time examples have matching complexity by coincidence, but behavior remains wrong. |
| `no_claim_to_claim_attention` | 0/20 | 4/20 | 0/20 | 20/20 | 20/20 | Structured claims are perfect in the reviewed sample, but prose remains mismatched template text copied from unrelated functions. |
| `surface_bottleneck_consistency` | 0/20 | 2/20 | 0/20 | 20/20 | 20/20 | Surface-bottleneck consistency preserved claim-token accuracy but did not make prose semantically correct. |
| `surface_bottleneck_no_expl_lm` | 0/20 | 0/20 | 0/20 | 20/20 | 0/20 | Removing explanation LM loss caused near-total generation collapse into punctuation, separators, or code-like fragments. |

## Initial-to-final side-by-side automatic metrics

| Variant | Checkpoint | Mean BLEU-1 | Mean ROUGE-L | Time claim emitted | Space claim emitted | Correctness claim emitted |
|---|---|---:|---:|---:|---:|---:|
| `claims_from_explanation_only` | initial | 0.1706 | 0.1874 | 0.30 | 0.60 | 0.60 |
| `claims_from_explanation_only` | final | 0.1830 | 0.1976 | 0.80 | 0.60 | 1.00 |
| `no_claim_to_claim_attention` | initial | 0.1495 | 0.1637 | 1.00 | 0.90 | 0.80 |
| `no_claim_to_claim_attention` | final | 0.1792 | 0.2015 | 1.00 | 1.00 | 1.00 |
| `surface_bottleneck_consistency` | initial | 0.1600 | 0.1783 | 1.00 | 0.95 | 1.00 |
| `surface_bottleneck_consistency` | final | 0.1480 | 0.1674 | 1.00 | 1.00 | 1.00 |
| `surface_bottleneck_no_expl_lm` | initial | 0.0000 | 0.0000 | 0.10 | 0.20 | 0.20 |
| `surface_bottleneck_no_expl_lm` | final | 0.0014 | 0.0012 | 0.00 | 0.15 | 0.15 |
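The initial-to-final change per variant can be read off the table above with simple subtraction. The sketch below copies the mean BLEU-1 and ROUGE-L values from that table and computes final minus initial. The dictionary layout and the name `metric_deltas` are illustrative choices, not part of the run's tooling.

```python
# (initial, final) pairs copied from the side-by-side table in this review.
SIDE_BY_SIDE = {
    "claims_from_explanation_only": {"bleu1": (0.1706, 0.1830), "rouge_l": (0.1874, 0.1976)},
    "no_claim_to_claim_attention": {"bleu1": (0.1495, 0.1792), "rouge_l": (0.1637, 0.2015)},
    "surface_bottleneck_consistency": {"bleu1": (0.1600, 0.1480), "rouge_l": (0.1783, 0.1674)},
    "surface_bottleneck_no_expl_lm": {"bleu1": (0.0000, 0.0014), "rouge_l": (0.0000, 0.0012)},
}


def metric_deltas(table):
    """Return final minus initial for every metric, rounded to four decimals."""
    return {
        variant: {
            metric: round(final - initial, 4)
            for metric, (initial, final) in metrics.items()
        }
        for variant, metrics in table.items()
    }


if __name__ == "__main__":
    for variant, change in metric_deltas(SIDE_BY_SIDE).items():
        print(f"{variant}: BLEU-1 {change['bleu1']:+.4f}, ROUGE-L {change['rouge_l']:+.4f}")
```

No variant moves either metric by more than about 0.04 between the two checkpoints, and `surface_bottleneck_consistency` moves slightly downward.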

## Representative failure modes

### `no_claim_to_claim_attention` sample 1: `check_all_pairs_equal`

- True explanation: Checks if all elements are equal via pairwise comparison. O(n^2) time, O(1) space.
- Generated prose: Computesn! it er atively. O(n) time, O(1) space.
- Ground-truth claims: time=O(n^2), space=O(1), correctness=1

### `claims_from_explanation_only` sample 5: `matrix_multiply_buggy`

- True explanation: Attempts 2x2 matrix multiplication but is buggy (missing accumulation). O(n^2) time, O(1) auxiliary space.
- Generated prose: Computesprefixsumarray. O(n) time and O(n) space.
- Ground-truth claims: time=O(n^2), space=O(1), correctness=0

### `surface_bottleneck_consistency` sample 2: `generate_all_pairs`

- True explanation: Generates all ordered pairs. O(n^2) time and O(n^2) space.
- Generated prose: Re turnsthesign of anumb er ( - 1, 0, or 1 ) in O(1) time and space.
- Ground-truth claims: time=O(n^2), space=O(n^2), correctness=1

### `surface_bottleneck_no_expl_lm` sample 1: `check_all_pairs_equal`

- True explanation: Checks if all elements are equal via pairwise comparison. O(n^2) time, O(1) space.
- Generated prose: ( ( ( ( ( i ( ( ( ( return True <sep> ( ( ( ( ( ( x, s <sep> ( ( ( ( ( s ( ss ( ( ss ( ( ( <sep> ( ( ( ( ssssssssssss ( ( ( ( ( ( ssssssssss ( ( ( ( ( (,,
- Ground-truth claims: time=O(n^2), space=O(1), correctness=1

## Interpretation

The V2 ladder sharpened the causal-path test but did not rescue prose quality. Blocking claim-to-claim attention and forcing claims to attend only to explanation tokens still allowed the model to encode claim-relevant information in hidden states while emitting mismatched surface text. The soft surface bottleneck was not enough to make the natural language faithful, and removing the mismatched-explanation LM objective destabilized decoding rather than producing useful explanatory text.
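The two attention restrictions named above can be pictured as edits to a causal mask. The sketch below is illustrative only and rests on assumptions this review does not confirm: a sequence laid out as code tokens, then explanation tokens, then claim tokens, with each claim position still allowed to attend to itself. The function name `build_attention_mask` and its flags are hypothetical and are not taken from the run's code.

```python
from typing import List


def build_attention_mask(
    n_code: int,
    n_expl: int,
    n_claim: int,
    *,
    claims_see_code: bool = True,
    claims_see_claims: bool = True,
) -> List[List[bool]]:
    """Return mask[query][key]; True means the query position may attend to the key.

    Assumed layout (hypothetical): [code tokens | explanation tokens | claim tokens].
    """
    total = n_code + n_expl + n_claim
    claim_start = n_code + n_expl
    # Start from an ordinary causal mask: each position sees itself and the past.
    mask = [[key <= query for key in range(total)] for query in range(total)]
    for query in range(claim_start, total):
        # Only keys strictly before the query are edited, so self-attention survives.
        for key in range(query):
            if key < n_code and not claims_see_code:
                mask[query][key] = False  # claims cannot read the code directly
            elif key >= claim_start and not claims_see_claims:
                mask[query][key] = False  # claims cannot read earlier claims
    return mask


if __name__ == "__main__":
    mask = build_attention_mask(2, 3, 2, claims_see_code=False, claims_see_claims=False)
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))
```

Even under the tightest setting, every explanation position still attends to the code, so the explanation hidden states can carry claim-relevant information regardless of which surface tokens are emitted. That is consistent with the pattern of correct claims alongside mismatched prose reported here.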

A stronger next test would need a discrete or near-discrete explanation bottleneck with an auxiliary natural-language faithfulness objective, for example a verifier that parses the content of the generated prose rather than pooled token distributions. It would also need training data in which the explanation text itself is correct rather than randomly permuted.
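As a rough illustration of what a prose-parsing verifier might check, the sketch below extracts big-O time and space bounds from generated text with a regular expression and compares them with the ground-truth claims. It is a minimal sketch and not a proposed implementation: the names `parse_complexity` and `verify_prose` are hypothetical, the pattern covers only the phrasings seen in the samples in this review, and it does not test whether the prose describes the right behavior.

```python
import re
from typing import Dict

# Matches phrasings such as "O(n) time", "O(1) auxiliary space", "O(1) time and space".
# Limitation: bounds with nested parentheses, e.g. O(log(n)), are not matched.
CLAIM_RE = re.compile(
    r"(O\([^()]*\))\s+(time\s+and\s+space|time|(?:auxiliary\s+)?space)",
    re.IGNORECASE,
)


def parse_complexity(prose: str) -> Dict[str, str]:
    """Extract the first stated time and space bound from free text."""
    claims: Dict[str, str] = {}
    for bound, kind in CLAIM_RE.findall(prose):
        bound = bound.replace(" ", "")
        kind = kind.lower()
        if "time" in kind:
            claims.setdefault("time", bound)
        if "space" in kind:
            claims.setdefault("space", bound)
    return claims


def verify_prose(prose: str, truth: Dict[str, str]) -> Dict[str, bool]:
    """Per-field agreement between parsed prose and ground truth; a missing claim counts as wrong."""
    parsed = parse_complexity(prose)
    return {field: parsed.get(field) == expected for field, expected in truth.items()}


if __name__ == "__main__":
    # Generated prose and ground-truth claims copied from the failure modes in this review.
    prose = "Computesn! it er atively. O(n) time, O(1) space."
    truth = {"time": "O(n^2)", "space": "O(1)"}
    print(verify_prose(prose, truth))  # {'time': False, 'space': True}
```

A check of this kind operates on the emitted text itself, so a model could not satisfy it by keeping correct claim values in hidden states while reciting an unrelated template. A behavior-level check would still be needed, since the space bound in the example above matches even though the described function is wrong.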