# Rich-Claim Consistency Experiment — Results Report

**Generated:** 2026-05-01 01:58:39

## 1. Experiment Configuration

| Parameter | Value |
|---|---|
| Dataset size (full) | 3,000 examples (2,500 train + 500 val) |
| Epochs (full) | 20 |
| Batch size (full) | 32 |
| Learning rate | 5e-5 |
| Lambda (consistency) | 1.0 |
| Model (full) | `gpt2_small` config (~117M params) |
| Claim types | 12 (see §3) |
| Code domains | sorting, searching, string, math, graph, DP |
| Function sizes | 5–20 lines, ~20% buggy |
| Explanation mismatch | Different `algorithm_class` preferred |

## 2. Model Architecture

The model is a **GPT-2-style causal Transformer** with **12 consistency heads**,
one per claim type.

**V1 (hidden-state heads):** Linear classifiers over mean-pooled hidden states
of explanation tokens from the final Transformer layer.

**V2 (surface-bottleneck heads):** Linear classifiers over mean-pooled
**softmax probability distributions** at explanation positions in LM logit space.
Gradients flow directly into LM logits (no detach), forcing the consistency signal
to shape the explanation token output distributions.
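
The difference between the two head types comes down to what gets pooled. The following is a minimal pure-Python sketch, not the experiment's implementation: the function names (`mean_pool`, `v1_head`, `v2_head`) and the toy shapes are assumptions made for illustration. In an autodiff framework the V2 pooling would stay attached to the LM logits (no detach), as described above.

```python
import math

def mean_pool(rows):
    """Average a list of equal-length vectors element-wise."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def softmax(logits):
    """Numerically stable softmax over one logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(features, weight, bias):
    """Linear classifier: one score per class (weight is classes x features)."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weight, bias)]

def v1_head(hidden_states, expl_positions, weight, bias):
    # V1: pool final-layer hidden states at explanation positions.
    pooled = mean_pool([hidden_states[i] for i in expl_positions])
    return linear(pooled, weight, bias)

def v2_head(lm_logits, expl_positions, weight, bias):
    # V2: pool softmax distributions over the vocabulary at explanation positions.
    pooled = mean_pool([softmax(lm_logits[i]) for i in expl_positions])
    return linear(pooled, weight, bias)

# Toy inputs (hypothetical): 3 positions, hidden size 2, vocabulary size 3.
hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
logits = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
expl = [1, 2]  # positions treated as explanation tokens

v1_scores = v1_head(hidden, expl, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
v2_scores = v2_head(logits, expl, [[1.0, 1.0, 1.0]], [0.0])
```

In this sketch the V2 head's input dimension equals the vocabulary size, while the V1 head's input dimension equals the hidden size.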

**Sequence format:**
```
<bos> [code] <sep> [explanation] <claim>time_complexity=O_n</claim>
  <claim>space_complexity=O_1</claim> ... (12 claims total) ... <eos>
```
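
As a rough illustration of the format above, the sketch below assembles one training string. The helper name `build_sequence` and the space-joined layout are assumptions; the actual tokenization, including whether each claim tag is a single token, is not specified here.

```python
def build_sequence(code, explanation, claims):
    """Join code, explanation, and claim tags in the documented order.

    `claims` maps claim name -> value; dict insertion order is preserved.
    """
    claim_tags = [f"<claim>{name}={value}</claim>" for name, value in claims.items()]
    return " ".join(["<bos>", code, "<sep>", explanation, *claim_tags, "<eos>"])

# Hypothetical example with two of the 12 claims.
example = build_sequence(
    "def first(xs): return xs[0]",
    "Returns the first element.",
    {"time_complexity": "O_1", "space_complexity": "O_1"},
)
```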

Because claims follow the explanation in the sequence, causal attention already
prevents explanation tokens from seeing claim tokens. An explicit additive
attention bias blocks explanation→claim attention as a second guard.
V2 mask variants add further structural constraints (see §4).
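
The base mask can be sketched as an additive bias matrix. This is an illustrative reconstruction under assumptions: the segment labels (`"code"`, `"expl"`, `"claim"`) and the function name `base_attention_bias` are hypothetical, and special tokens are folded into the `"code"` segment for brevity.

```python
NEG_INF = float("-inf")

def base_attention_bias(segments):
    """Return bias[q][k]: 0.0 if query q may attend key k, else -inf.

    `segments` labels each position as "code", "expl", or "claim".
    """
    n = len(segments)
    bias = [[0.0] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            future = k > q  # causal rule: no attention to later positions
            expl_to_claim = segments[q] == "expl" and segments[k] == "claim"
            if future or expl_to_claim:
                bias[q][k] = NEG_INF
    return bias

segments = ["code", "code", "expl", "expl", "claim", "claim"]
bias = base_attention_bias(segments)
```

With the sequence format above, every explanation→claim pair is also a future position, so both rules mask the same entries; the explicit rule only matters if the layout changes.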

## 3. Claim Type Ontology (12 Claims)

| # | Claim | Values |
|---|---|---|
| 1 | `time_complexity` | `O_1`, `O_log_n`, `O_n`, `O_n_log_n`, `O_n2`, `O_2n` |
| 2 | `space_complexity` | `O_1`, `O_log_n`, `O_n`, `O_n2` |
| 3 | `best_case_time` | `O_1`, `O_log_n`, `O_n`, `O_n_log_n`, `O_n2`, `same_as_worst` |
| 4 | `algorithm_class` | `sorting`, `searching`, `traversal`, `dynamic_programming`, `greedy`, `math_computation`, `string_processing` |
| 5 | `loop_structure` | `no_loops`, `single_pass`, `nested_2_level`, `nested_3_level`, `recursive`, `iterative_with_recursion` |
| 6 | `key_operation` | `comparison`, `arithmetic`, `hash_lookup`, `list_append`, `string_concat`, `swap`, `assignment` |
| 7 | `access_pattern` | `sequential`, `random_access`, `sliding_window`, `two_pointer`, `divide_conquer` |
| 8 | `auxiliary_structures` | `none`, `temp_variable`, `array`, `hash_map`, `set`, `stack`, `recursion_stack` |
| 9 | `mutates_input` | `false`, `true` |
| 10 | `correctness_status` | `fully_correct`, `off_by_one`, `wrong_condition`, `missing_edge_case`, `infinite_loop_risk` |
| 11 | `handles_empty_input` | `true`, `false`, `crashes` |
| 12 | `handles_duplicates` | `preserves_all`, `removes_duplicates`, `undefined_behavior`, `not_applicable` |

**Total claim token vocabulary extension**: 62 new claim-value tokens (across all 12 claims).
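
The total can be checked directly against the table. The snippet below transcribes the ontology into a dictionary (the name `CLAIM_VALUES` is just for illustration) and sums the value counts.

```python
CLAIM_VALUES = {
    "time_complexity": ["O_1", "O_log_n", "O_n", "O_n_log_n", "O_n2", "O_2n"],
    "space_complexity": ["O_1", "O_log_n", "O_n", "O_n2"],
    "best_case_time": ["O_1", "O_log_n", "O_n", "O_n_log_n", "O_n2", "same_as_worst"],
    "algorithm_class": ["sorting", "searching", "traversal", "dynamic_programming",
                        "greedy", "math_computation", "string_processing"],
    "loop_structure": ["no_loops", "single_pass", "nested_2_level", "nested_3_level",
                       "recursive", "iterative_with_recursion"],
    "key_operation": ["comparison", "arithmetic", "hash_lookup", "list_append",
                      "string_concat", "swap", "assignment"],
    "access_pattern": ["sequential", "random_access", "sliding_window",
                       "two_pointer", "divide_conquer"],
    "auxiliary_structures": ["none", "temp_variable", "array", "hash_map", "set",
                             "stack", "recursion_stack"],
    "mutates_input": ["false", "true"],
    "correctness_status": ["fully_correct", "off_by_one", "wrong_condition",
                           "missing_edge_case", "infinite_loop_risk"],
    "handles_empty_input": ["true", "false", "crashes"],
    "handles_duplicates": ["preserves_all", "removes_duplicates",
                           "undefined_behavior", "not_applicable"],
}

# One token per (claim, value) pair, so values shared across claims count once per claim.
total_claim_tokens = sum(len(values) for values in CLAIM_VALUES.values())
```

The sum reaches 62 only when a value such as `O_1` is counted separately for each claim it appears under, which suggests the new tokens are per claim-value pair rather than per distinct value string.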

## 4. Experimental Variants

### V1 Variants (Original Ablation Ladder)

| Variant | Description | Ablation axis |
|---|---|---|
| `consistency_loss` | LM + consistency loss on explanation token pooling | Main hypothesis |
| `no_consistency_loss` | LM loss only; no consistency gradient | Baseline |
| `claim_only_pooling` | Pool *claim* token hiddens instead of explanation | Pooling location |
| `random_label_consistency` | Consistency loss with all 12 labels shuffled independently | Label signal |
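
One way to realize the `random_label_consistency` control is to permute each claim's label column across examples with its own permutation. The sketch below assumes that reading; the function name and the per-dataset (rather than per-batch) shuffle are assumptions, not confirmed details of the experiment.

```python
import random

def shuffle_labels_independently(labels, seed=0):
    """Permute each claim's labels across examples, one permutation per claim.

    `labels` is a list of dicts mapping claim name -> value. Per-claim value
    frequencies are preserved; the link to the example's code is broken.
    """
    rng = random.Random(seed)
    shuffled = [dict() for _ in labels]
    for name in labels[0]:
        column = [row[name] for row in labels]
        rng.shuffle(column)  # independent permutation for this claim
        for row, value in zip(shuffled, column):
            row[name] = value
    return shuffled

# Hypothetical labels for three examples and two claims.
labels = [
    {"time_complexity": "O_n", "mutates_input": "false"},
    {"time_complexity": "O_n2", "mutates_input": "true"},
    {"time_complexity": "O_1", "mutates_input": "false"},
]
shuffled = shuffle_labels_independently(labels, seed=0)
```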

### V2 Variants (Strict Architecture Ablations)

| Variant | Description | Ablation axis |
|---|---|---|
| `no_claim_to_claim_attention` | Claim tokens cannot attend other claim tokens; only code+explanation+self | Cross-claim information flow |
| `claims_from_explanation_only` | Strict bottleneck: claim tokens attend only explanation tokens | Forced code→expl→claim path |
| `surface_bottleneck_consistency` | Consistency from softmax distributions over LM logits at explanation positions | Surface-form explanation encoding |
| `surface_bottleneck_no_expl_lm` | Surface bottleneck + LM loss disabled on explanation positions | Surface bottleneck + LM isolation |
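
The two mask-based V2 variants can be sketched as predicates on (query, key) position pairs. This is an illustration under assumptions: segment labels and function names are hypothetical, "self" is read as the same position, and `claims_from_explanation_only` is taken literally (claim tokens see explanation tokens and nothing else).

```python
NEG_INF = float("-inf")

def may_attend(variant, q, k, segments):
    """Return True if query position q may attend key position k."""
    if k > q:
        return False  # causal rule
    q_seg, k_seg = segments[q], segments[k]
    if q_seg == "expl" and k_seg == "claim":
        return False  # explicit explanation->claim block from the base mask
    if q_seg == "claim":
        if variant == "no_claim_to_claim_attention":
            # code + explanation + self only
            return k_seg != "claim" or k == q
        if variant == "claims_from_explanation_only":
            # strict bottleneck: explanation tokens only
            return k_seg == "expl"
    return True

def attention_bias(variant, segments):
    """Build the additive bias matrix (0.0 allowed, -inf blocked)."""
    n = len(segments)
    return [[0.0 if may_attend(variant, q, k, segments) else NEG_INF
             for k in range(n)] for q in range(n)]

segments = ["code", "expl", "expl", "claim", "claim"]
no_c2c = attention_bias("no_claim_to_claim_attention", segments)
expl_only = attention_bias("claims_from_explanation_only", segments)
```

Because explanation tokens always precede claim tokens in the sequence format, every claim row keeps at least one unmasked key under both variants, so the attention softmax stays well defined.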

**Key prediction**: `consistency_loss` should produce higher `mean_coupling` and
higher claim emission accuracy than all negative controls. V2 variants test
progressively stricter architectural constraints on information flow through
the explanation bottleneck.

## 5. Final-Epoch Validation Metrics

*Metrics at epoch 20 (final).*

| Variant | Mean Coupling | BLEU-1 | ROUGE-L | Swap Influence (macro-avg 12 heads) | Claim Emission Acc | Val LM Loss |
|---|---|---|---|---|---|---|
| Consistency Loss | 0.9860 | 0.0611 | 0.0612 | 0.9778 | 0.0222 | 0.5870 |
| No Consistency Loss | 0.2863 | 0.0686 | 0.0617 | -0.0333 | 0.0000 | 0.1845 |
| Claim Only Pooling | 0.5947 | 0.0554 | 0.0479 | 0.1222 | 0.0000 | 0.2442 |
| Random Label Consistency | 0.5620 | 0.0593 | 0.0526 | -0.0222 | 0.0000 | 0.3832 |
| No Claim To Claim Attention | 0.9858 | 0.0614 | 0.0595 | 0.9778 | 0.0111 | 0.5942 |
| Claims From Explanation Only | 0.9860 | 0.0558 | 0.0570 | 0.9778 | 0.0000 | 0.5955 |
| Surface Bottleneck Consistency | 0.5502 | 0.0688 | 0.0646 | -0.0333 | 0.0000 | 0.1846 |
| Surface Bottleneck No Expl LM | 0.5470 | 0.0000 | 0.0000 | -0.0222 | 0.0000 | 0.0639 |

### Per-Claim Head Accuracy (Final Epoch)

| Variant | time_complexity | space_complexity | best_case_time | algorithm_class | loop_structure | key_operation | access_pattern | auxiliary_structures | mutates_input | correctness_status | handles_empty_input | handles_duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Consistency Loss | 0.986 | 0.982 | 0.984 | 0.990 | 0.976 | 0.998 | 0.978 | 0.972 | 0.988 | 0.992 | 1.000 | 0.986 |
| No Consistency Loss | 0.198 | 0.416 | 0.106 | 0.134 | 0.256 | 0.096 | 0.092 | 0.216 | 0.774 | 0.388 | 0.530 | 0.230 |
| Claim Only Pooling | 0.536 | 0.528 | 0.426 | 0.290 | 0.638 | 0.442 | 0.692 | 0.366 | 0.850 | 0.846 | 0.948 | 0.574 |
| Random Label Consistency | 0.492 | 0.466 | 0.388 | 0.208 | 0.600 | 0.406 | 0.662 | 0.394 | 0.836 | 0.814 | 0.932 | 0.546 |
| No Claim To Claim Attention | 0.986 | 0.982 | 0.982 | 0.990 | 0.976 | 0.998 | 0.978 | 0.972 | 0.988 | 0.992 | 1.000 | 0.986 |
| Claims From Explanation Only | 0.986 | 0.982 | 0.984 | 0.990 | 0.976 | 0.998 | 0.978 | 0.972 | 0.988 | 0.992 | 1.000 | 0.986 |
| Surface Bottleneck Consistency | 0.492 | 0.454 | 0.318 | 0.216 | 0.600 | 0.388 | 0.662 | 0.380 | 0.836 | 0.814 | 0.932 | 0.510 |
| Surface Bottleneck No Expl LM | 0.492 | 0.454 | 0.330 | 0.216 | 0.600 | 0.356 | 0.662 | 0.362 | 0.836 | 0.814 | 0.932 | 0.510 |
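
The report does not define Mean Coupling, but the values in the summary table at the start of this section agree to four decimals with the unweighted mean of the 12 per-claim head accuracies. Assuming that is the definition, the check below recomputes it for three rows transcribed from the tables; the dictionary and function names are illustrative only.

```python
# Per-claim head accuracies copied from the table above (12 values per variant).
HEAD_ACC = {
    "consistency_loss": [0.986, 0.982, 0.984, 0.990, 0.976, 0.998,
                         0.978, 0.972, 0.988, 0.992, 1.000, 0.986],
    "no_consistency_loss": [0.198, 0.416, 0.106, 0.134, 0.256, 0.096,
                            0.092, 0.216, 0.774, 0.388, 0.530, 0.230],
    "claim_only_pooling": [0.536, 0.528, 0.426, 0.290, 0.638, 0.442,
                           0.692, 0.366, 0.850, 0.846, 0.948, 0.574],
}

# Mean Coupling values copied from the summary table.
REPORTED = {
    "consistency_loss": 0.9860,
    "no_consistency_loss": 0.2863,
    "claim_only_pooling": 0.5947,
}

def mean_coupling(accuracies):
    """Unweighted mean over heads (assumed definition)."""
    return sum(accuracies) / len(accuracies)

for name, accs in HEAD_ACC.items():
    print(f"{name}: recomputed={mean_coupling(accs):.4f} reported={REPORTED[name]:.4f}")
```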

### Per-Claim Swap Influence (Final Epoch)

Swap influence for head *k*: `(correct_own - correct_swapped) / n_pairs`, where
pairs are examples that *differ* on that head's label. Range [−1, 1]; higher
means the head distinguishes own vs. swapped inputs. `val_swap_influence` is
the **macro-average** across all 12 heads.
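
The formula can be sketched as follows. This is one reading of the definition, not the evaluation code: it assumes each pair yields two booleans, whether the head predicts the example's own label from its own input and whether it still does so from the swapped partner's input. All names and the toy counts are hypothetical.

```python
def swap_influence(pairs):
    """Compute (correct_own - correct_swapped) / n_pairs for one head.

    `pairs` is a list of (own_correct, swapped_correct) booleans, one entry
    per pair of examples that differ on this head's label.
    """
    n_pairs = len(pairs)
    correct_own = sum(1 for own, _ in pairs if own)
    correct_swapped = sum(1 for _, swapped in pairs if swapped)
    return (correct_own - correct_swapped) / n_pairs

def macro_average(per_head):
    """Unweighted mean of per-head swap influence."""
    return sum(per_head.values()) / len(per_head)

# Toy data: 15 pairs, 14 correct on own input, none correct after the swap.
toy_pairs = [(True, False)] * 14 + [(False, False)]
toy_score = swap_influence(toy_pairs)

# A head that ignores its input scores 0: swapping changes nothing.
input_blind = swap_influence([(True, True)] * 15)

toy_macro = macro_average({"head_a": toy_score, "head_b": input_blind})
```

Under this reading, a score near 1 means predictions track the input, a score near 0 means the swap makes no difference, and negative scores mean the head does better on swapped inputs than on its own.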

| Variant | time_complexity | space_complexity | best_case_time | algorithm_class | loop_structure | key_operation | access_pattern | auxiliary_structures | mutates_input | correctness_status | handles_empty_input | handles_duplicates | **Macro-Avg** |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Consistency Loss | 0.933 | 1.000 | 1.000 | 1.000 | 0.867 | 1.000 | 1.000 | 0.933 | 1.000 | 1.000 | 1.000 | 1.000 | **0.978** |
| No Consistency Loss | -0.133 | 0.000 | 0.000 | 0.000 | 0.133 | -0.067 | 0.000 | -0.267 | 0.067 | 0.133 | -0.133 | -0.133 | **-0.033** |
| Claim Only Pooling | -0.200 | 0.533 | 0.267 | -0.200 | -0.067 | 0.333 | 0.133 | 0.000 | 0.467 | -0.133 | 0.000 | 0.333 | **0.122** |
| Random Label Consistency | -0.133 | 0.400 | 0.267 | -0.133 | -0.067 | 0.000 | 0.067 | -0.200 | -0.067 | -0.200 | -0.200 | 0.000 | **-0.022** |
| No Claim To Claim Attention | 0.933 | 1.000 | 1.000 | 1.000 | 0.867 | 1.000 | 1.000 | 0.933 | 1.000 | 1.000 | 1.000 | 1.000 | **0.978** |
| Claims From Explanation Only | 0.933 | 1.000 | 1.000 | 1.000 | 0.867 | 1.000 | 1.000 | 0.933 | 1.000 | 1.000 | 1.000 | 1.000 | **0.978** |
| Surface Bottleneck Consistency | -0.133 | 0.533 | 0.200 | -0.200 | -0.067 | 0.067 | 0.067 | -0.267 | -0.067 | -0.200 | -0.200 | -0.133 | **-0.033** |
| Surface Bottleneck No Expl LM | -0.133 | 0.533 | 0.133 | -0.200 | -0.067 | 0.200 | 0.067 | -0.200 | -0.067 | -0.200 | -0.200 | -0.133 | **-0.022** |

## 6. Comparison: 3-Claim vs 12-Claim Experiments

*3-claim metrics not available for comparison.*
To include them, pass `--compare-3claim` and make sure 3-claim results exist at
`modal_full_gpt2_outputs/metrics.csv` or the default outputs path.

## 7. Qualitative Examples

Selected examples from the `consistency_loss` variant.

> *No qualitative examples collected.*

## 8. Limitations

1. **Programmatic ground truth**: Claim labels are derived programmatically from
   the template bank (60 hand-annotated functions), not from per-example human
   annotation. The oracle is correct by construction but covers only those templates.

2. **Template diversity**: With 3,000 examples drawn with replacement from 60
   templates, each template's code appears in about 50 dataset entries on average
   (with different explanations). A full-scale version should expand the template bank.

3. **Class imbalance within claims**: Within each of the 12 claim types, some
   values are rarer (e.g., `O_2n`, `O_log_n` time complexity). The model may
   underfit these rare classes.

4. **Smoke run**: A smoke run (3 epochs, 300 examples, tiny model) is too short
   for the model to converge. Treat smoke results as end-to-end sanity checks only.

5. **No cross-seed replication**: All results come from a single seed. Runs over
   multiple seeds are needed to estimate variance and test significance.

## 9. Run Instructions

```bash
# Smoke run (fast, ~2-5 min on CPU):
python run_rich_experiment.py --smoke

# Full run (20 epochs, 3000 examples):
python run_rich_experiment.py --full

# Custom config:
python run_rich_experiment.py --full --model small --epochs 20 --batch 32

# V1 variants only (4 original ablations):
python run_rich_experiment.py --smoke --v1-only

# V2 variants only (4 strict architecture ablations):
python run_rich_experiment.py --smoke --v2-only --output-dir outputs_rich_v2_smoke

# Specific variants only:
python run_rich_experiment.py --smoke --variants consistency_loss no_consistency_loss

# Include 3-claim comparison in report:
python run_rich_experiment.py --full --compare-3claim
```
