============================================================
COMPREHENSIVE ANALYSIS REPORT
============================================================

## Model Comparison

### bert-base-uncased
  Noise robustness by noise type and rate:
  Noise type   rate 0.05          rate 0.1           rate 0.2
  char_swap    0.7991 ± 0.1309    0.6180 ± 0.1166    0.4740 ± 0.1041
  word_sub     0.9648 ± 0.0585    0.9584 ± 0.0563    0.9654 ± 0.0413
  grammar      0.9969 ± 0.0056    0.9969 ± 0.0056    0.9969 ± 0.0056

### roberta-base
  Noise robustness by noise type and rate:
  Noise type   rate 0.05          rate 0.1           rate 0.2
  char_swap    0.9849 ± 0.0069    0.9649 ± 0.0101    0.9487 ± 0.0122
  word_sub     0.9985 ± 0.0027    0.9978 ± 0.0026    0.9984 ± 0.0021
  grammar      0.9996 ± 0.0007    0.9996 ± 0.0007    0.9996 ± 0.0007
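
The report does not define how char_swap noise is applied at a given rate. As a hypothetical illustration only, the sketch below treats the rate as the per-word probability of swapping one pair of adjacent characters. The function name `char_swap` and this per-word interpretation are assumptions, not the procedure that produced the scores above.

```python
import random


def char_swap(text: str, rate: float, seed: int = 0) -> str:
    """Hypothetical char-swap noise (assumed semantics): each word of length >= 2
    has probability `rate` of having one random adjacent character pair swapped."""
    rng = random.Random(seed)  # seeded so the same corruption can be reproduced
    noisy_words = []
    for word in text.split(" "):
        if len(word) >= 2 and rng.random() < rate:
            i = rng.randrange(len(word) - 1)  # left index of the pair to swap
            chars = list(word)
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            word = "".join(chars)
        noisy_words.append(word)
    return " ".join(noisy_words)


print(char_swap("the model detects spelling errors", 0.2, seed=42))
```

Swapping characters keeps each word's letters and length intact, so only their order changes; a per-character rate would be an equally plausible reading.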

## Circuit Patterns

### bert-base-uncased
  Detection layers: [(0, 10), (1, 2), (10, 2)]
  Correction layers: [(3, 1), (4, 1), (5, 1)]

### roberta-base
  Detection layers: [(5, 1), (7, 1), (11, 1)]
  Correction layers: [(2, 1), (3, 1), (4, 1)]
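
The report does not state what the two numbers in each pair mean. One reading consistent with the ordering (descending by the second value) is `(layer index, hit count)`, as returned by `collections.Counter.most_common`. `(layer, head)` is another possible reading. Under the first assumption, the lists above and the "primary" layers in Key Insights could be derived as in this sketch. `top_layers`, `primary_layers`, and the hit list are hypothetical, not data from this report.

```python
from collections import Counter


def top_layers(layer_hits: list[int], k: int = 3) -> list[tuple[int, int]]:
    # Assumed pair format: (layer index, number of hits in that layer).
    # Counter.most_common sorts by count; equal counts keep first-seen order.
    return Counter(layer_hits).most_common(k)


def primary_layers(layer_hits: list[int], n: int = 2) -> list[int]:
    # "Primary" taken here to mean the n most frequent layers (an assumption).
    return [layer for layer, _ in top_layers(layer_hits, k=n)]


hits = [4, 4, 4, 2, 9, 9, 7]  # hypothetical per-head hits, layer indices only
print(top_layers(hits))      # [(4, 3), (9, 2), (2, 1)]
print(primary_layers(hits))  # [4, 9]
```

Under this reading, ties (such as roberta-base's detection layers, which all end in 1) are broken by first-seen order, so the choice of primary layers among tied entries would be arbitrary.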

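## Meta-Analysis

Top 10 configurations ranked by pooled mean. Each pools a single experiment (N = 1), so its pooled mean equals the corresponding mean under Model Comparison.
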
roberta-base_grammar_0.05:
  Pooled mean: 0.9996
  N experiments: 1

roberta-base_grammar_0.1:
  Pooled mean: 0.9996
  N experiments: 1

roberta-base_grammar_0.2:
  Pooled mean: 0.9996
  N experiments: 1

roberta-base_word_sub_0.05:
  Pooled mean: 0.9985
  N experiments: 1

roberta-base_word_sub_0.2:
  Pooled mean: 0.9984
  N experiments: 1

roberta-base_word_sub_0.1:
  Pooled mean: 0.9978
  N experiments: 1

bert-base-uncased_grammar_0.05:
  Pooled mean: 0.9969
  N experiments: 1

bert-base-uncased_grammar_0.1:
  Pooled mean: 0.9969
  N experiments: 1

bert-base-uncased_grammar_0.2:
  Pooled mean: 0.9969
  N experiments: 1

roberta-base_char_swap_0.05:
  Pooled mean: 0.9849
  N experiments: 1
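
The report does not say how results are pooled. A common choice is fixed-effect inverse-variance pooling, sketched below as an assumption. With a single experiment it reduces to that experiment's mean, which matches the values listed above. `pooled_mean` is a hypothetical helper. It expects standard errors, and whether the report's "±" values are standard errors or standard deviations is not stated.

```python
import math


def pooled_mean(means: list[float], std_errors: list[float]) -> tuple[float, float]:
    """Fixed-effect (inverse-variance) pooling: weight_i = 1 / se_i**2.
    Returns (pooled mean, pooled standard error)."""
    if not means or len(means) != len(std_errors):
        raise ValueError("need at least one experiment and one standard error per mean")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    mean = sum(w * m for w, m in zip(weights, means)) / total
    return mean, math.sqrt(1.0 / total)


# N = 1: the pooled mean is just the single experiment's mean.
print(pooled_mean([0.9996], [0.0007]))
```

With several experiments, more precise runs (smaller standard error) pull the pooled mean toward their values. A random-effects model would be the usual alternative when runs are expected to differ systematically.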

## Key Insights
1. Most robust model overall: roberta-base (avg: 0.988 vs. 0.863 for bert-base-uncased, mean over all nine noise conditions)
2. bert-base-uncased: Primary detection in layers 0, 1
3. roberta-base: Primary detection in layers 5, 7
4. bert-base-uncased scores 0.9969 under grammar noise at every tested rate (0.05, 0.1, 0.2)
5. roberta-base scores 0.9996 under grammar noise at every tested rate (0.05, 0.1, 0.2)
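
The average in insight 1 appears to be the unweighted mean of the nine per-condition means under Model Comparison. The sketch below recomputes it from those values under that assumption. The `scores` dict and variable names are illustrative.

```python
from statistics import mean

# Per-condition means copied from the Model Comparison section, in the order
# char_swap, word_sub, grammar, each at noise rates 0.05, 0.1, 0.2.
scores = {
    "bert-base-uncased": [0.7991, 0.6180, 0.4740, 0.9648, 0.9584, 0.9654,
                          0.9969, 0.9969, 0.9969],
    "roberta-base": [0.9849, 0.9649, 0.9487, 0.9985, 0.9978, 0.9984,
                     0.9996, 0.9996, 0.9996],
}

# Assumption: "avg" is an unweighted mean over the nine noise conditions.
averages = {model: mean(values) for model, values in scores.items()}
best = max(averages, key=averages.get)

for model, avg in averages.items():
    print(f"{model}: {avg:.3f}")  # bert-base-uncased: 0.863, roberta-base: 0.988
print("Most robust:", best)       # Most robust: roberta-base
```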