Keywords: Machine Learning, Mechanistic Interpretability, Mamba, State Space Models, Large Language Models, ICML
TL;DR: In Mamba, we provide evidence that Layer 39 is a key bottleneck in the IOI task and that the convs of Layer 39 shift names one position forward, and we demonstrate a simple technique for writing to the representations in Layer 39's SSM block.
Abstract: How much will interpretability techniques developed now generalize to future models? A good case study is Mamba, a recent recurrent architecture with scaling comparable to Transformers. We adapt pre-Mamba interpretability techniques to Mamba and partially reverse engineer the circuit responsible for the Indirect Object Identification (IOI) task. The techniques provide evidence that 1) Layer 39 is a key bottleneck, 2) the convs of Layer 39 shift names one position forward, and 3) the name entities are stored linearly in Layer 39's SSM. Finally, we adapt an automatic circuit discovery tool, positional Edge Attribution Patching, to identify a Mamba IOI circuit. Our contributions provide initial evidence that circuit-based mechanistic interpretability tools work well for the Mamba architecture.
Submission Number: 43