Mechanistic Capability Probes as a Cheap Screen for Sequence-Mixer Architectures

Published: 11 Jun 2026, Last Modified: 11 Jun 2026, Venue: Mech Interp Workshop ICML 2026 Poster, License: CC BY 4.0
Keywords: Applications of interpretability
Other Keywords: mechanistic interpretability, architecture screening, sequence-mixer architectures
TL;DR: We find that a 1M-parameter sweep on synthetic mechanistic capability probes ranks dense sequence-mixer architectures the same way a 150M or 1B pretraining run would.
Abstract: Comparing sequence-mixer architectures at the scale where their behavior matters ($\geq$1B parameters) costs multiple GPU-days per run, beyond the reach of most academic labs. We propose a battery of mechanistic capability probes (induction, associative recall, copy, finite-state tracking, parity, and others) as a cheap behavioral screen for dense sequence-mixer architectures, and ask whether aggregate suite accuracy predicts downstream language-model training cross-entropy. On a held-out set of four architectures at 150M parameters, we find Spearman $\rho = -0.80$ and Pearson $r = -0.97$ between aggregate suite accuracy and training cross-entropy. The screen is robust to dropping any single task family, and the small-scale ranking direction is preserved at 1B on the two architectures we ran. Per-task profiles motivate **Hydra**, a multi-head block that places attention, STU, and Mamba mixers as parallel heads within each layer; Hydra matches or beats a parameter-matched 1B OLMo-2 attention baseline on training cross-entropy and on a majority of zero-shot benchmarks.
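To make the reported correlation statistics concrete, the sketch below shows one way to compute Pearson and Spearman coefficients between aggregate probe accuracy and training cross-entropy for a set of four architectures. It is a minimal illustration, not the authors' evaluation code: the function names and all numeric values are hypothetical placeholders chosen only to exercise the formulas, and none of them are results from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ascending ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# HYPOTHETICAL placeholder values for four architectures (not from the paper).
suite_accuracy = [0.90, 0.80, 0.70, 0.60]  # aggregate probe accuracy
train_cross_entropy = [3.00, 3.10, 3.30, 3.20]  # lower is better

rho = spearman(suite_accuracy, train_cross_entropy)
r = pearson(suite_accuracy, train_cross_entropy)
print(f"Spearman rho = {rho:.2f}, Pearson r = {r:.2f}")
```

With four untied items, Spearman's coefficient reduces to $\rho = 1 - \sum_i d_i^2 / 10$, where $d_i$ is the rank difference for architecture $i$. A value of $-0.80$ therefore corresponds to $\sum_i d_i^2 = 18$, which is what the placeholder data above produces: a perfectly inverted ranking except for one swapped adjacent pair. The negative sign is the expected direction, since higher probe accuracy should accompany lower cross-entropy.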
Submission Number: 637