Contextual Sparsity as a Tool for Mechanistic Understanding of Retrieval in Hybrid Foundation Models
Track: tiny / short paper (up to 4 pages)
Keywords: contextual sparsity, hybrid models, state space models, mechanistic interpretability
TL;DR: We adaptively prune attention heads to analyze the retrieval ability of hybrid foundation models
Abstract: We mechanistically investigate the role of self-attention in hybrid foundation models that combine state-space modules with self-attention. Evaluating the RecurrentGemma-2B model on a synthetic needle-in-a-haystack task, we show that deactivating all attention heads causes a total retrieval failure, even though overall generation quality is only modestly affected. Using a contextual sparsity approach inspired by Liu et al. (2023), we find that retaining only 2 out of 10 attention heads is sufficient to nearly preserve full retrieval performance. These findings highlight a specialized function of self-attention for copying and retrieval, suggesting that future work could focus on designing dedicated, interpretable retrieval mechanisms within hybrid architectures.
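The core operation described in the abstract, contextually retaining only the top-scoring attention heads and masking the rest, can be illustrated with a minimal sketch. The function name `contextual_head_mask` and the head-importance score used here (mean L2 norm of each head's output) are assumptions chosen for illustration; they are not the authors' implementation nor the exact criterion of Liu et al. (2023).

```python
import torch


def contextual_head_mask(head_outputs: torch.Tensor, k: int = 2) -> torch.Tensor:
    """Keep only the k highest-scoring attention heads for the current context.

    head_outputs: tensor of shape (num_heads, seq_len, head_dim) holding the
    per-head outputs of one attention layer. The relevance score below (mean
    output L2 norm over the sequence) is an illustrative proxy for contextual
    head importance.
    """
    # Score each head by the average L2 norm of its output across positions.
    scores = head_outputs.norm(dim=-1).mean(dim=-1)   # shape: (num_heads,)
    # Indices of the k most relevant heads for this input.
    topk = torch.topk(scores, k=k).indices
    # Build a 0/1 mask and zero out all remaining heads.
    mask = torch.zeros_like(scores)
    mask[topk] = 1.0
    return head_outputs * mask.view(-1, 1, 1)


if __name__ == "__main__":
    torch.manual_seed(0)
    num_heads, seq_len, head_dim = 10, 16, 64
    outputs = torch.randn(num_heads, seq_len, head_dim)
    pruned = contextual_head_mask(outputs, k=2)
    # Only 2 of the 10 heads remain active after masking.
    print("active heads:", (pruned.abs().sum(dim=(1, 2)) > 0).sum().item())
```

In a real hybrid model the mask would be applied before the attention output projection and recomputed per input, so the retained heads can differ across contexts; this sketch only shows the selection and masking step in isolation.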
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 87