Keywords: Probing, Circuit analysis, Understanding high-level properties of models
Other Keywords: Mamba, Associative-Recall
TL;DR: We probe Mamba to identify its associative-recall mechanisms
Abstract: Mamba has recently emerged as a promising alternative to Transformers, demonstrating competitive performance on many language modeling tasks with linear-time computational complexity. Theoretical characterization of Mamba has largely focused on its approximation power with respect to certain target functions (or function classes). However, it remains unclear whether a Mamba model trained with gradient descent can actually learn such target functions. As a first step toward addressing this gap, we perform a mechanistic study of Mamba on associative recall tasks. By visualizing the learned model weights and the evolution of the hidden state, we find that trained Mamba models learn the target associations, and we identify the key associative-recall mechanisms. We complement our empirical study with a theoretical analysis of the optimization dynamics of Mamba that give rise to such mechanisms.
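For readers unfamiliar with the task family, the sketch below generates one synthetic associative-recall example of the standard form: a context of key-value pairs followed by a query key, where the target is the value originally paired with that key. This is a minimal illustration of the generic task setup, not the paper's exact data specification; the function name and parameters are assumptions.

```python
import random

def make_associative_recall_example(num_pairs=8, vocab_size=64, seed=0):
    """Build one synthetic associative-recall sequence.

    The context lists key-value pairs; the final token repeats one key,
    and the target is the value originally paired with it.
    (Hypothetical task spec -- the paper's exact setup may differ.)
    """
    rng = random.Random(seed)
    keys = rng.sample(range(vocab_size), num_pairs)
    values = rng.sample(range(vocab_size), num_pairs)
    # Interleave as k1 v1 k2 v2 ... to form the context.
    context = [tok for kv in zip(keys, values) for tok in kv]
    query_idx = rng.randrange(num_pairs)
    return context + [keys[query_idx]], values[query_idx]

tokens, target = make_associative_recall_example()
print(tokens, "->", target)  # the model must recall the value paired with the final key
```

A model solves the task only if it stores each key-value binding in its hidden state and retrieves the right one at query time, which is what the probing and weight visualizations in the paper are designed to surface.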
Submission Number: 198