Keywords: Probing, Circuit analysis, Understanding high-level properties of models
Other Keywords: Mamba, Associative-Recall
TL;DR: We probe Mamba to identify its associative-recall mechanisms
Abstract: Mamba has recently emerged as a promising alternative to Transformers, demonstrating competitive performance on many language modeling tasks with linear-time computational complexity. Theoretical characterization of Mamba has largely focused on its approximation power with respect to certain target functions (or function classes). However, it remains unclear whether a Mamba model trained with gradient descent can actually learn such target functions. As a first step toward addressing this gap, we perform a mechanistic study of Mamba on associative recall tasks. By visualizing the learned model weights and the evolution of the hidden state, we find that trained Mamba models learn the target associations, and we identify the key associative-recall mechanisms. We complement our empirical study with a theoretical analysis of the optimization dynamics of Mamba that give rise to such mechanisms.
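For readers unfamiliar with the task family, the sketch below generates one synthetic associative-recall example of the standard form: a context of key-value pairs followed by a query key, where the target is the value originally paired with that key. This is a minimal illustration of the generic task setup, not the paper's exact data specification; the function name and parameters are assumptions.

```python
import random

def make_associative_recall_example(num_pairs=8, vocab_size=64, seed=0):
    """Build one synthetic associative-recall sequence.

    The context lists key-value pairs; the final token repeats one key,
    and the target is the value originally paired with it.
    (Hypothetical task spec -- the paper's exact setup may differ.)
    """
    rng = random.Random(seed)
    keys = rng.sample(range(vocab_size), num_pairs)
    values = rng.sample(range(vocab_size), num_pairs)
    # Interleave as k1 v1 k2 v2 ... to form the context.
    context = [tok for kv in zip(keys, values) for tok in kv]
    query_idx = rng.randrange(num_pairs)
    return context + [keys[query_idx]], values[query_idx]

tokens, target = make_associative_recall_example()
print(tokens, "->", target)  # the model must recall the value paired with the final key
```

A model solves the task only if it stores each key-value binding in its hidden state and retrieves the right one at query time, which is what the probing and weight visualizations in the paper are designed to surface.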
Submission Number: 198