Keywords: Primacy, Recency, Mamba, State-space Models
Abstract: We uncover a sparse subset of channels in Mamba's selective state-space block that serves as a substrate for early-input retention. These channels are identified through structured recall tasks, and ablating them selectively degrades recall of early positions. Input periodicity systematically shifts Mamba's discretization gate, amplifying the "lost-in-the-middle" effect by reallocating information across positions. Primacy and periodicity-driven effects, combined with recency, yield the characteristic U-shaped recall curve, mirroring effects well known in Transformers but underexplored in state-space models. We further examine how distractor tokens affect Mamba's temporal dynamics: recency, sustained by an exponential-decay mechanism, collapses under distraction because distractors push the queried items deeper into the sequence. Finally, we show that the same sparse subset of channels transfers beyond recall: intervening on them degrades performance on downstream long-context understanding tasks, indicating that they function as data-agnostic long-term memory carriers. Together, these results provide a common mechanistic picture of Mamba's temporal profile, linking primacy, recency, and input periodicity.
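To make the mechanisms named in the abstract concrete (the discretization gate, its exponential decay underlying recency, and sparse-channel ablation), the following is a minimal sketch of a simplified diagonal selective-scan recurrence. It is not the paper's code nor the mamba_ssm library's implementation; the function name `selective_ssm_scan`, the argument `ablate_channels`, and all shapes are illustrative assumptions.

```python
# Minimal sketch (assumed, simplified): a diagonal selective SSM recurrence
# illustrating (i) the discretization gate delta, (ii) the exponential decay
# exp(delta * A) that produces recency, and (iii) ablation of a sparse set of
# state channels, the kind of intervention described in the abstract.
import torch

def selective_ssm_scan(x, A, B, C, delta, ablate_channels=None):
    """Simplified selective scan (hypothetical stand-in, not Mamba's kernel).

    x:      (T, D)  input sequence
    A:      (D, N)  negative real diagonal state matrix
    B, C:   (T, N)  input-dependent projections, as in selective SSMs
    delta:  (T, D)  discretization step ("gate"); larger steps overwrite the
                    state faster, smaller steps retain earlier inputs longer
    ablate_channels: optional list of state-channel indices to zero out,
                    mimicking the sparse-channel ablation described above
    """
    T, D = x.shape
    N = A.shape[1]
    h = torch.zeros(D, N)
    outputs = []
    for t in range(T):
        # Zero-order-hold discretization: exp(delta * A) is the per-step
        # exponential decay of the hidden state.
        dA = torch.exp(delta[t].unsqueeze(-1) * A)        # (D, N)
        dB = delta[t].unsqueeze(-1) * B[t].unsqueeze(0)   # (D, N)
        h = dA * h + dB * x[t].unsqueeze(-1)
        if ablate_channels is not None:
            h[:, ablate_channels] = 0.0                   # knock out putative memory carriers
        outputs.append((h * C[t].unsqueeze(0)).sum(-1))   # (D,)
    return torch.stack(outputs)

# Example usage with random tensors (shapes are assumptions):
T, D, N = 32, 8, 16
x, delta = torch.randn(T, D), torch.rand(T, D)
A = -torch.rand(D, N)
B, C = torch.randn(T, N), torch.randn(T, N)
y_full = selective_ssm_scan(x, A, B, C, delta)
y_ablated = selective_ssm_scan(x, A, B, C, delta, ablate_channels=[0, 3, 7])
```

Comparing `y_full` and `y_ablated` on a recall probe is the rough shape of the intervention: if the zeroed channels carry early-input information, recall of early positions should degrade while later positions remain largely intact.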
Primary Area: interpretability and explainable AI
Submission Number: 7479