Keywords: Associative Memory; In-Context Learning; State-Space Models
Abstract: As sequence models emerge as efficient architectures for long-context modeling, it becomes important to understand whether state-space models are capable of associative recall. We study the recall-predict problem, in which a context is a mixture of tagged probability measures and a query specifies the component whose content distribution determines the response. First, focusing on two-layer Mamba models, we introduce the query insertion encoding and show the existence of an infinite-context, measure-valued Mamba limit. Second, under separated-tag and exponential-decay assumptions, we study trained Mamba hypothesis classes and prove that approximate empirical risk minimization over these classes yields estimators whose population risk is bounded at a sub-polynomial rate. Finally, we complement this upper bound with an architecture-independent minimax lower bound of comparable order, demonstrating that the exponent is statistically optimal. These results extend measure-level associative-memory theory beyond attention mechanisms and identify query insertion, recurrent stability, and spectral effective dimension as the key mechanisms enabling optimal learning from infinite contexts.
Submission Number: 29