Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing

Peihao Wang; Ruisi Cai; Yuehao Wang; Jiajun Zhu; Pragya Srivastava; Zhangyang Wang; Pan Li

Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing

Peihao Wang, Ruisi Cai, Yuehao Wang, Jiajun Zhu, Pragya Srivastava, Zhangyang Wang, Pan Li

Published: 22 Jan 2025, Last Modified: 03 Mar 2025ICLR 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: State Space Models, Long-context LLMs, Over-smoothing

TL;DR: We theoretically and empirically reveal and tackle locality and over-smoothing bottlenecks for state space models.

Abstract: Structured State Space Models (SSMs) have emerged as alternatives to transformers. While SSMs are often regarded as effective in capturing long-sequence dependencies, we rigorously demonstrate that they are inherently limited by strong recency bias. Our empirical studies also reveal that this bias impairs the models' ability to recall distant information and introduces robustness issues. Our scaling experiments then discovered that deeper structures in SSMs can facilitate the learning of long contexts. However, subsequent theoretical analysis reveals that as SSMs increase in depth, they exhibit another inevitable tendency toward over-smoothing, e.g., token representations becoming increasingly indistinguishable. This *fundamental dilemma* between recency and over-smoothing hinders the scalability of existing SSMs. Inspired by our theoretical findings, we propose to *polarize* two channels of the state transition matrices in SSMs, setting them to zero and one, respectively, simultaneously addressing recency bias and over-smoothing. Experiments demonstrate that our polarization technique consistently enhances the associative recall accuracy of long-range tokens and unlocks SSMs to benefit further from deeper architectures. All source codes are released at https://github.com/VITA-Group/SSM-Bottleneck.

Primary Area: foundation or frontier models, including LLMs

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 1195

Loading