Keywords: Rank Collapse, Skip Connections, Sequence Modeling Architectures
Abstract: Rank collapse, a phenomenon in which the embedding vectors of a sequence model rapidly converge to a uniform token or equilibrium state, has recently gained attention in the deep learning literature. It reduces expressivity and can destabilize training through vanishing gradients. Empirical evidence suggests that architectural components such as skip connections, LayerNorm, and multilayer perceptrons (MLPs) play critical roles in mitigating rank collapse. While this issue is well documented for transformers, alternative sequence models that have recently gained prominence, such as State Space Models (SSMs), have not been thoroughly examined for similar vulnerabilities. This paper extends the theory of rank collapse from transformers to SSMs using a unifying framework that captures both architectures. We introduce a modification of the skip connection component, termed the lambda-skip connection, that provides guarantees against rank collapse. Via analytical results, we present a sufficient condition under which this guarantee holds for all of the aforementioned architectures, and we study the necessity of the condition through ablation studies and analytical examples. To our knowledge, this is the first study to provide a general guarantee against rank collapse, and the first to investigate rank collapse in the context of SSMs, offering valuable insights for both theoreticians and practitioners. Finally, we validate our findings with experiments demonstrating the crucial role of architectural components in preventing rank collapse.
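The abstract does not spell out the exact form of the lambda-skip connection, so the following is only a minimal sketch under an assumed form: the skip branch is scaled by a scalar coefficient lambda, i.e. y = lambda * x + f(x). The names LambdaSkipBlock and rank_collapse_residual are hypothetical, and the mean-token residual below is just one common proxy for measuring rank collapse, not necessarily the metric used in the paper.

```python
import torch
import torch.nn as nn


class LambdaSkipBlock(nn.Module):
    """Wrap a sequence-mixing layer f (attention, SSM, MLP, ...) with a
    lambda-scaled skip connection: y = lam * x + f(x). (Assumed form.)"""

    def __init__(self, f: nn.Module, lam: float = 1.0, learnable: bool = False):
        super().__init__()
        self.f = f
        if learnable:
            self.lam = nn.Parameter(torch.tensor(lam))
        else:
            # Fixed coefficient; registered as a buffer so it moves with the module.
            self.register_buffer("lam", torch.tensor(lam))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.lam * x + self.f(x)


def rank_collapse_residual(x: torch.Tensor) -> torch.Tensor:
    """One common proxy for rank collapse: the Frobenius distance of the
    token matrix x (seq_len, d) from the rank-1 'uniform token' matrix
    obtained by broadcasting the mean token. It tends to 0 under collapse."""
    mean_token = x.mean(dim=0, keepdim=True)         # (1, d)
    return torch.linalg.matrix_norm(x - mean_token)  # Frobenius norm


# Toy usage: a linear layer stands in for the attention/SSM block.
torch.manual_seed(0)
block = LambdaSkipBlock(nn.Linear(16, 16), lam=0.5, learnable=True)
x = torch.randn(8, 16)  # 8 tokens of dimension 16
print(rank_collapse_residual(block(x)).item())
```

In this sketch, lambda = 1 recovers the standard skip connection, so tuning (or learning) lambda is the only change relative to a conventional residual block.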
Primary Area: learning theory
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9918