Lambda-Skip Connections: the architectural component that prevents Rank Collapse

Federico Arangath Joseph; Jerome Sieber; Melanie Zeilinger; Carmen Amo Alonso

Published: 22 Jan 2025, Last Modified: 13 Feb 2025

Venue: ICLR 2025 Poster

License: CC BY 4.0

Keywords: Rank Collapse, Skip Connections, Sequence Modeling Architectures

Abstract: Rank collapse, a phenomenon where embedding vectors in sequence models rapidly converge to a uniform token or equilibrium state, has recently gained attention in the deep learning literature. This phenomenon leads to reduced expressivity and potential training instabilities due to vanishing gradients. Empirical evidence suggests that architectural components like skip connections, LayerNorm, and MultiLayer Perceptrons (MLPs) play critical roles in mitigating rank collapse. While this issue is well-documented for transformers, alternative sequence models, such as State Space Models (SSMs), which have recently gained prominence, have not been thoroughly examined for similar vulnerabilities. This paper extends the theory of rank collapse from transformers to SSMs using a unifying framework that captures both architectures. We introduce a modification in the skip connection component, termed lambda-skip connections, that provides guarantees for rank collapse prevention. We present, via analytical results, a sufficient condition to achieve the guarantee for all of the aforementioned architectures. We also study the necessity of this condition via ablation studies and analytical examples. To our knowledge, this is the first study that provides a general guarantee to prevent rank collapse, and that investigates rank collapse in the context of SSMs, offering valuable understanding for both theoreticians and practitioners. Finally, we validate our findings with experiments demonstrating the crucial role of architectural components in preventing rank collapse.
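The effect described in the abstract can be illustrated with a small toy experiment. The sketch below is not the paper's construction: it assumes, purely for illustration, that a weighted skip takes the form `lam * X + mix(X)`, where `mix` is a fixed attention-like averaging of tokens. The function names, the mixing weights, and the collapse measure (distance of the tokens from their mean, relative to the total norm) are all hypothetical choices, and the toy omits LayerNorm and MLPs, so it does not reproduce the paper's guarantee or its sufficient condition.

```python
import math

def mix(tokens, weights):
    """Apply a row-stochastic mixing matrix (attention-like averaging) to the tokens."""
    dim = len(tokens[0])
    return [
        [sum(w * tok[d] for w, tok in zip(row, tokens)) for d in range(dim)]
        for row in weights
    ]

def lambda_skip_layer(tokens, weights, lam):
    """Hypothetical layer: lam * X + mix(X). Setting lam = 0 removes the skip path."""
    mixed = mix(tokens, weights)
    return [
        [lam * x + m for x, m in zip(tok, mtok)]
        for tok, mtok in zip(tokens, mixed)
    ]

def relative_residual(tokens):
    """Distance of the tokens from their mean, divided by the total norm.

    A value near 0 means all tokens are (almost) the same vector.
    """
    n, dim = len(tokens), len(tokens[0])
    mean = [sum(tok[d] for tok in tokens) / n for d in range(dim)]
    res = math.sqrt(sum((tok[d] - mean[d]) ** 2 for tok in tokens for d in range(dim)))
    total = math.sqrt(sum(v * v for tok in tokens for v in tok))
    return res / total

def run(lam, depth=20):
    """Stack `depth` identical layers on three 2-d tokens and report the residual."""
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # Illustrative fixed mixing weights; each row sums to 1.
    weights = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.2, 0.5]]
    for _ in range(depth):
        tokens = lambda_skip_layer(tokens, weights, lam)
    return relative_residual(tokens)

if __name__ == "__main__":
    print("no skip  (lam = 0):", run(0.0))
    print("with skip (lam = 1):", run(1.0))
```

In this linear toy, the tokens become numerically indistinguishable after 20 layers without the skip path, while the weighted skip keeps the residual many orders of magnitude larger. The skip only slows the decay here; conditions under which collapse is actually prevented are what the paper's analysis addresses.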

Primary Area: learning theory

Submission Number: 9918
