Gated Delta Networks: Improving Mamba2 with Delta Rule

ICLR 2025 Conference Submission 90 Authors

13 Sept 2024 (modified: 13 Oct 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: linear RNN, state-space model, linear transformer, subquadratic model, linear attention, delta rule, mamba
TL;DR: We introduce Gated DeltaNet, which combines the gating mechanism from Mamba2 with the delta rule from DeltaNet, achieving superior performance compared to both models individually.
Abstract: Linear Transformers have emerged as efficient alternatives to standard Transformers due to their inference efficiency, achieving competitive performance across various tasks, though they often struggle with recall-intensive tasks. Recently, two mechanisms—the gating mechanism and the delta update rule—have been used to enhance linear Transformers. We find these two mechanisms to be complementary: the gating mechanism enables fast, adaptive memory erasure, while the delta rule allows for more precise and targeted memory updates. In this work, we introduce the gated delta rule, which combines both mechanisms, and extend the delta rule's parallel algorithm to incorporate gating. Our experiments demonstrate that linear Transformers with the gated delta rule, dubbed Gated DeltaNet, consistently outperform Mamba2 (a gated linear Transformer) and DeltaNet in language modeling, commonsense reasoning, and real-world in-context recall-intensive tasks. Additionally, we explore hybrid models that combine Gated DeltaNet layers with sliding window attention or Mamba2 layers, further enhancing retrieval capabilities.
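To make the abstract's "gating plus delta rule" idea concrete, below is a minimal, sequential sketch of a gated delta-rule recurrence. The tensor shapes, the key-by-value layout of the state, and the query readout convention are illustrative assumptions, and the paper's actual contribution is a parallel (chunkwise) algorithm rather than this naive loop.

```python
import torch

def gated_delta_rule_naive(q, k, v, beta, alpha):
    """Sequential sketch of a gated delta-rule update (single head).

    Assumed shapes: q, k: (T, d_k); v: (T, d_v); beta, alpha: (T,), values in (0, 1).
    State S is a d_k x d_v associative memory mapping keys to values.
    Assumed recurrence: S_t = alpha_t * (I - beta_t k_t k_t^T) S_{t-1} + beta_t k_t v_t^T
    Readout: o_t = S_t^T q_t.
    """
    T, d_k = k.shape
    d_v = v.shape[-1]
    S = torch.zeros(d_k, d_v)
    outputs = []
    for t in range(T):
        k_t, v_t, q_t = k[t], v[t], q[t]
        # Delta-rule erase of the value currently bound to k_t, then a global
        # gated decay alpha_t (the Mamba2-style forget gate).
        S = alpha[t] * (S - beta[t] * torch.outer(k_t, k_t @ S))
        # Delta-rule write: bind the new value to k_t with write strength beta_t.
        S = S + beta[t] * torch.outer(k_t, v_t)
        # Read the memory with the query.
        outputs.append(S.T @ q_t)
    return torch.stack(outputs)  # (T, d_v)
```

Setting alpha_t = 1 recovers plain DeltaNet's update, while beta_t = 0 leaves only the gated decay, which is what makes the two mechanisms complementary: the gate erases stale memory quickly, the delta rule rewrites individual key-value associations precisely. In practice keys would typically be L2-normalized so the erase step is a stable projection.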
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 90