Keywords: All-MLP, Sequence Modeling, Multilayer Perceptron, Transformer
TL;DR: We present Causal Relation Networks (CausalRNs), the first all-MLP sequence modeling architecture with linear-time parallel training.
Abstract: We present Causal Relation Networks (CausalRNs), the first all-MLP sequence modeling architecture with linear-time parallel training.
To enable autoregressive modeling, we make Relation Networks (RNs) equivariant and causal through relaxation and masking.
Contrary to the earlier belief that RNs are quadratic-time, we show that with exp(x) as the activation function, any RN is linear-time, fully parallelizable, and numerically stable.
Our derivation naturally gives rise to familiar design choices adopted by state-of-the-art architectures, e.g., exponential gating and state expansion.
This duality provides a new perspective from which we not only validate popular design choices but also identify new design considerations.
Experiments on autoregressive language modeling and image classification show CausalRNs to be comparable to Linear Transformers.
The quadratic variant of CausalRNs achieves perfect retrieval on the copying task, which was previously only possible with Transformers.
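To make the linear-time claim concrete, here is a minimal sketch, not the paper's implementation: the projection matrices W_q, W_k, W_v and the specific pairwise form exp(q_i + k_j) * v_j are assumptions for illustration. It shows how the identity exp(a + b) = exp(a) * exp(b) lets a causal pairwise sum collapse into a single cumulative sum, so the sequence is processed in time linear in its length.

```python
import torch

def causal_exp_rn_sketch(x, W_q, W_k, W_v):
    """Hypothetical illustration (not the paper's code): aggregate
        out_i = sum_{j <= i} exp(q_i + k_j) * v_j
    Because exp(q_i + k_j) = exp(q_i) * exp(k_j), the inner sum over j
    becomes a prefix sum that is shared across positions, so the whole
    sequence costs O(T) instead of O(T^2). The paper's exact formulation,
    gating, and numerical-stability measures are not reproduced here."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v              # each (T, d)
    prefix = torch.cumsum(torch.exp(k) * v, dim=0)   # (T, d): sum_{j<=i} exp(k_j) * v_j
    return torch.exp(q) * prefix                     # (T, d): exp(q_i) * prefix_i

# Tiny usage example
T, d = 8, 16
x = torch.randn(T, d)
W_q, W_k, W_v = (0.1 * torch.randn(d, d) for _ in range(3))
out = causal_exp_rn_sketch(x, W_q, W_k, W_v)
print(out.shape)  # torch.Size([8, 16])
```

The same factorization trick is what makes linear-attention-style models parallelizable, which is consistent with the duality the abstract alludes to.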
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 13780