Catastrophic Token Leakage in Convolutional Sequence Modeling and How to Mitigate It

26 Sept 2024 (modified: 02 Oct 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Convolutional Models, LLMs, Language Modeling, Fast Fourier Transforms
TL;DR: FFT-based convolutional models break causality due to numerical imprecision. Integer-based FFTs can be a solution.
Abstract: Sequence models based on long convolutions have recently gained significant interest as an alternative to attention models in large-scale language modeling, owing to the fast training and inference enabled by the Fast Fourier Transform (FFT). Our work begins with the observation that sequence models based on FFT convolutions can suffer catastrophic leakage of future tokens. This striking failure of causality arises from numerical errors in the standard FFT. We provide a solution to the problem via Number-Theoretic FFTs, which are executed solely on integers. Our method provably ensures no token leakage, providing a safe primitive for convolutional models in general. To align with current deep learning practice, we provide a complete implementation using 32-bit integers and leveraging standard integer matrix multiplications.
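To make the abstract's claim concrete, here is a minimal sketch (not the authors' code; names such as `fft_causal_conv` and the length `L = 1024` are illustrative assumptions) of how a causal convolution computed with a floating-point FFT can make an output position depend on a strictly future token:

```python
import numpy as np

rng = np.random.default_rng(0)
L = 1024
x = rng.standard_normal(L)   # input sequence
h = rng.standard_normal(L)   # causal filter: h[k] multiplies x[n - k], k >= 0

def fft_causal_conv(x, h, n_fft):
    # Zero-pad to n_fft >= 2L - 1 so the circular convolution equals the
    # linear (causal) convolution on the first L positions.
    X = np.fft.rfft(x, n_fft)
    H = np.fft.rfft(h, n_fft)
    return np.fft.irfft(X * H, n_fft)[: len(x)]

n_fft = 2 * L
y = fft_causal_conv(x, h, n_fft)

# Perturb a strictly future token and check whether earlier outputs move.
t = 100
x_perturbed = x.copy()
x_perturbed[t + 1] += 1.0
y_perturbed = fft_causal_conv(x_perturbed, h, n_fft)

# With exact arithmetic this difference is identically zero; in floating
# point it is typically tiny but nonzero, i.e. position t "sees" token t + 1.
print(np.abs(y_perturbed[: t + 1] - y[: t + 1]).max())
```

And as a sketch of the integer-only direction the abstract describes, below is a textbook radix-2 number-theoretic transform over the commonly used NTT prime 998244353 (an assumption for illustration; the submission's 32-bit, integer-matrix-multiplication implementation is not reproduced here). Because all arithmetic is exact modular integer arithmetic, the convolution is exact whenever the true integer outputs stay below the modulus, so no spurious dependence on future tokens can arise:

```python
MOD = 998244353        # NTT-friendly prime: MOD - 1 = 119 * 2**23
PRIMITIVE_ROOT = 3     # primitive root modulo MOD

def ntt(a, invert=False):
    """Iterative radix-2 number-theoretic transform over Z_MOD (in place)."""
    n = len(a)
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly passes with roots of unity of increasing order.
    length = 2
    while length <= n:
        w = pow(PRIMITIVE_ROOT, (MOD - 1) // length, MOD)
        if invert:
            w = pow(w, MOD - 2, MOD)       # modular inverse of the root
        for start in range(0, n, length):
            wn = 1
            for k in range(start, start + length // 2):
                u = a[k]
                v = a[k + length // 2] * wn % MOD
                a[k] = (u + v) % MOD
                a[k + length // 2] = (u - v) % MOD
                wn = wn * w % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        for i in range(n):
            a[i] = a[i] * n_inv % MOD
    return a

def ntt_causal_conv(x, h):
    """Exact causal convolution of non-negative integer sequences via the NTT."""
    n = 1
    while n < len(x) + len(h) - 1:
        n <<= 1
    X = ntt(list(x) + [0] * (n - len(x)))
    H = ntt(list(h) + [0] * (n - len(h)))
    y = [X[i] * H[i] % MOD for i in range(n)]
    return ntt(y, invert=True)[: len(x)]
```

For example, `ntt_causal_conv([1, 2, 3, 4], [1, 1, 0, 0])` returns `[1, 3, 5, 7]` exactly, with no floating-point rounding anywhere in the pipeline.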
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Resubmission: No
Student Author: No
Large Language Models: No, not at all.
Submission Number: 7802