TransFourier: FFT Is All You Need

ICLR 2026 Conference Submission17521 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: FFT, Attention-Free, Autoregressive
TL;DR: We propose TransFourier, an architecture that replaces self-attention with an efficient Fourier Transform module, achieving competitive performance.
Abstract: The scalability of Large Language Models (LLMs) to extremely long sequences is hindered by two foundational challenges: the quadratic computational cost of self-attention and the limited ability of positional encodings to generalize to contexts far beyond their training regime. These factors constrain both the efficiency and the effective context window of current models. This paper introduces TransFourier, a novel architecture designed to address these challenges. TransFourier completely replaces the masked self-attention module with a parameter-efficient, $O(L \log L)$ Multi-Head Fourier (MHF) module. Our core contributions are threefold: (1) We propose a model that leverages the Fast Fourier Transform (FFT) for sequence information mixing, inherently addressing the aforementioned computational and generalization bottlenecks of attention. (2) We introduce a novel frequency-domain causal masking technique, which enforces causality through asymmetric padding and truncation, enabling autoregressive generation and overcoming a critical barrier that has historically limited Fourier-based models in generative tasks. (3) Our design is built entirely on highly optimized, standard deep learning operators (e.g., FFT and convolution), obviating the need for the hardware-specific custom CUDA kernels required by architectures such as Mamba, thus ensuring broad accessibility and portability. Evaluations on established academic benchmarks show that TransFourier is competitive with mature Transformer and State Space Model (SSM) baselines of comparable size. Given its strong scaling behavior and architectural simplicity, TransFourier presents a compelling and practical pathway toward the next generation of efficient long-sequence models. The code is available in the supplementary materials.
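
To make the abstract's description concrete, the sketch below illustrates the general idea of causal sequence mixing via the FFT, where causality follows from zero-padding to twice the sequence length before the transform and truncating the output afterward. This is only a minimal illustration assuming standard PyTorch operators; the class name `CausalFourierMixing`, the per-head learned time-domain kernels, and all shapes are assumptions, not the authors' actual MHF module.

```python
# Minimal sketch of FFT-based causal sequence mixing (assumed design, not the
# submission's exact Multi-Head Fourier module).
import torch
import torch.nn as nn


class CausalFourierMixing(nn.Module):
    def __init__(self, d_model: int, n_heads: int, max_len: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        # One learned time-domain kernel per head (assumed parameterization).
        self.kernels = nn.Parameter(torch.randn(n_heads, max_len) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        B, L, D = x.shape
        H = self.n_heads
        xh = x.view(B, L, H, D // H).permute(0, 2, 3, 1)      # (B, H, D/H, L)

        # Asymmetric zero-padding to length 2L turns the circular convolution
        # implied by the FFT into a linear one; truncating back to L keeps
        # only outputs that depend on current and past positions (causality).
        n = 2 * L
        Xf = torch.fft.rfft(xh, n=n)                          # (B, H, D/H, n//2 + 1)
        Kf = torch.fft.rfft(self.kernels[:, :L], n=n)         # (H, n//2 + 1)
        Yf = Xf * Kf[None, :, None, :]                        # frequency-domain mixing, O(L log L)
        y = torch.fft.irfft(Yf, n=n)[..., :L]                 # y[t] uses only x[0..t]

        return y.permute(0, 3, 1, 2).reshape(B, L, D)
```

For example, `CausalFourierMixing(d_model=256, n_heads=8, max_len=1024)` applied to a `(4, 1024, 256)` tensor returns a tensor of the same shape in which each position mixes information only from earlier positions, mirroring the autoregressive masking described in the abstract.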
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 17521