Keywords: FFT, In-place, Memory
Abstract: The fast Fourier transform (FFT) is widely used to reduce memory and computational costs in deep learning. However, existing implementations, including the standard FFT and the real FFT (rFFT), cannot achieve true in-place computation. In particular, rFFT maps a real input of size $n$ to a complex output of size $\frac{n}{2}+1$, causing a dimensional mismatch and requiring additional memory allocation.
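As a concrete illustration of this mismatch, here is a minimal NumPy sketch (illustrative only, not the paper's implementation; the array sizes follow directly from the abstract):

```python
import numpy as np

n = 8
x = np.random.randn(n)   # real input: n float64 values
X = np.fft.rfft(x)       # complex output: n/2 + 1 complex128 values

print(x.shape)           # (8,)
print(X.shape)           # (5,)
# Each complex128 bin occupies two float64 slots, so the output needs
# 2 * (n/2 + 1) = n + 2 real slots and cannot overwrite the length-n
# input buffer in place.
```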
We propose rdFFT, the first real-domain, fully in-place FFT framework, which preserves input-output dimensional consistency ($n \rightarrow n$). By leveraging the symmetry of butterfly operations and conjugate symmetry in the frequency domain, we design an implicit complex encoding scheme that eliminates intermediate cache usage entirely.
Theoretically, our method reduces memory usage by 50\% compared to rFFT. Moreover, it enables zero-cache parameter updates by exploiting the derivative property of the Fourier transform to compute matrix inverses efficiently without intermediate storage. Experiments on multiple natural language understanding tasks demonstrate that the method maintains model performance while significantly lowering memory overhead, offering a promising direction for frequency-domain lightweight adaptation.
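The abstract does not spell out the encoding, but the $n \rightarrow n$ bookkeeping can be sketched with a halfcomplex-style packing (as in FFTW's R2HC format): since $X[0]$ and $X[n/2]$ of a real signal's spectrum are purely real, the $\frac{n}{2}+1$ complex bins fit in exactly $n$ real slots. Below is a minimal sketch under that assumption; it shows only the packed layout, not the in-place butterfly computation or the paper's actual rdFFT scheme:

```python
import numpy as np

def rfft_packed(x):
    """Pack the rFFT of a length-n real signal into n real slots.

    Hypothetical halfcomplex-style layout (not the paper's rdFFT),
    assuming even n:
    [Re X[0], Re X[1], ..., Re X[n/2], Im X[n/2-1], ..., Im X[1]].
    X[0] and X[n/2] are purely real for real input, so nothing is lost.
    """
    n = len(x)
    X = np.fft.rfft(x)                            # n/2 + 1 complex bins
    out = np.empty(n)
    out[: n // 2 + 1] = X.real                    # Re X[0] .. Re X[n/2]
    out[n // 2 + 1 :] = X.imag[1 : n // 2][::-1]  # Im X[n/2-1] .. Im X[1]
    return out

def irfft_packed(packed):
    """Invert rfft_packed, recovering the length-n real signal."""
    n = len(packed)
    X = np.zeros(n // 2 + 1, dtype=complex)
    X.real = packed[: n // 2 + 1]
    X.imag[1 : n // 2] = packed[n // 2 + 1 :][::-1]
    return np.fft.irfft(X, n=n)

x = np.random.randn(8)
assert np.allclose(x, irfft_packed(rfft_packed(x)))  # round trip is exact
```

Under such a layout, a non-in-place rFFT pipeline holds the $n$ input reals plus $n+2$ output reals (about $2n$ in total), whereas an in-place $n \rightarrow n$ transform needs only $n$, which is where the roughly 50\% saving claimed above comes from.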
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 3416