Keywords: FFT, In-place, Memory
Abstract: The fast Fourier transform (FFT) is widely used to reduce memory and computational costs in deep learning. However, existing implementations, including the standard FFT and the real FFT (rFFT), cannot achieve true in-place computation. In particular, rFFT maps a real input of size $n$ to a complex output of size $\frac{n}{2}+1$, causing a dimensional mismatch and requiring additional memory allocation.
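As a concrete illustration of this mismatch, here is a minimal NumPy sketch (illustrative only, not the paper's implementation; the array sizes follow directly from the abstract):

```python
import numpy as np

n = 8
x = np.random.randn(n)   # real input: n float64 values
X = np.fft.rfft(x)       # complex output: n/2 + 1 complex128 values

print(x.shape)           # (8,)
print(X.shape)           # (5,)
# Each complex128 bin occupies two float64 slots, so the output needs
# 2 * (n/2 + 1) = n + 2 real slots and cannot overwrite the length-n
# input buffer in place.
```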
We propose rdFFT, the first real-domain, fully in-place FFT framework, which preserves input-output dimensional consistency ($n \rightarrow n$). By leveraging the symmetry of butterfly operations and conjugate symmetry in the frequency domain, we design an implicit complex encoding scheme that eliminates intermediate cache usage entirely.
Theoretically, our method reduces memory usage by 50\% compared to rFFT. Moreover, it enables zero-cache parameter updates by exploiting the derivative property of the Fourier transform to compute matrix inverses efficiently without intermediate storage. Experiments on multiple natural language understanding tasks demonstrate that the method maintains model performance while significantly lowering memory overhead, offering a promising direction for frequency-domain lightweight adaptation.
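The abstract does not spell out the encoding, but the $n \rightarrow n$ bookkeeping can be sketched with a halfcomplex-style packing (as in FFTW's R2HC format): since $X[0]$ and $X[n/2]$ of a real signal's spectrum are purely real, the $\frac{n}{2}+1$ complex bins fit in exactly $n$ real slots. Below is a minimal sketch under that assumption; it shows only the packed layout, not the in-place butterfly computation or the paper's actual rdFFT scheme:

```python
import numpy as np

def rfft_packed(x):
    """Pack the rFFT of a length-n real signal into n real slots.

    Hypothetical halfcomplex-style layout (not the paper's rdFFT),
    assuming even n:
    [Re X[0], Re X[1], ..., Re X[n/2], Im X[n/2-1], ..., Im X[1]].
    X[0] and X[n/2] are purely real for real input, so nothing is lost.
    """
    n = len(x)
    X = np.fft.rfft(x)                            # n/2 + 1 complex bins
    out = np.empty(n)
    out[: n // 2 + 1] = X.real                    # Re X[0] .. Re X[n/2]
    out[n // 2 + 1 :] = X.imag[1 : n // 2][::-1]  # Im X[n/2-1] .. Im X[1]
    return out

def irfft_packed(packed):
    """Invert rfft_packed, recovering the length-n real signal."""
    n = len(packed)
    X = np.zeros(n // 2 + 1, dtype=complex)
    X.real = packed[: n // 2 + 1]
    X.imag[1 : n // 2] = packed[n // 2 + 1 :][::-1]
    return np.fft.irfft(X, n=n)

x = np.random.randn(8)
assert np.allclose(x, irfft_packed(rfft_packed(x)))  # round trip is exact
```

Under such a layout, a non-in-place rFFT pipeline holds the $n$ input reals plus $n+2$ output reals (about $2n$ in total), whereas an in-place $n \rightarrow n$ transform needs only $n$, which is where the roughly 50\% saving claimed above comes from.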
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 3416