Neural Fields Meet Attention

Published: 23 Sept 2025, Last Modified: 29 Oct 2025 · NeurReps 2025 Proceedings · CC BY 4.0
Keywords: neural fields, continuous representation, attention, softmax, gradient descent
TL;DR: This paper proves that the attention mechanism in Transformers is mathematically equivalent to performing gradient-based optimization on a neural field, revealing that Transformers carry intrinsic inductive biases for learning continuous functions.
Abstract: We establish a precise mathematical connection between neural field optimization and Transformer attention mechanisms. First, we prove that Transformer-based operators learning neural fields are equivariant to affine transformations (translations and positive scalings) when equipped with relative positional encodings and explicit coordinate normalization---extending geometric deep learning to meta-learning of continuous functions. Second, we demonstrate that linear attention exactly computes negative gradients of squared-error loss for sinusoidal neural fields, with softmax attention converging to this identity at rate $O(\tau^{-2})$ in the high-temperature limit. Experiments on rotation groups validate our theory: equivariance errors remain below $10^{-5}$ across SO(2) and SO(3) transformations (mean $3.6 \times 10^{-6}$, 10 seeds), while attention-gradient correlation exceeds 0.999 for temperatures $\tau \geq 100$. These results reveal that attention mechanisms implicitly encode geometric priors suited for continuous function learning.
Submission Number: 157
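
The correspondence stated in the abstract can be checked numerically. Below is a minimal NumPy sketch (our construction, not the authors' code); the sinusoidal frequency bank, the context/query sampling, and all variable names are illustrative assumptions. It builds a sinusoidal neural field $f(x) = w^\top \phi(x)$, verifies that linear attention with queries $\phi(x_q)$, keys $\phi(x_i)$, and residual values $y_i - f(x_i)$ reproduces the negative squared-error gradient read out at the queries, and prints how the correlation between softmax attention and that gradient changes with the temperature $\tau$.

```python
# Minimal NumPy sketch of the attention-gradient correspondence described above.
# Everything here (frequencies, sampling, sizes) is an illustrative assumption,
# not the authors' experimental setup.
import numpy as np

rng = np.random.default_rng(0)

def sin_features(x, freqs):
    """Sinusoidal encoding phi(x) = [sin(w_k x), cos(w_k x)] for each frequency w_k."""
    return np.concatenate([np.sin(np.outer(x, freqs)),
                           np.cos(np.outer(x, freqs))], axis=-1)

freqs = 2.0 ** np.arange(4)              # assumed frequency bank
x_ctx = rng.uniform(-1.0, 1.0, size=32)  # context coordinates
y_ctx = np.sin(3.0 * x_ctx)              # target signal sampled at the context
x_qry = rng.uniform(-1.0, 1.0, size=8)   # query coordinates

Phi_c = sin_features(x_ctx, freqs)       # (32, 8) context features
Phi_q = sin_features(x_qry, freqs)       # (8, 8) query features
w = rng.normal(size=Phi_c.shape[1])      # current weights of the field f(x) = phi(x)^T w

# Gradient of the squared-error loss L(w) = 1/2 * sum_i (phi(x_i)^T w - y_i)^2,
# read out at the query coordinates.
grad_w = Phi_c.T @ (Phi_c @ w - y_ctx)
neg_grad_readout = -Phi_q @ grad_w

# Linear attention: queries phi(x_q), keys phi(x_i), values = residuals y_i - f(x_i).
resid = y_ctx - Phi_c @ w
lin_attn = (Phi_q @ Phi_c.T) @ resid
print("linear attention == negative-gradient readout:",
      np.allclose(neg_grad_readout, lin_attn))

# Softmax attention over the same scores; as the temperature tau grows, its output
# (up to an additive constant) should increasingly track the gradient readout.
for tau in (1.0, 10.0, 100.0, 1000.0):
    scores = (Phi_q @ Phi_c.T) / tau
    A = np.exp(scores - scores.max(axis=1, keepdims=True))   # stable softmax
    soft_attn = (A / A.sum(axis=1, keepdims=True)) @ resid
    corr = np.corrcoef(soft_attn, lin_attn)[0, 1]
    print(f"tau = {tau:7.1f}   corr(softmax attention, gradient) = {corr:+.4f}")
```

The exact numbers printed by the temperature sweep depend on this toy setup; the correlations reported in the abstract (above 0.999 for $\tau \geq 100$) come from the paper's own experiments.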