Neural Fractional Attention Differential Equations

Qiyu Kang; Wenjun Cui; Xuhao Li; Yuxin Ma; Xueyang Fu; Wee Peng Tay; Yidong Li; Zheng-Jun Zha

Published: 18 Sept 2025, Last Modified: 29 Oct 2025, Venue: NeurIPS 2025 poster, License: CC BY 4.0

Keywords: neural differential equations

Abstract: The integration of differential equations with neural networks has produced powerful tools for modeling complex dynamics across diverse machine learning applications. While standard integer-order neural ordinary differential equations (ODEs) have shown considerable success, they are limited in their capacity to model systems with memory effects and historical dependencies. Fractional calculus offers a mathematical framework capable of addressing this limitation, yet most current fractional neural networks use static memory weightings that cannot adapt to input-specific contextual requirements. This paper proposes a generalized neural Fractional Attention Differential Equation (FADE), which combines the memory-retention capabilities of fractional calculus with contextual, learnable attention mechanisms. Our approach replaces the fixed kernel functions in fractional operators with neural attention kernels that adaptively weight historical states based on their contextual relevance to current predictions. This allows our framework to selectively emphasize important temporal dependencies while filtering out less relevant historical information. Our theoretical analysis establishes boundedness of solutions, well-posedness of the problem, and convergence of the numerical solver for the proposed model. Furthermore, through extensive evaluation on tasks such as fluid flow, graph learning, and spatio-temporal traffic flow forecasting, we demonstrate that our adaptive attention-based fractional framework outperforms both integer-order neural ODE models and existing fractional approaches. The results confirm that our framework provides superior modeling capacity for complex dynamics with varying temporal dependencies. The code is available at https://github.com/cuiwjTech/NeurIPS2025_FADE.
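To make the contrast between static and adaptive memory weightings concrete, the following is a minimal sketch and not the authors' implementation. It uses a standard explicit fractional Euler (rectangle-rule) scheme for a Caputo-type equation of order alpha in (0, 1], whose fixed power-law weights depend only on the time lag. The function `attention_weights` is a hypothetical stand-in for an input-dependent kernel: it has no learnable parameters, and its dot-product scoring and rescaling are assumptions made purely for illustration. All names (`power_law_weights`, `attention_weights`, `solve`) are invented for this sketch.

```python
import math
import numpy as np

def power_law_weights(n, alpha):
    """Fixed memory weights b_j = (n-j)^alpha - (n-j-1)^alpha for j = 0..n-1."""
    k = n - np.arange(n)                  # lags n, n-1, ..., 1
    return k**alpha - (k - 1)**alpha

def attention_weights(history, n, alpha):
    """Hypothetical input-dependent weights (an assumption, not FADE's kernel):
    softmax over scaled dot products between the latest state and each stored
    state, rescaled to the same total mass n^alpha as the power-law weights."""
    H = np.stack(history)                 # shape (n, d)
    scores = H @ H[-1] / math.sqrt(H.shape[1])
    scores = scores - scores.max()        # subtract max for numerical stability
    w = np.exp(scores)
    return (w / w.sum()) * n**alpha

def solve(f, x0, alpha, h, n_steps, weights_fn=None):
    """Explicit fractional Euler scheme:
    x_n = x_0 + h^alpha / Gamma(alpha + 1) * sum_j w_j f(t_j, x_j)."""
    x0 = np.asarray(x0, dtype=float)
    xs, fs = [x0], []
    for n in range(1, n_steps + 1):
        fs.append(f((n - 1) * h, xs[-1]))  # evaluate dynamics at latest state
        if weights_fn is None:
            w = power_law_weights(n, alpha)
        else:
            w = weights_fn(xs, n, alpha)
        memory = np.tensordot(w, np.stack(fs), axes=1)  # weighted sum over history
        xs.append(x0 + h**alpha / math.gamma(alpha + 1) * memory)
    return np.stack(xs)

# Usage: the same decay dynamics under fixed and state-dependent memory weights.
decay = lambda t, x: -x
fixed = solve(decay, [1.0, 0.5], alpha=0.6, h=0.1, n_steps=20)
adaptive = solve(decay, [1.0, 0.5], alpha=0.6, h=0.1, n_steps=20,
                 weights_fn=attention_weights)
```

With alpha = 1 the power-law weights all equal 1 and the scheme reduces to the ordinary forward Euler method, which is one way to see the integer-order neural ODE as a special case. In a learned model, the scoring step would presumably be produced by a trained network rather than a raw dot product.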

Supplementary Material: zip

Primary Area: General machine learning (supervised, unsupervised, online, active, etc.)

Submission Number: 19675
