FAME: Adaptive Functional Attention with Expert Routing for Function-on-Function Regression

Yifei Gao; Yong Chen; Chen Zhang

Published: 18 Sept 2025, Last Modified: 21 Apr 2026. Venue: NeurIPS 2025 poster. License: CC BY 4.0

Keywords: Functional Attention; Function-on-Function Regression; Bidirectional NCDE; Mixture-of-Experts

TL;DR: Introduce a functional attention mechanism that leverages bidirectional NCDEs and MoE to model function-on-function regression.

Abstract: Functional data play a pivotal role across science and engineering, yet their infinite-dimensional nature makes representation learning challenging. Conventional statistical models depend on pre-chosen basis expansions or kernels, limiting the flexibility of data-driven discovery, while many deep-learning pipelines treat functions as fixed-grid vectors, ignoring their inherent continuity. In this paper, we introduce Functional Attention with a Mixture-of-Experts (FAME), an end-to-end, fully data-driven framework for function-on-function regression. FAME forms continuous attention by coupling a bidirectional neural controlled differential equation (NCDE) with MoE-driven vector fields to capture intra-functional continuity, and further captures inter-functional dependencies via multi-head cross-attention. Extensive experiments on synthetic and real-world functional regression benchmarks show that FAME achieves state-of-the-art accuracy and strong robustness to arbitrarily sampled discrete observations of functions.
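The coupling of a bidirectional NCDE with MoE-driven vector fields can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes an explicit Euler discretisation of the controlled differential equation, dense softmax gating over the experts, single-layer tanh experts, and random untrained weights. All names (`make_params`, `moe_vector_field`, `cde_states`, `bidirectional_states`) are hypothetical. The control path is X(t) = (t, x(t)), so irregularly spaced observation times enter through the increments of X.

```python
import math
import random


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def matvec(W, z):
    return [sum(w * zj for w, zj in zip(row, z)) for row in W]


def make_params(hidden, channels, n_experts, seed=0):
    """Random, untrained weights (assumption: stand-ins for learned ones)."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "hidden": hidden,
        "channels": channels,
        "z0": [rng.uniform(-0.5, 0.5) for _ in range(hidden)],  # initial state
        "gate": rand(n_experts, hidden),
        "experts": [rand(hidden * channels, hidden) for _ in range(n_experts)],
    }


def moe_vector_field(z, p):
    """Return the hidden x channels matrix sum_k g_k(z) * tanh(A_k z)."""
    gates = softmax(matvec(p["gate"], z))
    h, c = p["hidden"], p["channels"]
    field = [[0.0] * c for _ in range(h)]
    for g, A in zip(gates, p["experts"]):
        flat = [math.tanh(v) for v in matvec(A, z)]
        for i in range(h):
            for j in range(c):
                field[i][j] += g * flat[i * c + j]
    return field


def cde_states(times, values, p):
    """Euler scheme z_{i+1} = z_i + F(z_i) (X_{i+1} - X_i), with X = (t, x)."""
    path = list(zip(times, values))
    z = list(p["z0"])
    states = [z]
    for a, b in zip(path, path[1:]):
        dX = [bj - aj for aj, bj in zip(a, b)]
        F = moe_vector_field(z, p)
        z = [zi + sum(f * d for f, d in zip(row, dX)) for zi, row in zip(z, F)]
        states.append(z)
    return states


def bidirectional_states(times, values, p_fwd, p_bwd):
    """Concatenate forward-in-time and backward-in-time hidden states."""
    fwd = cde_states(times, values, p_fwd)
    bwd = cde_states(times[::-1], values[::-1], p_bwd)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]


# Usage: one function observed at irregular time points.
times = [0.0, 0.13, 0.5, 0.55, 1.0]
values = [math.sin(2 * math.pi * t) for t in times]
states = bidirectional_states(
    times, values, make_params(3, 2, 2, seed=1), make_params(3, 2, 2, seed=2)
)
print(len(states), len(states[0]))
```

In this sketch each observation time receives a hidden state that summarises the function both before and after that point. A cross-attention layer between the states of the input and output functions would sit on top of these representations and is omitted here.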

Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)

Submission Number: 14569
