Interpretable Transformer Regression for Functional and Longitudinal Covariates

ICLR 2026 Conference Submission22049 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Dual-attention Transformer, interpretable attention, functional data, longitudinal data, irregular sampling plan
Abstract: We consider scalar-on-function prediction from functional covariates that may be measured sparsely, irregularly, and with noise over time, as is common in longitudinal studies. We propose a dual-attention Transformer that operates on a discretized time grid with missing-value masks and trains end-to-end without any imputation. The model couples time-point attention, which encodes local and long-range temporal structure, with inter-sample attention, which shares information across similar subjects. We derive prediction error bounds and prove consistency under a random-effects framework that accommodates sparse/irregular sampling, measurement error, and label noise. In simulations across varying sparsity levels, our method outperforms 19 strong baselines (ensemble, statistical/functional, and deep learning methods, tabular Transformers, and pre-trained models such as TabPFN) in regimes with $\leq 50\%$ observations and remains competitive in denser settings, highlighting the importance of end-to-end missingness-aware modeling. On real-world data, our approach achieves the best prediction and classification performance, surpassing leading imputation methods paired with competitive learners, which underscores that explicitly modeling sparsity is preferable to imputation-based pipelines. The dual-attention mechanism is also interpretable, consistently identifying predictive time windows and cohort clusters that align with domain knowledge. Overall, the proposed Transformer outperforms state-of-the-art methods while preserving robustness and interpretability.
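The abstract describes the architecture only at a high level. As a concrete illustration, the PyTorch sketch below shows one way a dual-attention block of this kind could be wired up: time-point attention masks unobserved grid positions via a key padding mask, and inter-sample attention runs across the subjects in a batch at each grid position. This is a minimal sketch under those assumptions; all class, module, and hyperparameter names are hypothetical, and the paper's actual implementation may differ.

```python
# Minimal sketch (assumptions): dual-attention block combining time-point
# attention (within a subject, over grid positions, masking missing values)
# and inter-sample attention (across subjects in a batch). Names are
# hypothetical and not taken from the paper.
import torch
import torch.nn as nn

class DualAttentionBlock(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        # attention over time points of one subject ([B, T, d_model])
        self.time_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # attention across subjects at each grid position ([T, B, d_model] view)
        self.sample_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, obs_mask: torch.Tensor) -> torch.Tensor:
        # x: [B, T, d_model] embeddings on the discretized time grid
        # obs_mask: [B, T] boolean, True where the curve is observed
        pad = ~obs_mask  # MultiheadAttention expects True = position to ignore
        t_out, _ = self.time_attn(x, x, x, key_padding_mask=pad)
        x = self.norm1(x + t_out)
        # swap axes so attention runs across subjects at each grid position
        xs = x.transpose(0, 1)                      # [T, B, d_model]
        s_out, _ = self.sample_attn(xs, xs, xs)
        x = self.norm2(x + s_out.transpose(0, 1))   # back to [B, T, d_model]
        return x

class DualAttentionRegressor(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4, n_blocks: int = 2):
        super().__init__()
        # embed (value, observed-indicator) pairs; no imputation of missing values
        self.embed = nn.Linear(2, d_model)
        self.blocks = nn.ModuleList(
            [DualAttentionBlock(d_model, n_heads) for _ in range(n_blocks)]
        )
        self.head = nn.Linear(d_model, 1)

    def forward(self, values: torch.Tensor, obs_mask: torch.Tensor) -> torch.Tensor:
        # values: [B, T] noisy measurements on the grid (zeros where missing)
        # obs_mask: [B, T] boolean observation indicators
        x = self.embed(torch.stack([values, obs_mask.float()], dim=-1))
        for blk in self.blocks:
            x = blk(x, obs_mask)
        # masked mean pooling over observed time points, then scalar prediction
        w = obs_mask.float().unsqueeze(-1)
        pooled = (x * w).sum(1) / w.sum(1).clamp_min(1.0)
        return self.head(pooled).squeeze(-1)

if __name__ == "__main__":
    B, T = 8, 51
    values = torch.randn(B, T)
    obs_mask = torch.rand(B, T) < 0.3      # roughly 30% of grid points observed
    obs_mask[:, 0] = True                  # ensure at least one observation each
    values = values * obs_mask.float()     # unobserved entries left as zero
    y_hat = DualAttentionRegressor()(values, obs_mask)
    print(y_hat.shape)                     # torch.Size([8])
```

In this sketch, interpretability would come from inspecting the two sets of attention weights: the time-point attention maps indicate which grid windows drive the prediction, while the inter-sample attention maps indicate which subjects the model treats as similar.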
Supplementary Material: pdf
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 22049