Query-Only Attention for Trustworthy Continual Adaptation

Published: 08 Nov 2025, Last Modified: 18 Nov 2025
Venue: ResponsibleFM @ NeurIPS 2025
License: CC BY 4.0
Keywords: Foundation Models, Continual Learning, Fairness Under Distribution Shift
Abstract: Foundation models deployed in dynamic environments face continual distribution shifts and evolving data conditions, where failure to adapt can erode reliability and fairness. We propose a Query-Only Attention mechanism that discards keys and values while preserving the inductive bias of full-attention architectures. In continual learning scenarios, this simplified mechanism significantly mitigates both loss of plasticity and catastrophic forgetting, outperforming baselines such as selective re-initialization. Query-Only Attention achieves performance competitive with full attention while being more compute-efficient. We establish a conceptual link between query-only attention, full transformer attention, and model-agnostic meta-learning (MAML), framing all three as instances of meta-learning. Finally, through Hessian spectrum analysis, we show that models maintaining higher curvature rank across tasks exhibit sustained adaptability, improving trustworthiness under distribution shift. These findings highlight principles relevant to real-world continual learning systems that demand reliability, fairness, and accountability.
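
The abstract does not spell out the exact formulation, but one plausible reading of "discards keys and values while preserving the inductive bias of full attention" is a layer that computes softmax token-mixing weights from the query projection alone and applies them directly to the input representations. The sketch below illustrates that reading; the module and parameter names (QueryOnlyAttention, q_proj, out_proj) are hypothetical and not the authors' implementation.

```python
# Minimal sketch of one plausible query-only attention layer (assumption,
# not the paper's code): attention weights come from the query projection
# alone, with no key or value matrices.
import math
import torch
import torch.nn as nn


class QueryOnlyAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Single projection: queries only. Full attention would also keep
        # separate key and value projections.
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.out_proj = nn.Linear(dim, dim, bias=False)
        self.scale = 1.0 / math.sqrt(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        q = self.q_proj(x)
        # Token-mixing weights derived from queries alone; keeps the
        # softmax structure of attention without a key projection.
        attn = torch.softmax(q @ q.transpose(-2, -1) * self.scale, dim=-1)
        # Mix the raw token representations directly (no value projection).
        return self.out_proj(attn @ x)


if __name__ == "__main__":
    layer = QueryOnlyAttention(dim=64)
    x = torch.randn(2, 16, 64)
    print(layer(x).shape)  # torch.Size([2, 16, 64])
```

Relative to full attention, this variant drops two of the three input projections, which is consistent with the compute-efficiency claim, though the paper's actual design may differ.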
Submission Number: 15