Keywords: Foundation Models, Continual Learning, Fairness Under Distribution Shift
Abstract: Foundation models deployed in dynamic environments face continual distributional
shifts and evolving data conditions, where failure to adapt can erode reliability
and fairness. We propose a Query-Only Attention mechanism that discards keys
and values while preserving the inductive bias of full-attention architectures. In
continual learning scenarios, this simplified mechanism significantly mitigates
both loss of plasticity and catastrophic forgetting, outperforming baselines such as
selective re-initialization. Query-Only Attention achieves performance competitive
with full attention while being more compute-efficient. We establish a conceptual link
between query-only attention, full transformer attention, and model-agnostic meta-learning
(MAML), framing all three as instances of meta-learning. Finally, through Hessian
spectrum analysis, we show that models maintaining higher curvature rank across
tasks exhibit sustained adaptability, improving trustworthiness under distribution
shift. These findings highlight principles relevant to real-world continual learning
systems that demand reliability, fairness, and accountability.
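A minimal sketch of what a query-only attention layer might look like, based only on the abstract's statement that keys and values are discarded. The specific formulation (attention weights computed from queries alone over a small set of learned slots), the class name `QueryOnlyAttention`, and the `num_slots` parameter are illustrative assumptions, not the authors' specified method.

```python
# Hypothetical sketch of query-only attention: the abstract only says keys and
# values are discarded, so scoring a learned slot bank with queries is an
# assumption made for illustration, not the paper's definitive mechanism.
import torch
import torch.nn as nn
import torch.nn.functional as F


class QueryOnlyAttention(nn.Module):
    """Attention variant with no key/value projections (illustrative only)."""

    def __init__(self, dim: int, num_slots: int = 16):
        super().__init__()
        self.query = nn.Linear(dim, num_slots)                   # queries score a fixed slot bank
        self.slots = nn.Parameter(torch.randn(num_slots, dim))   # learned slots stand in for values

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        scores = self.query(x)                 # (batch, seq_len, num_slots)
        weights = F.softmax(scores, dim=-1)    # attention over slots; no keys involved
        return weights @ self.slots            # per-token mixture of learned slots


if __name__ == "__main__":
    layer = QueryOnlyAttention(dim=32)
    out = layer(torch.randn(2, 5, 32))
    print(out.shape)  # torch.Size([2, 5, 32])
```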
Submission Number: 15