Abstract: Visual prompt tuning-based continual learning (CL) methods have shown promising performance in exemplar-free scenarios, where their key component can be viewed as a prompt generator. Existing approaches generally rely on freezing old prompts, slow updates, or task discrimination in the prompt generator to preserve stability and minimize forgetting. In contrast, we introduce a novel approach that trains a consistent prompt generator to ensure stability during CL. Consistency means that for any instance from an old task, the instance-aware prompt produced by the prompt generator remains unchanged even as the generator is continually updated on a new task. This keeps the representation of a specific instance stable across tasks and thereby prevents forgetting. We employ a mixture of experts (MoE) as the prompt generator, which consists of a router and multiple experts. By deriving conditions sufficient to achieve consistency for the MoE prompt generator, we show that if, during training on a new task, the router and experts update in directions orthogonal to the subspaces spanned by old input features and old gating vectors, respectively, consistency is theoretically guaranteed. To implement this orthogonality, we project parameter gradients onto those orthogonal directions using orthogonal projection matrices computed via the null space method. Extensive experiments on four class-incremental learning benchmarks validate the effectiveness and superiority of our approach.
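To make the projection step concrete, the following is a minimal sketch (not the authors' released code) of null-space gradient projection: the parameter gradient is projected onto the null space of features accumulated from old tasks so that the update leaves outputs on those features unchanged. The tensor names (`old_features`, the gradient being projected) and the SVD-based construction are illustrative assumptions; the paper applies the same idea to both the router (using old input features) and the experts (using old gating vectors).

```python
import torch

def null_space_projector(old_features: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Return a (d x d) matrix P projecting onto the null space of old_features (n x d).

    For any update direction g, the projected direction P @ g satisfies
    old_features @ (P @ g) ~= 0, i.e. it is orthogonal to the subspace
    spanned by the old features.
    """
    # Uncentered covariance of old features: d x d
    cov = old_features.T @ old_features
    # Singular vectors with (near-)zero singular values form a null-space basis
    U, S, _ = torch.linalg.svd(cov)
    null_basis = U[:, S <= eps * S.max()]      # d x k
    return null_basis @ null_basis.T           # d x d projector

# Hypothetical usage: project an expert's weight gradient before the optimizer step.
# For a weight of shape (out_dim, d), projecting on the input-feature side ensures
# old inputs in the stored subspace map to unchanged outputs after the update.
# P = null_space_projector(old_features)
# expert.weight.grad = expert.weight.grad @ P
```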