Bandit-MoE: Diverse Knowledge Acquisition through Bandit Routing for Continual Learning

16 Sept 2025 (modified: 14 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Mixture-of-Experts, Continual Learning, Multi-Armed Bandits
Abstract: Substantial updates to the parameters of deep learning models are a prominent cause of catastrophic forgetting in continual learning. To address this challenge, the Mixture-of-Experts (MoE) framework has been introduced into continual learning, leveraging its routing strategy to select a subset of relevant experts for training and thereby mitigating parameter overwriting. However, in continual learning the routing strategy tends to allocate tasks to a small number of highly optimized experts trained on prior samples, which overwrites these favored experts while leaving the others underutilized. We therefore formulate expert routing in MoE as a Multi-Armed Bandit problem and propose the Bandit-MoE framework, which consists of a Bandit Routing (BR) strategy and a dedicated expert structure. BR estimates the maximum expected gain for each expert by incorporating both the expectation and the variance of the reward on incoming samples. This strategy substantially reduces the early neglect of certain experts and ensures a more balanced expert selection, thereby improving knowledge preservation. Finally, a comprehensive series of experiments is conducted to investigate the impact of expert structures on continual learning. Results on three widely used benchmark datasets show that Bandit-MoE consistently outperforms prior art across all experimental settings, demonstrating its effectiveness for continual learning.
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 6606
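The abstract describes Bandit Routing as scoring each expert by an optimistic estimate that combines the expected reward and its variance, in the spirit of variance-aware UCB rules. Since the paper's exact scoring rule, reward signal, and expert structure are not given here, the following is only a minimal illustrative sketch under those assumptions; all names (`BanditExpertRouter`, `exploration`, the placeholder reward) are hypothetical and not taken from the submission.

```python
# Illustrative sketch only: a variance-aware, UCB-style expert selector.
# Not the paper's algorithm; the reward definition and score are assumptions.
import math
import random


class BanditExpertRouter:
    def __init__(self, num_experts: int, exploration: float = 1.0):
        self.num_experts = num_experts
        self.exploration = exploration
        self.counts = [0] * num_experts    # times each expert was selected
        self.means = [0.0] * num_experts   # running mean reward per expert
        self.m2 = [0.0] * num_experts      # running sum of squared deviations (Welford)
        self.total = 0                     # total selections across all experts

    def select(self) -> int:
        """Pick the expert with the highest optimistic score (mean + variance-based bonus)."""
        self.total += 1
        # Try every expert at least once to avoid early neglect.
        for e in range(self.num_experts):
            if self.counts[e] == 0:
                return e
        scores = []
        for e in range(self.num_experts):
            var = self.m2[e] / self.counts[e]
            bonus = self.exploration * math.sqrt(
                (var + math.log(self.total)) / self.counts[e]
            )
            scores.append(self.means[e] + bonus)
        return max(range(self.num_experts), key=scores.__getitem__)

    def update(self, expert: int, reward: float) -> None:
        """Update the chosen expert's reward statistics (Welford's online update)."""
        self.counts[expert] += 1
        delta = reward - self.means[expert]
        self.means[expert] += delta / self.counts[expert]
        self.m2[expert] += delta * (reward - self.means[expert])


# Toy usage: the reward could be, e.g., a function of the loss obtained when a
# sample is routed to the selected expert (placeholder random values here).
router = BanditExpertRouter(num_experts=4)
for step in range(100):
    e = router.select()
    router.update(e, reward=random.random())
```

The exploration bonus grows with the observed reward variance and shrinks with an expert's selection count, so rarely chosen experts keep receiving samples, which matches the balanced-selection behavior the abstract attributes to BR.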