Track: tiny / short paper (up to 4 pages)
Keywords: Mixture-of-Experts, Continual Learning, sparsity
TL;DR: We implement an Adaptive Mixture-of-Experts model that grows the number of experts as new tasks are introduced to the task stream.
Abstract: Recently, the Mixture-of-Experts (MoE) model has been shown to be an effective strategy for continual learning because it can adapt to a range of tasks by employing an array of "experts" that each specialize in certain tasks. However, the MoE model lacks the ability to adapt to completely new tasks, particularly as the number of tasks grows large. In this work, we develop a framework for expanding the number of experts as needed when new tasks arise. We also provide simulations demonstrating that our approach can effectively handle a growing number of tasks.
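Below is a minimal sketch, not the authors' implementation, of the general idea the abstract describes: an MoE layer whose expert pool can be grown when a new task arrives. It assumes PyTorch, and all names (`AdaptiveMoE`, `add_expert`, the router/expert architecture) are illustrative assumptions rather than details from the paper.

```python
import torch
import torch.nn as nn


class AdaptiveMoE(nn.Module):
    """Toy MoE layer that can add experts on demand (illustrative only)."""

    def __init__(self, in_dim, hidden_dim, out_dim, num_experts=2):
        super().__init__()
        self.in_dim, self.hidden_dim, self.out_dim = in_dim, hidden_dim, out_dim
        self.experts = nn.ModuleList(self._make_expert() for _ in range(num_experts))
        self.router = nn.Linear(in_dim, num_experts)

    def _make_expert(self):
        return nn.Sequential(
            nn.Linear(self.in_dim, self.hidden_dim),
            nn.ReLU(),
            nn.Linear(self.hidden_dim, self.out_dim),
        )

    def add_expert(self):
        """Grow the expert pool by one and widen the router accordingly."""
        self.experts.append(self._make_expert().to(self.router.weight.device))
        old = self.router
        new = nn.Linear(self.in_dim, len(self.experts)).to(old.weight.device)
        with torch.no_grad():  # keep routing weights for existing experts
            new.weight[: old.out_features].copy_(old.weight)
            new.bias[: old.out_features].copy_(old.bias)
        self.router = new

    def forward(self, x, top_k=1):
        gate = torch.softmax(self.router(x), dim=-1)  # (batch, num_experts)
        topv, topi = gate.topk(top_k, dim=-1)         # sparse top-k routing
        out = torch.zeros(x.size(0), self.out_dim, device=x.device)
        for k in range(top_k):
            for e, expert in enumerate(self.experts):
                mask = topi[:, k] == e
                if mask.any():
                    out[mask] += topv[mask, k : k + 1] * expert(x[mask])
        return out


# Hypothetical usage: call add_expert() when a new task enters the stream.
moe = AdaptiveMoE(in_dim=16, hidden_dim=32, out_dim=10)
moe.add_expert()  # e.g., triggered by the arrival of task 3
y = moe(torch.randn(8, 16))
```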
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 62