Keywords: Graph Neural Networks; Incremental Learning
TL;DR: We propose a mixture-of-experts GNN for graph incremental learning, where each expert is dedicated to a data block for effective dynamic data modeling.
Abstract: Graph incremental learning is a learning paradigm that aims to adapt models trained on previous data to continually arriving data or tasks without retraining on the full dataset. However, standard graph machine learning methods suffer from catastrophic forgetting in incremental settings: previously learned knowledge is overridden by new knowledge. Prior approaches attempt to address this by treating the previously trained model as an inseparable unit and using regularization, experience replay, or parameter isolation to maintain old behaviors while learning new knowledge.
These approaches, however, do not account for the fact that not all previously acquired knowledge is equally beneficial for learning new tasks, and maintaining all previous knowledge alongside the latest knowledge in a single model is ineffective. Some prior patterns can be transferred to help learn new data, while others may deviate from the new data distribution and be detrimental. To address this, we propose a dynamic mixture-of-experts (DyMoE) approach for incremental learning. Specifically, a DyMoE GNN layer adds new expert networks specialized in modeling the incoming data blocks. We design a customized regularization loss that exploits data sequence information, so existing experts can retain their ability to solve old tasks while helping the new expert learn the new data effectively. As the number of data blocks grows over time, the computational cost of the full mixture-of-experts (MoE) model increases. To address this, we introduce a sparse MoE approach in which only the top-$k$ most relevant experts make predictions, significantly reducing computation time. Our model achieves a 5.47\% relative accuracy improvement over the best baselines on class incremental learning with minimal additional computation, demonstrating its effectiveness.
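As a rough illustration of the sparse top-$k$ routing described above, the sketch below routes an input to the $k$ highest-scoring experts and combines their outputs with renormalized gate weights. All names, shapes, and the linear gating function are our own illustrative assumptions; the paper's DyMoE layer operates on GNN representations and adds one expert per data block.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sparse_moe_forward(x, experts, gate_w, k=2):
    """Sketch of sparse top-k MoE routing (illustrative, not the paper's exact layer).

    experts: list of callables, one hypothetical expert per data block
    gate_w:  (num_experts, dim) gating weights -- assumed linear gate
    """
    scores = gate_w @ x                       # one relevance score per expert
    top = np.argsort(scores)[-k:]             # indices of the k most relevant experts
    weights = softmax(scores[top])            # renormalize gate weights over selected experts
    # only the selected experts are evaluated, so compute stays bounded as experts grow
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# toy usage: three "experts", each a fixed random linear map
rng = np.random.default_rng(0)
dim = 4
experts = [lambda x, W=rng.standard_normal((dim, dim)): W @ x for _ in range(3)]
gate_w = rng.standard_normal((3, dim))
y = sparse_moe_forward(rng.standard_normal(dim), experts, gate_w, k=2)
```

With `k` fixed, the per-input cost depends on `k` rather than the total number of experts, which is the point of the sparse variant as the number of data blocks grows.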
Primary Area: learning on graphs and other geometries & topologies
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3340