Keywords: Mixture model, Continual Learning
Abstract: Extant studies predominantly address catastrophic forgetting within a simplified continual learning paradigm, typically confined to a single data domain. Real-world applications, by contrast, frequently encompass multiple, evolving data domains, in which models often struggle to retain critical past information, leading to performance degradation. This paper addresses this more complex scenario by introducing a novel dynamic expansion approach called Learning Expandable and Adaptable Representations (LEAR). The framework orchestrates a collaborative backbone structure, comprising global and local backbones, designed to capture both general and task-specific representations. Leveraging this collaborative backbone, the proposed framework dynamically creates a lightweight expert to delineate decision boundaries for each novel task, thereby facilitating the prediction process. To enhance new task learning, we introduce a novel Mutual Information-Based Prediction Alignment approach, which incrementally optimizes the global backbone via a mutual information metric, ensuring consistency in the prediction patterns of historical experts throughout the optimization phase. To mitigate network forgetting, we propose a Kullback–Leibler (KL) Divergence-Based Feature Alignment approach, which employs a probabilistic distance measure to prevent significant shifts in critical local representations. Furthermore, we introduce a novel Hilbert-Schmidt Independence Criterion (HSIC)-Based Collaborative Optimization approach, which encourages the local and global backbones to capture distinct semantic information in a collaborative manner, thereby mitigating information redundancy and enhancing model performance. Moreover, to accelerate new task learning, we propose a novel Expert Selection Mechanism that automatically identifies the most relevant expert based on data characteristics. The selected expert is then used to initialize a new expert, thereby fostering positive knowledge transfer. The same mechanism also enables expert selection during the testing phase without requiring any task information. Empirical results demonstrate that the proposed framework achieves state-of-the-art performance.
Supplementary Material: zip
Primary Area: Optimization (e.g., convex and non-convex, stochastic, robust)
Submission Number: 18965