DPP-CL: orthogonal subspace continual learning for dialogue policy planning

Published: 2025, Last Modified: 06 Nov 2025 · World Wide Web (WWW) 2025 · CC BY-SA 4.0
Abstract: Continual learning (CL) is crucial for enabling large language models (LLMs) to adapt to multiple downstream tasks. However, LLMs often suffer from catastrophic forgetting (CF) during multi-task training, where learning new tasks undermines knowledge acquired from previous ones. Existing studies typically group CF-mitigation methods into three categories: regularization-based, rehearsal-based, and parameter-isolation-based approaches. However, these methods frequently introduce new challenges, such as privacy risks and limited effectiveness on long sequences. To overcome these limitations, we propose DPP-CL, a Dialogue Policy Planner (DPP) for continual learning that mitigates CF and improves performance. Specifically, we learn new tasks for the DPP by performing gradient descent in a subspace orthogonal to that of previous tasks during multi-task learning. Furthermore, to enhance long-sequence understanding, we introduce a hybrid representation that combines hyperbolic spherical embeddings with Euclidean embeddings. The DPP is trained in three sequential stages: supervised fine-tuning, knowledge distillation, and reinforcement learning. Experimentally, we evaluate DPP-CL on standard CL benchmarks and on cybersecurity inference benchmarks combined with retrieval-augmented generation, demonstrating significant improvements in mitigating CF and handling complex, long-context scenarios.
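The abstract names gradient descent in an orthogonal subspace as the mechanism for mitigating catastrophic forgetting. The following is a minimal PyTorch sketch of that general idea, assuming previous-task directions are summarized by an orthonormal basis obtained from an SVD of stored gradients; the function names (`update_subspace_basis`, `project_to_orthogonal_subspace`) and the training-loop snippet are illustrative assumptions, not the authors' implementation, and the paper's other components (hybrid hyperbolic/Euclidean embeddings, the three-stage training pipeline) are not shown.

```python
# Illustrative sketch of orthogonal-subspace gradient descent for continual learning.
# NOTE: this approximates the general technique named in the abstract; the actual
# DPP-CL subspace construction and training procedure are not specified here.
import torch


def update_subspace_basis(basis: torch.Tensor | None,
                          grads: torch.Tensor,
                          rank: int = 8) -> torch.Tensor:
    """Extend the basis of important directions after finishing a task.

    `grads` is a (num_samples, num_params) matrix of flattened gradients
    collected on the finished task; we keep its top singular directions.
    """
    u, _, _ = torch.linalg.svd(grads.T, full_matrices=False)  # (num_params, k)
    new_dirs = u[:, :rank]
    if basis is None:
        return new_dirs
    # Orthogonalize the new directions against the existing basis and append them.
    residual = new_dirs - basis @ (basis.T @ new_dirs)
    q, _ = torch.linalg.qr(residual)
    return torch.cat([basis, q], dim=1)


def project_to_orthogonal_subspace(grad: torch.Tensor,
                                   basis: torch.Tensor | None) -> torch.Tensor:
    """Remove the gradient component lying in the span of previous-task
    directions, so the parameter update is orthogonal to that subspace."""
    if basis is None:
        return grad
    return grad - basis @ (basis.T @ grad)


# Usage inside a training loop (hypothetical model / loader names):
# for batch in new_task_loader:
#     loss = model.loss(batch)
#     loss.backward()
#     flat_grad = torch.cat([p.grad.view(-1) for p in model.parameters()])
#     flat_grad = project_to_orthogonal_subspace(flat_grad, prev_task_basis)
#     ...  # scatter flat_grad back into each p.grad before optimizer.step()
```

The design intuition is that updates constrained to the orthogonal complement of previous-task gradient directions cannot (to first order) change the loss on earlier tasks, which is how this family of methods limits catastrophic forgetting without replaying old data.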