Abstract: Lifelong deep reinforcement learning (DRL) methods enable continual adaptation to new tasks while retaining previously acquired knowledge. However, these methods often require large models, which leads to substantial computational and storage costs during training and inference. Existing research has not yet provided a lightweight solution to this problem. This work aims to develop a generic method that can be seamlessly integrated into existing lifelong DRL methods to make their models lightweight while also yielding higher returns. Although sparse training (ST) methods have been used extensively in the DRL community to obtain lightweight models, they exacerbate catastrophic forgetting and compromise generalization when applied to lifelong DRL. To improve generalization, we develop a gradient optimization method that leverages sharpness-aware minimization (SAM) to smooth the gradient surface of the model without introducing excessive computational complexity. In addition, to alleviate catastrophic forgetting and promote model convergence, we introduce a priority-based approach that samples effective past experiences from the replay buffer. Extensive experiments demonstrate that our approach achieves 90% sparsity in five representative lifelong DRL methods while attaining higher episode return and higher average return (up to 34% improvement) across all episodes than the dense models.
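To make the combination of sparsity and SAM concrete, the following is a minimal sketch of a generic SAM update applied to a magnitude-pruned weight vector. It is not the method proposed in this work: the function names `magnitude_mask` and `sam_step`, the use of magnitude pruning, and the hyperparameters `lr` and `rho` are all assumptions chosen for illustration.

```python
import math


def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that zeroes the `sparsity` fraction of
    smallest-magnitude weights (hypothetical helper, plain magnitude pruning)."""
    k = int(round(sparsity * len(weights)))  # number of weights to prune
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:k])
    return [0.0 if i in pruned else 1.0 for i in range(len(weights))]


def sam_step(weights, grad_fn, mask, lr=0.1, rho=0.05):
    """One generic SAM update restricted to the unmasked weights.

    grad_fn(weights) must return the loss gradient as a list of floats.
    """
    # Gradient at the current point, restricted to active weights.
    g = [gi * mi for gi, mi in zip(grad_fn(weights), mask)]
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    # Move to the approximate worst-case point within an L2 ball of radius rho.
    perturbed = [w + rho * gi / norm for w, gi in zip(weights, g)]
    # Gradient at the perturbed point drives the actual descent step.
    g_sharp = [gi * mi for gi, mi in zip(grad_fn(perturbed), mask)]
    return [w - lr * gi for w, gi in zip(weights, g_sharp)]


if __name__ == "__main__":
    # Toy loss L(w) = sum(w_i^2), so the gradient is 2 * w.
    grad = lambda w: [2.0 * x for x in w]
    w = [float(i) for i in range(1, 11)]
    mask = magnitude_mask(w, 0.9)            # keep 1 of 10 weights
    w = [wi * mi for wi, mi in zip(w, mask)]
    print(sam_step(w, grad, mask))
```

Each update costs two gradient evaluations, and the mask is applied to both gradients so that pruned weights stay at zero.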