Partially Relaxed Masks for Lightweight Knowledge Transfer without Forgetting in Continual Learning

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission
Keywords: Continual learning, Task similarity, Catastrophic forgetting, Knowledge transfer
Abstract: Existing research on continual learning (CL) has focused mainly on preventing catastrophic forgetting. In the task-incremental learning setting of CL, several approaches achieve excellent results with almost no forgetting. The goal of this work is to endow such systems with the additional ability to transfer knowledge among tasks when the tasks are similar and share knowledge, so as to achieve higher accuracy. Since HAT is one of the most effective task-incremental learning algorithms, this paper extends HAT to pursue both objectives, i.e., overcoming catastrophic forgetting and transferring knowledge among tasks, without introducing additional mechanisms into HAT's architecture. We find that task similarity, which indicates knowledge sharing and transfer, can be computed by clustering the task embeddings optimized by HAT. We therefore propose a new approach, named "partially relaxed masks" (PRM), that exploits HAT's masks not only to keep some parameters from being modified, as much as possible, when learning subsequent tasks, thereby preventing forgetting, but also to allow the remaining parameters to be updated, thereby facilitating knowledge transfer. Extensive experiments demonstrate that PRM performs competitively with the latest baselines while requiring much less computation time.
One-sentence Summary: Our proposed approach exploits task embeddings to identify implicit relationships across tasks in little computation time, achieving both prevention of forgetting and knowledge transfer simultaneously.
Supplementary Material: zip
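
To make the mechanism described in the abstract concrete, the following is a hedged, illustrative PyTorch sketch, not the authors' code: it assumes a HAT-style per-task embedding with sigmoid gating and a hypothetical gradient "relaxation" factor derived from task-embedding similarity. All names (TaskGatedLinear, grad_gate, relax, the 0.8 similarity threshold) are assumptions introduced here for illustration only.

```python
# Sketch of HAT-style hard-attention gating with a hypothetical partial
# relaxation of the cumulative mask for tasks judged similar via their
# task embeddings. Illustrative only; names and thresholds are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TaskGatedLinear(nn.Module):
    """Linear layer whose outputs are gated by a per-task embedding (HAT-style)."""

    def __init__(self, in_features, out_features, n_tasks):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # One learnable embedding per task; sigmoid(s * e_t) gives a soft mask.
        self.task_emb = nn.Embedding(n_tasks, out_features)

    def mask(self, task_id, s=400.0):
        # A large s pushes the sigmoid towards a near-binary mask.
        return torch.sigmoid(s * self.task_emb.weight[task_id])

    def forward(self, x, task_id, s=400.0):
        return self.linear(x) * self.mask(task_id, s)


def grad_gate(cumulative_mask, relax=0.0):
    """Scale factor applied to weight gradients after backward().

    HAT-style protection freezes units used by previous tasks:
    factor = 1 - cumulative_mask. A partially relaxed mask (sketch of the
    PRM idea) gives up a fraction `relax` of that protection when the
    current task is clustered with the tasks owning those units, so shared
    parameters can still be updated.
    """
    return 1.0 - (1.0 - relax) * cumulative_mask


def task_similarity(emb_a, emb_b):
    """Illustrative similarity between two task embeddings (cosine)."""
    return F.cosine_similarity(emb_a.flatten(), emb_b.flatten(), dim=0)


# Usage sketch: gate the weight gradients before optimizer.step().
layer = TaskGatedLinear(32, 64, n_tasks=5)
x, y = torch.randn(8, 32), torch.randn(8, 64)
loss = F.mse_loss(layer(x, task_id=1), y)
loss.backward()

cum_mask = layer.mask(0).detach()          # mask accumulated from task 0
sim = task_similarity(layer.task_emb.weight[1], layer.task_emb.weight[0])
relax = 0.5 if sim > 0.8 else 0.0          # hypothetical relaxation rule
layer.linear.weight.grad *= grad_gate(cum_mask, relax).unsqueeze(1)
```

The design choice illustrated is that forgetting prevention and knowledge transfer are traded off per parameter: fully protected units behave as in HAT, while units shared with similar tasks (as determined by clustering the task embeddings) receive a nonzero gradient and can continue to adapt.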