Keywords: Domain Adaptation, Lifelong Learning, Replay Loss, Knowledge Distillation, Stability Plasticity Dilemma
Abstract: Although continuous unsupervised domain adaptation (CUDA) has shown success in dealing with non-stationary data, catastrophic forgetting remains a challenge hindering its full potential. The current state-of-the-art (SOTA) focuses on training a single model to simultaneously perform adaptation (e.g., domain alignment) and knowledge retention (i.e., minimizing replay loss). However, these two conflicting objectives introduce a hyper-parameter that is difficult to tune yet significantly affects model performance. Therefore, we propose to use two separate models, so that one model is dedicated to the retention of historical knowledge (i.e., high stability) while the other is dedicated to adaptation to future domains (i.e., high plasticity). This allows the algorithm to forget in order to achieve better overall performance, an approach we dub Forget to Learn (F2L). Specifically, F2L decomposes the training process into a specialist model and a generalist model, and uses knowledge distillation to transfer knowledge between the two models. We demonstrate the superiority of F2L over current CUDA trends (i.e., multi-task learning and single-task constrained learning) on different continuous unsupervised domain adaptation datasets.
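The abstract's knowledge-transfer step between the two models can be illustrated with the standard temperature-scaled distillation loss; this is a minimal NumPy sketch, not the paper's exact formulation, and the teacher/student roles assigned here (stability model as teacher, plasticity model as student) are assumptions for illustration.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Mean KL divergence between the softened teacher and student
    distributions, scaled by T^2 as in standard knowledge distillation."""
    p = softmax(teacher_logits, T)  # teacher (e.g., the stability model)
    q = softmax(student_logits, T)  # student (e.g., the plasticity model)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

# Identical logits give zero loss; diverging logits give a positive loss.
same = distillation_loss([[1.0, 2.0, 3.0]], [[1.0, 2.0, 3.0]])
diff = distillation_loss([[3.0, 2.0, 1.0]], [[1.0, 2.0, 3.0]])
```

Minimizing this loss pulls the student's predictive distribution toward the teacher's, which is one common mechanism for transferring knowledge between two networks without sharing weights.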
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning