Abstract: Continual learning is a promising alternative to the current pretrain-and-finetune paradigm: it aims to learn a model on a sequence of tasks without forgetting knowledge from preceding tasks. We investigate continual learning for Visual Question Answering and show that performance depends strongly on task design, order, and similarity, where tasks may be formulated according to either modality. Our results suggest that incremental learning of language reasoning skills (such as questions about color, count, etc.) is more difficult than incremental learning of visual categories. We show that this difficulty is related to task similarity, with heterogeneous tasks leading to more severe forgetting. We also demonstrate that naive finetuning of pretrained models is insufficient, and that recent continual learning approaches can reduce forgetting by more than 20%. We propose a simple yet effective Pseudo-Replay algorithm, which improves results while using less memory than standard replay. Finally, to measure gradual forgetting, we introduce a new metric that accounts for the semantic similarity of predicted answers.
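The abstract does not spell out the exact form of the semantic-similarity-aware forgetting metric. As an illustration only, below is a minimal sketch of one way such a score could be computed, assuming precomputed answer embeddings are available; the function names, the `embed` lookup, and the toy vectors are all hypothetical and not taken from the paper.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def soft_accuracy(predictions, targets, embed):
    """Average semantic credit: 1.0 for an exact match, otherwise the
    cosine similarity between predicted- and gold-answer embeddings.
    `embed` maps an answer string to a vector (hypothetical helper)."""
    scores = []
    for pred, gold in zip(predictions, targets):
        if pred == gold:
            scores.append(1.0)
        else:
            scores.append(max(0.0, cosine(embed[pred], embed[gold])))
    return sum(scores) / len(scores)

def semantic_forgetting(acc_after_task, acc_final):
    """Forgetting on an earlier task: drop from the soft accuracy measured
    right after training on that task to the soft accuracy at the end."""
    return acc_after_task - acc_final

# Toy usage with made-up 2-d answer embeddings.
embed = {"crimson": np.array([0.9, 0.1]),
         "red":     np.array([0.8, 0.2]),
         "two":     np.array([0.0, 1.0])}
before = soft_accuracy(["red", "two"], ["red", "two"], embed)      # exact matches -> 1.0
after  = soft_accuracy(["crimson", "two"], ["red", "two"], embed)  # near-synonym keeps most credit
print(semantic_forgetting(before, after))  # small value: little semantic forgetting
```

The intended point of such a score is that a model drifting from "red" to "crimson" would be penalized far less than one drifting to an unrelated answer, so gradual forgetting can be distinguished from catastrophic forgetting.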
Paper Type: long