Towards Lifelong Video Understanding: A Survey on Continual Learning in Video Visual Question Answering
Abstract: This paper surveys the application of continual learning to Video Visual Question Answering (Video VQA) as a path toward lifelong video understanding. Despite rapid progress in VQA, models that achieve strong performance in static settings face significant challenges in real-world scenarios, most notably catastrophic forgetting when encountering new tasks or domains. We systematically review the fundamentals of Video VQA, including the evolution from image VQA to Video VQA, core architectures, and evaluation methods, and examine how continual learning techniques have been adapted to video understanding. We analyze strategies based on regularization, replay, parameter isolation, and hybrid methods, comparing their performance across different Video VQA task streams. We also discuss experimental evaluation frameworks, covering task division (by question type, domain, and video style), training protocols, and baseline selection (joint training, sequential fine-tuning, and independent training). Finally, we identify open challenges such as long-video understanding, modality imbalance, and computational efficiency, and outline future research directions and application scenarios. This survey aims to consolidate recent advances, highlight key trends, and guide the development of continual learning for Video VQA.
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: Continual learning, video VQA, VQA, survey
Contribution Types: Surveys
Languages Studied: English
Submission Number: 731