TR-Adapter: Parameter-Efficient Transfer Learning for Video Question Answering

Yuanyuan Wang, Meng Liu, Xuemeng Song, Liqiang Nie

Published: 2025, Last Modified: 25 Mar 2026. IEEE Trans. Multim. 2025. License: CC BY-SA 4.0
Abstract: In recent years, large-scale pre-trained models for vision-language tasks have gained significant attention and have shown promising results in video question answering. However, the increasing size of these models has made full fine-tuning impractical, so there is a growing need for research on parameter-efficient transfer learning for downstream tasks. To address this challenge, we introduce a novel parameter-efficient transfer learning technique for video question answering, based on a temporal reasoning adapter. Our approach captures the temporal relationships within videos, enabling the model to reason visually and to acquire knowledge from language models. Extensive experiments on four video question answering datasets indicate that our method can match or even outperform full fine-tuning and state-of-the-art models while remaining parameter-efficient.