A Semi-supervised Learning Approach for Visual Question Answering based on Maximal Correlation

Sikai Yin, Fei Ma, Shao-Lun Huang

Published: 2021, Last Modified: 16 Nov 2023, SMC 2021, Readers: Everyone

Abstract: In this paper, we propose a semi-supervised learning approach for the Visual Question Answering (VQA) task based on maximal correlation. Instead of training the VQA model with only a classification loss such as cross-entropy, we propose a semi-supervised loss function that incorporates Soft-HGR, a training approach based on Hirschfeld-Gebelein-Rényi (HGR) maximal correlation. With Soft-HGR, the high-order correlation arising from the cross-modal common information of VQA image-question pairs is used to improve VQA model performance even without discriminative supervision from answer labels. We conduct experiments on the VQA v2 dataset by training the VQA model with different percentages of unlabeled samples. Experimental results show that our approach is efficient and model-agnostic for this semi-supervised learning task.
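
The abstract does not give the loss in closed form, so the following is only a minimal NumPy sketch of how such a semi-supervised objective might be assembled. It assumes the commonly stated empirical Soft-HGR objective, E[f(X)^T g(Y)] - 1/2 tr(cov(f(X)) cov(g(Y))) with zero-mean features, where f and g are the image and question feature extractors. The function names, the `labeled_mask` convention, and the `weight` trade-off parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def soft_hgr(f, g):
    """Empirical Soft-HGR objective for two feature batches of shape (n, k).

    Assumed form: E[f^T g] - 0.5 * tr(cov(f) cov(g)), with features centered.
    """
    n = f.shape[0]
    f = f - f.mean(axis=0, keepdims=True)  # enforce the zero-mean constraint
    g = g - g.mean(axis=0, keepdims=True)
    inner = np.sum(f * g) / (n - 1)        # estimate of E[f(X)^T g(Y)]
    cov_f = f.T @ f / (n - 1)              # (k, k) covariance of image features
    cov_g = g.T @ g / (n - 1)              # (k, k) covariance of question features
    return inner - 0.5 * np.trace(cov_f @ cov_g)


def cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def semi_supervised_loss(logits, labels, labeled_mask, img_feat, q_feat, weight=1.0):
    """Hypothetical combined loss (an assumption, not the paper's exact form).

    Cross-entropy is computed on labeled samples only; the Soft-HGR term is
    computed on every image-question pair, labeled or not. Soft-HGR is
    maximized, so it enters the loss with a negative sign.
    """
    ce = 0.0
    if labeled_mask.any():
        ce = cross_entropy(logits[labeled_mask], labels[labeled_mask])
    return ce - weight * soft_hgr(img_feat, q_feat)
```

In this sketch an unlabeled sample carries a placeholder label (for example -1) that is never read, because `labeled_mask` filters it out before the cross-entropy term is evaluated. A batch with no labels at all still yields a training signal through the Soft-HGR term alone.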
