Semantic-Aware Contrastive Learning With Proposal Suppression for Video Semantic Role Grounding

Meng Liu, Di Zhou, Jie Guo, Xin Luo, Zan Gao, Liqiang Nie

Published: 01 Jan 2024, Last Modified: 11 Apr 2025. IEEE Trans. Circuits Syst. Video Technol. 2024. License: CC BY-SA 4.0

Abstract: Video semantic role grounding has attracted substantial interest from both academia and industry. Although existing methods have achieved considerable performance gains, the influence of noisy proposals and of intra-object proposals (proposals that share the same object label) has yet to be explored in video semantic role grounding. In this study, we propose a semantic-aware contrastive learning network with proposal suppression to ground referenced objects more accurately. To fully exploit the semantic information in each semantic role, we introduce a novel semantic role encoding module that produces precise representations of each semantic role. We also design a semantic-aware proposal suppression network to reduce the impact of noisy proposals on object representation learning. Additionally, we propose a proposal contrastive loss that improves cross-modal alignment and reduces the effect of irrelevant intra-object proposals. Extensive experiments on four datasets demonstrate that our model achieves significant improvements over state-of-the-art methods.
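The abstract does not give the form of the proposal contrastive loss. As a rough illustration only, the sketch below uses a generic InfoNCE-style objective, which is an assumption and not the authors' formulation: a semantic role embedding is scored against every candidate proposal by cosine similarity divided by a temperature `tau`, and the loss is the negative log-softmax probability of the ground-truth proposal. Intra-object proposals that share the referenced object's label simply appear among the negatives, so minimizing the loss pushes the role embedding away from them. The names `cosine` and `proposal_contrastive_loss`, and the default `tau`, are hypothetical.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def proposal_contrastive_loss(role: Sequence[float],
                              proposals: Sequence[Sequence[float]],
                              positive: int,
                              tau: float = 0.1) -> float:
    """Hypothetical InfoNCE-style loss (not the paper's exact loss).

    Pulls the semantic role embedding toward the ground-truth proposal at
    index `positive` and pushes it away from all other proposals, including
    intra-object proposals that share the positive's object label.
    """
    # Temperature-scaled cosine similarities act as logits.
    logits = [cosine(role, p) / tau for p in proposals]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    # Negative log-softmax probability of the positive proposal.
    return log_denominator - logits[positive]


if __name__ == "__main__":
    role = [1.0, 0.0]
    # Index 0: referenced object; index 1: a hypothetical intra-object
    # proposal with the same label; index 2: an unrelated proposal.
    proposals = [[0.9, 0.1], [0.7, 0.7], [0.0, 1.0]]
    print(proposal_contrastive_loss(role, proposals, positive=0))
```

Because the loss is a softmax over all proposals, a smaller `tau` sharpens the distribution and penalizes hard negatives such as same-label proposals more strongly; this is a common property of InfoNCE-style losses, not a claim about the paper's settings.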