Overcoming Weak Visual-Textual Alignment for Video Moment Retrieval

21 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Keywords: Video Moment Retrieval; Video and Language
TL;DR: We propose BM-DETR to tackle weak visual-textual alignment, effectively identifying relevant visual features via the proposed background moment detection. BM-DETR demonstrates superior performance and generalization ability on various VMR datasets.
Abstract: Video moment retrieval (VMR) identifies a specific moment in an untrimmed video for a given natural language query. This task is prone to the weak visual-textual alignment problem inherent in video datasets. Because of this ambiguity, a query may not fully cover the relevant details of the corresponding moment, or the moment may contain misaligned and irrelevant frames, potentially limiting further performance gains and generalization capability. To tackle this problem, we propose a background-aware moment detection transformer (BM-DETR). Our model adopts a contrastive approach, carefully utilizing the negative queries matched to other moments in the video. Specifically, our model learns to predict the target moment from the joint probability of each frame given the positive query and the complement of negative queries. This leads to effective use of the surrounding background, improving moment sensitivity and enhancing overall alignments in videos. Our approach is efficient and outperforms previous methods, including contrastive learning-based ones, on multiple datasets with significantly reduced computational costs.
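The joint-probability idea in the abstract can be illustrated with a small sketch. This is not the BM-DETR implementation. It assumes that per-frame probabilities for the positive query and for each negative query are already available as plain lists, that they are combined as p_pos[t] * (1 - p_neg[t]) over all negative queries, and that a moment is read off as the longest run of frames above a threshold. The function names, the independence assumption, and the thresholding step are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple


def joint_frame_probability(
    p_pos: List[float], p_neg_list: List[List[float]]
) -> List[float]:
    """Combine per-frame probabilities (illustrative assumption only).

    For each frame t, multiply the probability that the frame matches the
    positive query by the complement of the probability that it matches
    each negative query: p_pos[t] * prod_k (1 - p_neg_k[t]).
    """
    joint = []
    for t, p in enumerate(p_pos):
        for p_neg in p_neg_list:
            p *= 1.0 - p_neg[t]
        joint.append(p)
    return joint


def longest_run_above(
    joint: List[float], threshold: float = 0.5
) -> Optional[Tuple[int, int]]:
    """Return (start, end), inclusive, of the longest run of frames whose
    joint probability is at or above the threshold, or None if no frame
    qualifies. This decoding step is a hypothetical stand-in for a learned
    moment predictor."""
    best: Optional[Tuple[int, int]] = None
    start: Optional[int] = None
    # A -inf sentinel closes any run that reaches the final frame.
    for t, p in enumerate(joint + [float("-inf")]):
        if p >= threshold and start is None:
            start = t
        elif p < threshold and start is not None:
            if best is None or (t - start) > (best[1] - best[0] + 1):
                best = (start, t - 1)
            start = None
    return best


# Toy example: frame 0 also matches a negative query, so it is suppressed.
p_pos = [0.9, 0.9, 0.9, 0.9, 0.1]
p_neg = [0.9, 0.1, 0.1, 0.1, 0.1]
joint = joint_frame_probability(p_pos, [p_neg])
moment = longest_run_above(joint, threshold=0.5)
```

In this toy input, the first frame scores highly for the positive query but also for the negative one, so the complement term pushes its joint probability down and the selected span starts one frame later.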
Supplementary Material: zip
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3199