Temporal Self-Paced Proposal Learning for Weakly-Supervised Video Moment Retrieval and Highlight Detection

Published: 01 Jan 2024, Last Modified: 07 Mar 2025, Venue: ICME 2024, License: CC BY-SA 4.0
Abstract: The Weakly-Supervised Moment Retrieval and Highlight Detection (WS-MRHD) task aims to retrieve target moments and highlights from an untrimmed video given a semantically relevant text query. One of the most challenging problems in this task is the absence of reliable temporal supervision signals. In this paper, we propose a Temporal Self-paced Proposal Learning (TSPL) method that performs progressive temporal proposal selection. It improves the effectiveness of contrastive learning even when frame-level annotations are inaccessible. Specifically, TSPL consists of three key components: (1) a Variance-Based Instance Selection (VBIS) module that leverages self-paced learning for dynamic temporal proposal selection; (2) a Highlight Broadcasting (HB) module that combines reliable time spans and assigns frame-level pseudo labels; and (3) a Negative Sample Learning (NSL) module that aligns the text query with relevant video segments. By dynamically selecting appropriate temporal proposals for training, TSPL achieves more reliable cross-modal alignment, thus markedly boosting retrieval performance. Extensive experiments on two public WS-MRHD benchmarks verify that TSPL substantially outperforms current state-of-the-art methods.