Joint Searching and Grounding: Multi-Granularity Video Content RetrievalDownload PDFOpen Website

Published: 01 Jan 2023, Last Modified: 06 Nov 2023ACM Multimedia 2023Readers: Everyone
Abstract: Text-based video retrieval is a well-studied task aimed at retrieving relevant videos from a large collection in response to a given text query. Most existing TVR works assume that videos are already trimmed and fully relevant to the query thus ignoring that most videos in real-world scenarios are untrimmed and contain massive irrelevant video content. Moreover, as users' queries are only relevant to video events rather than complete videos, it is also more practical to provide specific video events rather than an untrimmed video list. In this paper, we introduce a challenging but more realistic task called Multi-Granularity Video Content Retrieval (MGVCR), which involves retrieving both video files and specific video content with their temporal locations. This task presents significant challenges since it requires identifying and ranking the partial relevance between long videos and text queries under the lack of temporal alignment supervision between the query and relevant moments. To this end, we propose a novel unified framework, termed, Joint Searching and Grounding (JSG). It consists of two branches: (1) a glance branch that coarsely aligns the query and moment proposals using inter-video contrastive learning, and (2) a gaze branch that finely aligns two modalities using both inter- and intra-video contrastive learning. Based on the glance-to-gaze design, our JSG method learns two separate joint embedding spaces for moments and text queries using a hybrid synergistic contrastive learning strategy. Extensive experiments on three public benchmarks, i.e., Charades-STA, DiDeMo, and ActivityNet-Captions demonstrate the superior performance of our JSG method on both video-level retrieval and event-level retrieval subtasks. Our open-source implementation code is available at https://github.com/CFM-MSG/Code_JSG.
0 Replies

Loading