Progressive Event Alignment Network for Partial Relevant Video Retrieval

Published: 01 Jan 2023, Last Modified: 20 Mar 2024, Venue: ICME 2023, Readers: Everyone
Abstract: Most existing text-based video retrieval methods are designed only for trimmed videos. However, more complicated untrimmed videos are now common in multimedia applications. In this paper, we focus on the Partially Relevant Video Retrieval (PRVR) task, which retrieves long untrimmed videos using text queries that describe only part of their content. To tackle this challenging problem, we propose a novel method termed Progressive Event Alignment Network (PEAN), which progressively aligns text queries with local video content. Specifically, it consists of three key components: (1) a Multimodal Representation Module (MRM) that extracts text representations and hierarchical video representations; (2) an Event Searching Module (ESM) that coarsely localizes the described video content; and (3) an Event Aligning Module (EAM) that aligns text queries and local video content at a fine-grained level. We also design a Gaussian-based pooling strategy, used in both the ESM and the EAM, that thoroughly mines the semantic information in representative video frames. Extensive experiments on three PRVR benchmarks demonstrate that the proposed PEAN method significantly outperforms current state-of-the-art methods.
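The abstract does not specify how the Gaussian-based pooling is computed. As a rough illustration only, the sketch below shows one common way to pool frame features with Gaussian weights centered on a chosen frame. The function name `gaussian_pool` and the parameters `center` and `sigma` are assumptions made for this example and are not taken from the paper.

```python
import math


def gaussian_pool(frame_features, center, sigma):
    """Pool per-frame feature vectors using Gaussian weights over the frame index.

    Hypothetical sketch: `center` is the index of the frame assumed to be most
    representative, and `sigma` controls how quickly the weights decay with
    distance from it. Neither is defined in the abstract.
    """
    if not frame_features:
        raise ValueError("frame_features must be non-empty")
    if sigma <= 0:
        raise ValueError("sigma must be positive")

    # Unnormalized Gaussian weight for each frame index.
    weights = [
        math.exp(-((i - center) ** 2) / (2 * sigma ** 2))
        for i in range(len(frame_features))
    ]

    # Normalize so the weights sum to 1.
    total = sum(weights)
    weights = [w / total for w in weights]

    # Weighted sum of the frame features, one dimension at a time.
    dim = len(frame_features[0])
    pooled = [
        sum(w * f[d] for w, f in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights


# Toy usage: three 2-dimensional frame features, centered on the middle frame.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = gaussian_pool(frames, center=1, sigma=1.0)
```

In this sketch, frames close to `center` dominate the pooled vector, while distant frames still contribute a small amount, in contrast to max pooling or a hard window around the center frame.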