Text-guided distillation learning to diversify video embeddings for text-video retrieval

Published: 01 Jan 2024, Last Modified: 06 Nov 2024, Pattern Recognit. 2024, CC BY-SA 4.0
Abstract: Highlights
• Propose a novel text-video retrieval method for the one-to-many correspondence problem.
• Multi-key memory and temporal aggregation generate multiple video embeddings.
• Text-guided distillation learning gives each video embedding distinct semantics.
• The proposed video embeddings are text-agnostic, which enables a practical retrieval system.
• Achieve superior performance on four datasets and validate the effectiveness of the design choices.
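The practical appeal of text-agnostic video embeddings is that they can be computed once, offline, and then compared against any incoming text query. The following is a minimal sketch of that retrieval step only, not the paper's implementation. It assumes each video is stored with K precomputed embeddings and that a video's score is the maximum cosine similarity between the query and any of its embeddings. The max rule, the function names (`video_score`, `rank_videos`), and the toy vectors are all illustrative assumptions; the highlights do not state how the multiple embeddings are matched to a query.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def video_score(text_emb, video_embs):
    """Assumed scoring rule: best match among a video's K embeddings."""
    return max(cosine(text_emb, v) for v in video_embs)


def rank_videos(text_emb, index):
    """Rank video ids by score; index maps video id -> list of K embeddings."""
    scores = {vid: video_score(text_emb, embs) for vid, embs in index.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy index with hypothetical values: two videos, two embeddings each.
# The embeddings do not depend on the query, so they can be built offline.
index = {
    "video_a": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "video_b": [[0.0, 0.0, 1.0], [0.7, 0.7, 0.0]],
}
query = [0.0, 1.0, 0.1]  # stand-in for a text embedding
ranking = rank_videos(query, index)
print(ranking)
```

Because each video carries several embeddings, a query that matches only one aspect of a video can still retrieve it, which is one way to read the one-to-many correspondence problem named in the highlights.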