Abstract: Highlights
• Propose novel text-video retrieval method for one-to-many correspondence problem.
• Multi-key memory and temporal aggregation generate multiple video embeddings.
• Text-guided distillation learning makes each video embedding have distinct semantics.
• Proposed video embedding is text-agnostic and enables practical retrieval system.
• Achieve superior performances on four datasets and validate effectiveness of designs.
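
To make the retrieval setting concrete, the sketch below shows one way a system could score a text query against videos that each carry several precomputed, text-agnostic embeddings. It is illustrative only: the highlights do not state how the multiple embeddings are compared with the text, so taking the maximum cosine similarity over a video's embeddings is an assumption, and the names `l2_normalize` and `rank_videos` are hypothetical. The multi-key memory, temporal aggregation, and distillation steps are not modeled here.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def rank_videos(text_emb, video_embs):
    """Rank videos for one text query.

    text_emb:   shape (D,), embedding of the query text.
    video_embs: shape (N, K, D), K precomputed embeddings for each of N videos.
    Returns (order, scores): video indices sorted best-first, and per-video scores.

    Assumption: a video's score is the maximum cosine similarity between the
    text and any of its K embeddings. The source does not specify this rule.
    """
    t = l2_normalize(text_emb)
    v = l2_normalize(video_embs)
    sims = v @ t                # (N, K) cosine similarities
    scores = sims.max(axis=1)   # (N,) best-matching embedding per video
    order = np.argsort(-scores)
    return order, scores


# Toy data: 3 videos, 2 embeddings each, 4 dimensions (one-hot rows of the identity).
basis = np.eye(4)
video_embs = np.stack([
    np.stack([basis[0], basis[1]]),
    np.stack([basis[2], basis[3]]),
    np.stack([basis[0], basis[2]]),
])
text_emb = 2.0 * basis[3]  # points the same way as video 1's second embedding

order, scores = rank_videos(text_emb, video_embs)
```

Because the video embeddings do not depend on the query, they can be computed once offline and stored, which is what makes this kind of scoring practical at retrieval time.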