$\text{Slerp}^{+}$: Spherical Linear Interpolation for Unified Compositional Retrieval

ICLR 2025 Conference Submission 1907 Authors

19 Sept 2024 (modified: 13 Oct 2024), ICLR 2025 Conference Submission, CC BY 4.0
Keywords: multi-modal representation learning, composed retrieval
Abstract: Zero-shot composed image/video retrieval is a challenging task in which a query combines a reference visual input with a relative caption to search for target visual data. Earlier studies have treated composed image retrieval and composed video retrieval separately, potentially neglecting the benefits of joint image-video-text representation learning. In this paper, we consolidate these tasks into a single Composed \emph{Visual} Retrieval (CVR) task, which requires a unified retrieval model to compose image and video samples with textual modifications. Our principal insight is that the video modality can be effectively added to existing vision-language pretrained models. Combining such a model with the Spherical Linear Interpolation (Slerp) method previously proposed for Composed Image Retrieval (CoIR) yields an effective approach to the CVR task, which we call $\text{Slerp}^{+}$. Extensive experiments demonstrate $\text{Slerp}^{+}$'s superiority across various composed image and video retrieval benchmarks, including our newly proposed video benchmark. Notably, $\text{Slerp}^{+}$ improves both image and video retrieval performance over single-modality models, underscoring its potential to advance compositional visual retrieval.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 1907
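For readers unfamiliar with the Slerp operation named in the abstract, the following is a minimal sketch of generic spherical linear interpolation between two embeddings, followed by cosine-similarity ranking. It is not the authors' implementation: the function names `slerp` and `rank_gallery`, the mixing weight `alpha`, and the toy two-dimensional vectors are all assumptions made for illustration, and the sketch says nothing about how $\text{Slerp}^{+}$ encodes video.

```python
import numpy as np


def slerp(u, v, alpha):
    """Spherical linear interpolation between the directions of u and v.

    alpha=0 returns the direction of u, alpha=1 the direction of v.
    Antipodal inputs (angle close to pi) are not handled in this sketch.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # Work on the unit sphere.
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    theta = np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))
    if theta < 1e-6:
        # Nearly parallel vectors: fall back to normalized linear interpolation.
        out = (1.0 - alpha) * u + alpha * v
        return out / np.linalg.norm(out)
    sin_theta = np.sin(theta)
    w_u = np.sin((1.0 - alpha) * theta) / sin_theta
    w_v = np.sin(alpha * theta) / sin_theta
    return w_u * u + w_v * v


def rank_gallery(query, gallery):
    """Return gallery indices sorted by descending cosine similarity to query."""
    gallery = np.asarray(gallery, dtype=float)
    gallery = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(gallery @ query))


# Hypothetical toy embeddings standing in for a visual and a text feature.
visual_emb = np.array([1.0, 0.0])
text_emb = np.array([0.0, 1.0])
query = slerp(visual_emb, text_emb, alpha=0.5)
ranking = rank_gallery(query, [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
```

Because both inputs are normalized first, the interpolated query stays on the unit sphere, so ranking by dot product against normalized gallery features is equivalent to ranking by cosine similarity.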