Pseudo Meets Zero: Boosting Zero-Shot Composed Image Retrieval with Synthetic Images

ICLR 2025 Conference Submission 641 (Anonymous Authors)

14 Sept 2024 (modified: 13 Oct 2024), ICLR 2025 Conference Submission, CC BY 4.0
Keywords: Zero-Shot Composed Image Retrieval, Synthetic Images, Multimodal
Abstract: Composed Image Retrieval (CIR) retrieves a target image by combining a reference image with modifying text in a triplet formulation. To avoid high annotation costs, Zero-Shot CIR (ZS-CIR) methods eliminate the need for manually annotated triplets. Current methods typically map images to tokens and concatenate them with the modifying text. However, they struggle at inference time, especially with fine-grained and multi-attribute modifications. We argue that these challenges stem from insufficient explicit modeling of triplet relationships, which hinders fine-grained interaction and directional guidance. To this end, we propose a Synthetic Image-Oriented training paradigm that automates pseudo target image generation, enabling efficient triplet construction while accommodating the inherent ambiguity of targets. Furthermore, we propose the Pseudo domAiN Decoupling-Alignment (PANDA) model to mitigate the autophagy phenomenon caused by fitting targets to pseudo images. We observe that synthetic images lie intermediate between the visual and textual domains within triplets. Motivated by this observation, we design an Orthogonal Semantic Decoupling module that disentangles the pseudo domain into visual and textual components. Additionally, Shared Domain Interaction and Mutual Shift Constraint modules are proposed to jointly constrain the disentangled components, bridging the gap between pseudo and real triplets while enhancing their semantic consistency. Extensive experiments demonstrate that the proposed PANDA model outperforms existing state-of-the-art methods on two general-scenario and two domain-specific CIR datasets.
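The abstract does not specify how the Orthogonal Semantic Decoupling module is implemented. As a purely illustrative sketch, one generic way to split a pseudo-image embedding into a text-aligned component and an orthogonal (visual) residual is a standard vector projection; all names below are hypothetical and not taken from the paper.

```python
import numpy as np

def orthogonal_decouple(pseudo_emb: np.ndarray, text_emb: np.ndarray):
    """Split pseudo_emb into a component along text_emb (textual)
    and an orthogonal residual (visual). Illustrative only; not the
    paper's actual module, which the abstract does not detail."""
    direction = text_emb / np.linalg.norm(text_emb)
    textual = np.dot(pseudo_emb, direction) * direction  # projection onto text direction
    visual = pseudo_emb - textual                        # orthogonal residual
    return textual, visual

# Toy example with 3-d embeddings (hypothetical values)
p = np.array([3.0, 4.0, 0.0])   # pseudo-image embedding
t = np.array([1.0, 0.0, 0.0])   # text embedding
tex, vis = orthogonal_decouple(p, t)
```

By construction, `tex + vis` reconstructs the pseudo embedding exactly, and `vis` carries no component along the text direction, which is the basic property any such decoupling would need.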
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 641