PAIR: Complementarity-guided Disentanglement for Composed Image Retrieval

Published: 01 Jan 2025, Last Modified: 01 Aug 2025 · ICASSP 2025 · CC BY-SA 4.0
Abstract: Composed Image Retrieval (CIR) is a novel image retrieval paradigm that searches for target images using a multimodal query comprising a reference image and a modification text. Although existing works have made significant progress, they overlook the modeling of inter-modal coherence and incoherence relations, which limits the retrieval accuracy of CIR models. This limitation is non-trivial due to two challenges: 1) inter-modal incoherence and 2) intra-modal entanglement. To address these challenges, we propose a comPlementArity-guided dIsentanglement netwoRk (PAIR), which disentangles the features of multimodal queries from a semantic-coherence perspective, thereby facilitating the identification of both complementary coherent and incoherent features. Building on the disentangled features, PAIR further develops an asymmetric feature composition module designed to enhance retrieval performance. Extensive experiments on three benchmark datasets demonstrate the superiority of PAIR. The code is available at https://zhihfu.github.io/PAIR.github.io/.
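At a high level, the disentangle-then-compose pipeline the abstract describes can be sketched as follows. This is a minimal NumPy illustration under assumed linear projections and a simple additive composition; the projection matrices, the `disentangle` helper, and the composition rule are all hypothetical stand-ins, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # feature dimension (illustrative)

# Assumed: pre-extracted reference-image and modification-text features.
img_feat = rng.standard_normal(d)
txt_feat = rng.standard_normal(d)

# Hypothetical linear projections splitting each modality into a
# "coherent" part (semantics shared across modalities) and an
# "incoherent" part (modality-specific semantics).
W_coh = rng.standard_normal((d, d))
W_inc = rng.standard_normal((d, d))

def disentangle(feat):
    """Split a feature into (coherent, incoherent) components."""
    return W_coh @ feat, W_inc @ feat

img_coh, img_inc = disentangle(img_feat)
txt_coh, txt_inc = disentangle(txt_feat)

# Asymmetric composition (illustrative): retain the image's coherent
# content and inject the text's incoherent (modification) content.
query = img_coh + txt_inc

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Retrieval: rank a small gallery of candidate features by similarity.
gallery = rng.standard_normal((5, d))
scores = [cosine(query, g) for g in gallery]
best = int(np.argmax(scores))  # index of the retrieved target image
```

The asymmetry here is that the image and text contribute different components to the composed query, which is the rough intuition behind composing complementary coherent and incoherent features.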