SDHNet: Semantic Disentanglement and Heterogeneous Feature Fusion Network for Composed Image Retrieval

Published: 2025, Last Modified: 22 Jan 2026, Venue: ICIC (16) 2025, License: CC BY-SA 4.0
Abstract: In this work, we focus on the Composed Image Retrieval (CIR) task, which faces two primary challenges. The first is how to obtain image and text features that are consistent with the query intention. The second is information redundancy when integrating heterogeneous visual and textual features. To tackle these challenges, we develop a two-step network featuring Semantic Disentanglement and Heterogeneous Feature Fusion (SDHNet). The framework has two primary components: the Semantic Disentanglement module (SDM) and the Adaptive Fusion module (AFM). The SDM extracts desired and conflicting semantic features from images and text, significantly enhancing both types of features and enabling the model to distinguish between these two kinds of semantics. The AFM employs three collaborating sub-networks, with dynamic routing that assigns weights adaptively, to fuse the desired and conflicting features from images and text. Comprehensive experiments demonstrate that our method achieves improved performance on two benchmark datasets.
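To make the idea of routing-weighted fusion concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the three sub-networks or the routing function, so everything here is an assumption for illustration: the three fusion functions (`fuse_sum`, `fuse_product`, `fuse_max`), the linear router with softmax, and all names and dimensions are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Three hypothetical fusion sub-networks. These are stand-ins only;
# the abstract does not describe what the real sub-networks compute.
def fuse_sum(img, txt):
    return [a + b for a, b in zip(img, txt)]


def fuse_product(img, txt):
    return [a * b for a, b in zip(img, txt)]


def fuse_max(img, txt):
    return [max(a, b) for a, b in zip(img, txt)]


EXPERTS = [fuse_sum, fuse_product, fuse_max]


def make_router(dim, seed=0):
    """Assumed router: one random linear scoring row per sub-network."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(2 * dim)] for _ in EXPERTS]


def route(router, img, txt):
    """Score each sub-network from the concatenated features, then normalize."""
    x = img + txt  # list concatenation of the two feature vectors
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    return softmax(logits)


def adaptive_fusion(router, img, txt):
    """Weighted sum of the sub-network outputs using input-dependent weights."""
    weights = route(router, img, txt)
    outputs = [f(img, txt) for f in EXPERTS]
    fused = [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(img))
    ]
    return fused, weights


# Toy feature vectors standing in for image and text embeddings.
img = [0.2, -0.5, 1.0, 0.3]
txt = [0.7, 0.1, -0.4, 0.9]
router = make_router(dim=4)
fused, weights = adaptive_fusion(router, img, txt)
print(weights, fused)
```

Because the weights depend on the input pair, different queries can emphasize different sub-networks. In a trained model the router parameters would be learned rather than randomly initialized as they are here.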