Keywords: Multimodal Models, Compositional Image Retrieval, Attention Editing, Web-Scale Retrieval, Hierarchical Masking, Prompt Localization
TL;DR: This paper introduces an attention-editing framework for Compositional Image Retrieval that efficiently integrates web-scale knowledge through structured prompting and dynamic masking.
Abstract: This paper proposes a web-knowledge infusion method based on attention editing, aimed at improving how Compositional Image Retrieval (CIR) models comprehend and exploit complex web-scale knowledge. To address the limitations of conventional multimodal models in processing massive web knowledge, we construct a structured knowledge-enhanced dataset and build an attention-guided knowledge infusion framework on top of it. The method transmits web knowledge progressively, from coarse to fine granularity, through a carefully designed prompt localization system and a hierarchically controlled masking mechanism. Specifically, structured prompt templates encode web knowledge into learnable semantic units, while dynamic attention editing governs the injection process, enabling the model to adaptively filter and integrate heterogeneous, multi-source web knowledge. Experimental results demonstrate that this approach not only significantly improves the model's efficiency in capturing implicit web knowledge but also effectively mitigates knowledge conflicts and redundancy. Our work establishes a new technical paradigm for knowledge distillation and transfer in multimodal retrieval systems.
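The abstract's hierarchically controlled masking can be pictured as attention editing over a stack of boolean masks, where coarser levels expose fewer knowledge tokens and finer levels progressively reveal more. The sketch below is illustrative only: the function `edited_attention`, the mask layout, and the level semantics are assumptions of this example, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def edited_attention(q, k, v, mask_levels, level):
    """Single-head attention whose scores are edited by a mask hierarchy.

    q: (n_query, d) query vectors; k, v: (n_know, d), (n_know, d_v)
    knowledge-token keys/values. mask_levels: list of boolean arrays of
    shape (n_know,), ordered coarse to fine. Tokens not visible at or
    below `level` are blocked from attention (hypothetical scheme).
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])
    visible = np.zeros(k.shape[0], dtype=bool)
    for m in mask_levels[: level + 1]:   # union of masks up to this level
        visible |= m
    scores[:, ~visible] = -1e9           # edit: suppress hidden tokens
    return softmax(scores) @ v

# Usage: two coarse tokens visible at level 0, three more added at level 1.
rng = np.random.default_rng(0)
q = rng.normal(size=(2, 4))
k = rng.normal(size=(5, 4))
v = np.eye(5)  # one-hot values so outputs equal attention weights
levels = [np.array([True, True, False, False, False]),
          np.array([False, False, True, True, True])]
coarse_out = edited_attention(q, k, v, levels, level=0)
fine_out = edited_attention(q, k, v, levels, level=1)
```

With one-hot values, each output row is the attention distribution itself, so at level 0 all mass falls on the first two (coarse) knowledge tokens; raising the level redistributes mass over the finer tokens as well.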
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 12431