CLIP-based knowledge projector for image-text matching

Xinfeng Dong, Dingwen Zhang, Longfei Han, Huaxiang Zhang, Li Liu, Junwei Han

Published: 2026, Last Modified: 13 Nov 2025. Inf. Process. Manag. 2026. CC BY-SA 4.0

Abstract: Highlights
• We propose a knowledge projector network that treats the prior knowledge in CLIP (Radford et al., 2021) as a teacher to guide the slot attention generation process.
• An adaptive weighted fusion module incorporates global features into the slot representations.
• An effective similarity calculation method is proposed and compared with fine-grained image–text matching methods. The results indicate that our method outperforms CLIP and the most recent image–text alignment algorithms.

External IDs: dblp:journals/ipm/DongZHZLH26