CLIP-based knowledge projector for image-text matching

Published: 01 Jan 2026, Last Modified: 13 Nov 2025 | Inf. Process. Manag. 2026 | CC BY-SA 4.0
Abstract: Highlights
• We propose a knowledge projector network that treats the prior knowledge in CLIP (Radford et al., 2021) as a teacher to guide the slot attention generation process.
• An adaptive weighted fusion module incorporates global features into the slot representations.
• We propose an effective similarity calculation method and compare it against fine-grained image–text matching methods. The results indicate that our method outperforms CLIP and the most recent image–text alignment algorithms.