A closer look at the explainability of Contrastive language-image pre-training

Yi Li, Hualiang Wang, Yiqun Duan, Jiheng Zhang, Xiaomeng Li

Published: 2025, Last Modified: 21 Jul 2025. Pattern Recognit. 2025. License: CC BY-SA 4.0

Abstract: Highlights
• We observe that CLIP exhibits opposite visualization and noisy activations.
• We find that inconsistent self-attention and redundant features cause these issues.
• The CLIP Surgery is proposed for reliable CAM, with architecture and feature surgery.
• Our method greatly improves the explainability of CLIP with wide applicability.