Adversarial Bottleneck Method for Vision-Language Large Model Explainability

06 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Bottleneck Principle, Explainability, Cross-Modal Explainability
Abstract: CLIP is currently a leading vision-language model, demonstrating strong performance in tasks such as search-engine matching. However, its high performance comes with a complex decision-making process, making model interpretability a major challenge. Existing XAI methods focus mainly on unimodal settings, where the state of the art is typically attribution algorithms based on adversarial attacks. These methods perform well on unimodal tasks such as image classification, but extending them to cross-modal tasks (such as image-text alignment and cross-modal retrieval) presents several obstacles. For multimodal tasks, the most effective XAI methods currently rely on the bottleneck principle, which restricts information flow in order to analyze model decisions. In this paper, we propose a new approach that integrates adversarial attribution methods with the bottleneck principle. This approach not only interprets multimodal models such as CLIP but also preserves the advantage of unimodal attribution algorithms: precisely identifying the key features that influence model decisions within a given modality. Our model yields a more robust and broadly applicable explanation for vision-language models, further enhancing their transparency and trustworthiness in complex tasks. Comprehensive experiments demonstrate that, compared to state-of-the-art XAI methods, our approach improves the interpretability of text and images by 69.12\% and 19.36\%, respectively. Our code is available at https://anonymous.4open.science/r/ABM-5C28/
Supplementary Material: pdf
Primary Area: interpretability and explainable AI
Submission Number: 2659