Keywords: Interpretability, Information Bottleneck, Multi-Modal Learning
Abstract: Multimodal attribution methods such as M2IB aim to interpret vision-language models without requiring task-specific labels, but they often rely on the assumption of accurate semantic alignment between image-text pairs. This assumption does not hold in open-world settings, where noisy or mismatched inputs are common. Under such conditions, existing attribution methods tend to overfit and generate forced explanations, compromising the reliability and trustworthiness of interpretability results. To address this issue, we observe that a well-balanced trade-off between the compression and prediction terms in the information bottleneck objective can mitigate overfitting. Based on this insight, we propose an attribution framework that leverages an adaptive information bottleneck optimization objective. Our method dynamically adjusts the bottleneck constraints without assuming reliable cross-modal alignment. Extensive experiments on large-scale image-text datasets demonstrate that our approach consistently outperforms existing attribution methods in both quantitative metrics and qualitative interpretability, providing more robust and trustworthy explanations while relaxing the requirement for aligned image-text pairs.
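For context, the compression-prediction trade-off referred to in the abstract is conventionally written as the information bottleneck Lagrangian. A minimal sketch of how an adaptive weight might enter this objective is given below; the per-step coefficient $\beta_t$ is a hypothetical notation for illustration and is not taken from the submission itself:

$$\min_{p(z \mid x)} \;\; \beta_t \, I(X; Z) \; - \; I(Z; Y)$$

Here $I(X;Z)$ is the compression term and $I(Z;Y)$ the prediction term; an adaptive scheme would update $\beta_t$ during optimization rather than keeping it fixed, e.g. tightening the bottleneck when the image-text pair appears poorly aligned.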
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 12327