[Re] Boosting the Visual Interpretability of CLIP via Adversarial Fine-Tuning

TMLR Paper9289 Authors

28 May 2026 (modified: 03 Jun 2026) · Under review for TMLR · CC BY 4.0

Abstract: This paper presents a reproducibility study of "Boosting the Visual Interpretability of CLIP via Adversarial Fine-Tuning" by Gong et al. (2025), published at ICLR 2025, which proposes an unsupervised adversarial fine-tuning (AFT) method with norm regularization to enhance the visual interpretability of CLIP's image encoder. We attempt to reproduce the key claims regarding improved saliency map quality, increased concept alignment, transferability to out-of-distribution datasets, and the trade-off with zero-shot accuracy. Beyond reproduction, we propose a saliency-guided regularization extension that introduces an Energy Pointing Game loss, directly supervising the spatial alignment of Simple Gradient saliency maps with target objects. We evaluate our extension across a range of saliency-loss weights and show that explicit saliency supervision improves localization metrics with only a modest reduction in adversarial robustness.

Submission Type: Regular submission (no more than 12 pages of main content)

Previous TMLR Submission Url: https://openreview.net/forum?id=ahmE7txRLx

Changes Since Last Submission: The previous submission (#9238) was desk-rejected due to a modified template and missing header. We have corrected the LaTeX source to use the official unmodified TMLR style file, which now produces the required 'Under review as submission to TMLR' header on every page. The scientific content of the paper is unchanged.

Assigned Action Editor: ~Pin-Yu_Chen1

Submission Number: 9289
