Improving ViT interpretability with patch-level mask prediction

Junyong Kang, Byeongho Heo, Junsuk Choe

Published: 2025, Last Modified: 15 May 2025
Venue: Pattern Recognit. Lett. 2025
License: CC BY-SA 4.0

Abstract: Highlights
• We propose a novel visual explanation method for ViT that uses a patch-level localization task.
• Our method enhances explainability and localization performance across benchmarks.
• Our method works with pseudo masks obtained from self-supervised approaches.
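The highlights do not spell out how the patch-level localization task is formulated. The sketch below shows one plausible setup. It assumes a binary (pseudo) mask at image resolution, non-overlapping p×p ViT patches, a per-patch linear head on the patch tokens, and a binary cross-entropy objective. The function names, the 0.5 coverage threshold, the linear head, and the nearest-neighbour upsampling used to build an explanation map are all illustrative assumptions. None of them is the authors' implementation.

```python
import numpy as np


def patch_targets(mask: np.ndarray, patch_size: int, threshold: float = 0.5) -> np.ndarray:
    """Turn a binary (pseudo) mask of shape (H, W) into per-patch labels.

    Assumption: a patch counts as foreground (1.0) when the fraction of its
    pixels covered by the mask is at least `threshold`.
    """
    h, w = mask.shape
    if h % patch_size or w % patch_size:
        raise ValueError("mask size must be divisible by patch_size")
    # Split into (patch rows, patch_h, patch cols, patch_w) and average inside
    # each patch to get its coverage fraction.
    coverage = mask.reshape(h // patch_size, patch_size,
                            w // patch_size, patch_size).mean(axis=(1, 3))
    return (coverage >= threshold).astype(np.float32)


def patch_logits(tokens: np.ndarray, weight: np.ndarray, bias: float) -> np.ndarray:
    """Hypothetical linear head: (N, D) patch tokens -> (N,) mask logits."""
    return tokens @ weight + bias


def patch_bce_loss(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean binary cross-entropy with logits, in the numerically stable form
    max(x, 0) - x*y + log(1 + exp(-|x|))."""
    loss = np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    return float(loss.mean())


def explanation_map(logit_grid: np.ndarray, patch_size: int) -> np.ndarray:
    """Hypothetical explanation map: sigmoid of the patch logits, upsampled to
    pixel resolution by repeating each patch value over its p×p block."""
    probs = 1.0 / (1.0 + np.exp(-logit_grid))
    return np.kron(probs, np.ones((patch_size, patch_size)))


if __name__ == "__main__":
    mask = np.zeros((32, 32), dtype=np.float32)
    mask[0:16, 0:16] = 1.0                      # object fills the top-left patch
    targets = patch_targets(mask, patch_size=16)
    print(targets.tolist())                      # [[1.0, 0.0], [0.0, 0.0]]
```

Thresholding patch coverage lets a pixel-level pseudo mask supervise the token grid a ViT already produces. Swapping the threshold for the raw coverage fraction (soft labels) would be an equally reasonable assumption.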