Abstract: Highlights
• We propose a novel visual explanation method for Vision Transformers (ViT) based on a patch-level localization task.
• Our method improves both explainability and localization performance across benchmarks.
• Our method works with pseudo masks obtained from self-supervised approaches.