Simplified Concrete Dropout - Improving the Generation of Attribution Masks for Fine-grained Classification

Dimitri Korsch, Maha Shadaydeh, Joachim Denzler

Published: 2025, Last Modified: 01 Mar 2026Int. J. Comput. Vis. 2025EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: In fine-grained classification, which is classifying images into subcategories within a common broader category, it is crucial to have precise visual explanations of the classification model’s decision. While commonly used attention- or gradient-based methods deliver either too coarse or too noisy explanations unsuitable for highlighting subtle visual differences reliably, perturbation-based methods can precisely locate pixels causally responsible for the predicted category. The fill-in of the dropout (FIDO) algorithm is one of those methods, which utilizes concrete dropout (CD) to sample a set of attribution masks and updates the sampling parameters based on the output of the classification model. In this paper, we present a solution against the high variance in the gradient estimates, a known problem of the FIDO algorithm that has been mitigated until now by large mini-batch updates of the sampling parameters. First, our solution allows for estimating the parameters with smaller mini-batch sizes without losing the quality of the estimates but with a reduced computational effort. Next, our method produces finer and more coherent attribution masks. Finally, we use the resulting attribution masks to improve the classification performance on three fine-grained datasets without additional fine-tuning steps and achieve results that are otherwise only achieved if ground truth bounding boxes are used.
Loading