D-SCOPE: Diffusion-based Sonar Counterfactual and Prototype Explanations

TMLR Paper7640 Authors

23 Feb 2026 (modified: 05 Mar 2026) · Under review for TMLR · CC BY 4.0
Abstract: The harsh conditions of underwater environments pose significant challenges for effective monitoring. While using cameras is possible, they are typically limited to short ranges due to underwater visibility conditions. SOund NAvigation and Ranging (SONAR) can perceive objects at greater distances, but produces low-visibility images that are hard to interpret, even for experts. When Artificial Intelligence (AI) methods are applied to these SONAR images, Explainable Artificial Intelligence (XAI) methods can help users understand the AI outputs. However, traditional explainability methods, such as saliency maps or perturbation-based visualisations, often struggle to provide informative explanations when applied to low-contrast imagery. This work introduces Diffusion-based SONAR COunterfactual & Prototype Explanations (D-SCOPE), a novel post-hoc explainability framework for SONAR image classification. Our approach leverages classifier-guided diffusion models, trained on two publicly available Marine Debris Forward-Looking SONAR datasets, to generate two types of visual explanations: (1) counterfactual explanations that highlight the minimal semantic changes needed to alter a model's decision, and (2) prototype-based explanations for case-based reasoning that translate representative RGB samples into the SONAR domain, serving as intuitive visual references. Alongside each counterfactual, a semi-factual explanation is generated by displaying the intermediate steps leading up to the change in prediction. For prototype-based explanations, class-specific prototypes are provided. To the best of our knowledge, this is the first approach applying diffusion-based generative models for explainability in the SONAR modality. Guided diffusion models are shown to produce high-fidelity, class-conditioned counterfactuals in challenging underwater settings.
In addition, the proposed cross-domain prototype generation mechanism enhances human interpretability by bridging the gap between clear, recognisable RGB representations and SONAR imagery. Our framework is validated through qualitative and quantitative experiments as well as a controlled human evaluation. The code and the pretrained models will be released to support further research.
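The classifier-guided counterfactual mechanism described in the abstract can be illustrated with a toy sketch: during the reverse diffusion process, the gradient of a classifier's log-probability for a target class is added to each denoising update, steering the sample across the decision boundary. Everything below is an illustrative stand-in (a shrink-toward-origin "denoiser", a logistic classifier, a hypothetical guidance scale), not the paper's actual models or training setup.

```python
import numpy as np

def classifier_grad(x, w, target):
    """Gradient of log p(y=target | x) for a logistic classifier p(y=1|x)=sigmoid(w.x)."""
    p = 1.0 / (1.0 + np.exp(-x @ w))          # probability of class 1
    return (1.0 - p) * w if target == 1 else -p * w

def guided_reverse(x_start, w, target, steps=50, scale=2.0, rng=None):
    """Toy classifier-guided reverse diffusion: the classifier gradient
    nudges each denoising step toward the target (counterfactual) class."""
    rng = rng if rng is not None else np.random.default_rng(0)
    x = x_start.astype(float).copy()
    for t in range(steps):
        # Stand-in unconditional denoiser: shrink the sample toward the data mean (origin).
        mean = 0.9 * x
        # Classifier guidance: shift the predicted mean along grad log p(y=target | x).
        mean = mean + scale * classifier_grad(x, w, target)
        # Small stochasticity on all but the final step, as in ancestral sampling.
        noise = 0.05 * rng.standard_normal(x.shape) * (t < steps - 1)
        x = mean + noise
    return x
```

Recording the intermediate `x` values along the trajectory would give the semi-factual view the abstract mentions: the sequence of progressively edited images leading up to the point where the prediction flips.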
Submission Type: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Anirbit_Mukherjee1
Submission Number: 7640