A Novel Framework for Automated Explanation of Vision Models Using Vision-Language Models

ACL ARR 2025 May Submission37 Authors

06 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: The development of vision models focuses mainly on improving performance on metrics such as accuracy, IoU, and mAP, with less attention to explainability, since applying xAI methods to produce meaningful explanations of trained models is complex. Although many existing xAI methods explain vision models sample by sample, methods that explain the general behavior of a vision model, which can only be captured by running it on a large dataset, remain underexplored. Moreover, some xAI methods are complex and require expert interpretation, limiting their use in routine vision model development despite the importance of explainability. Leveraging Vision-Language Models, this paper proposes a pipeline that explains vision models at both the sample and dataset levels. The pipeline can be used to discover failure cases and to understand vision models with little effort, so it can integrate vision model development with xAI analysis and thereby advance the development of image analysis.
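The abstract does not specify how the two levels of explanation are implemented. A minimal sketch of the general idea follows, with all names hypothetical: a VLM (here a toy stand-in function) describes each failure case (sample level), and the descriptions are aggregated to surface recurring failure modes (dataset level).

```python
from collections import Counter

def explain_dataset(samples, vlm_describe):
    """Sample level: ask a VLM to describe each misclassified image.
    Dataset level: aggregate the descriptions to surface common failure modes.
    `vlm_describe` is a hypothetical callable; a real pipeline would wrap an
    actual Vision-Language Model here."""
    per_sample = []
    for image, pred, label in samples:
        if pred != label:  # only explain failure cases
            per_sample.append(vlm_describe(image, pred, label))
    # Dataset-level summary: the most frequent failure descriptions.
    common = Counter(per_sample).most_common(3)
    return per_sample, common

# Toy stand-in for a real VLM call, used only to illustrate the data flow.
def toy_vlm(image, pred, label):
    return f"predicted {pred} instead of {label}"

samples = [("img1", "cat", "dog"), ("img2", "cat", "dog"), ("img3", "dog", "dog")]
per_sample, common = explain_dataset(samples, toy_vlm)
```

In this sketch the dataset-level explanation is a simple frequency count; the paper's pipeline presumably uses the VLM itself to summarize, but the sample-then-aggregate structure is the same.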
Paper Type: Short
Research Area: NLP Applications
Research Area Keywords: multimodal applications,explanation faithfulness,free-text/natural language explanations
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 37