Interpretable Vision Tasks via Vision Logic Model Integrating Visual Reasoning and Textual Explanation
Keywords: Interpretable Vision, Explainable Artificial Intelligence, Visual Reasoning
TL;DR: We propose an interpretable vision logic model that performs vision tasks while generating video-based visual reasoning and natural language explanations for transparent AI decisions.
Abstract: Despite remarkable advances in computer vision, most state-of-the-art models remain black boxes, offering limited insight into their decision-making processes. In this paper, we propose an interpretable vision logic model that enhances the transparency and trustworthiness of vision tasks, including classification, detection, and segmentation. Our framework not only produces standard outputs (e.g., class labels, bounding boxes, segmentation maps) but also generates dynamic visual reasoning videos and natural language explanations that reveal the underlying logic behind each prediction. In particular, for each input image, our model visualizes the step-by-step reasoning process, highlighting critical features, attention regions, and intermediate decisions through a video output. In parallel, a textual explanation module provides a rationale in human-understandable language, offering additional context and interpretability. This end-to-end approach allows users to simultaneously obtain the main vision task result, a transparent visual narrative of the decision process, and an interpretable explanation, all within a unified framework. Experiments on standard vision benchmarks demonstrate that our method delivers high task accuracy while significantly improving explainability and user trust. Our vision logic model paves the way for more interpretable and accountable AI in critical applications, such as medical imaging, autonomous driving, and industrial inspection.
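The abstract describes a unified interface that returns the task prediction, a reasoning video, and a textual rationale together. The sketch below illustrates one way such an interface could look for a classification setting; all names (VisionLogicModel, VisionLogicOutput) are hypothetical, and the attention update is a toy placeholder rather than the paper's actual reasoning mechanism.

```python
# Hypothetical sketch of the unified output interface suggested by the abstract.
# All class and function names are illustrative, not taken from the paper.
from dataclasses import dataclass, field
from typing import List
import numpy as np


@dataclass
class VisionLogicOutput:
    """Bundles the three outputs described in the abstract for one image."""
    prediction: str                                                    # main task result, e.g. a class label
    reasoning_frames: List[np.ndarray] = field(default_factory=list)  # frames of the visual reasoning video
    explanation: str = ""                                              # natural-language rationale


class VisionLogicModel:
    """Toy stand-in: classify an image, record intermediate attention maps
    as video frames, and emit a short textual explanation."""

    def __init__(self, class_names: List[str], steps: int = 4):
        self.class_names = class_names
        self.steps = steps

    def __call__(self, image: np.ndarray) -> VisionLogicOutput:
        frames = []
        attention = np.ones(image.shape[:2]) / image.size  # uniform initial attention
        saliency = image.mean(axis=-1) if image.ndim == 3 else image
        for _ in range(self.steps):
            # Placeholder "reasoning" step: progressively sharpen attention
            # around high-response regions (stands in for learned attention).
            attention = attention * (1.0 + saliency)
            attention = attention / attention.sum()
            frames.append(attention.copy())  # one frame per reasoning step
        label = self.class_names[int(saliency.mean() * len(self.class_names)) % len(self.class_names)]
        explanation = (
            f"Predicted '{label}' after {self.steps} reasoning steps; attention "
            f"concentrated around {np.unravel_index(attention.argmax(), attention.shape)}."
        )
        return VisionLogicOutput(label, frames, explanation)


if __name__ == "__main__":
    model = VisionLogicModel(class_names=["cat", "dog", "car"])
    output = model(np.random.rand(64, 64, 3))
    print(output.prediction)
    print(len(output.reasoning_frames), "reasoning frames")
    print(output.explanation)
```

In a real implementation the attention maps would come from the model's internal reasoning trace and the frames would be encoded into a video; the point of the sketch is only the single return object carrying prediction, visual narrative, and explanation together.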
Primary Area: causal reasoning
Submission Number: 8615