Keywords: neurosymbolic AI, visual question answering, answer-set programming, interpretability, GQA
TL;DR: A zero-shot neurosymbolic VQA framework using Answer-Set Programming for interpretable reasoning and error analysis.
Track: Neurosymbolic Methods for Trustworthy and Interpretable AI
Abstract: Visual Question Answering (VQA), the task of answering natural language questions about images, remains a challenge for AI systems. To enhance adaptability and reduce training overhead, we address VQA in a zero-shot setting by leveraging pre-trained neural modules without additional fine-tuning. Our hybrid neurosymbolic framework, whose capabilities are demonstrated on the challenging GQA dataset, integrates neural and symbolic components through logic-based reasoning via Answer-Set Programming. Specifically, our pipeline employs large language models for semantic parsing of input questions, followed by the generation of a scene graph that captures the relevant visual content. Interpretable rules then operate on the symbolic representations of both the question and the scene graph to derive an answer. A key advantage of our framework is that it provides full transparency into the reasoning process. Using an existing explanation tool, we illustrate how our method fosters trust by making decisions interpretable, and how it facilitates error analysis when predictions are incorrect. Beyond explaining its own reasoning, our framework can also explain answers from more opaque models by integrating their answers into our system, enabling broader interpretability in VQA.
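To make the pipeline concrete, here is a minimal sketch (not the paper's implementation; see the linked repository for that) of the symbolic reasoning step: a scene graph encoded as facts, and a rule, written here in plain Python rather than Answer-Set Programming, that derives an answer from them. All facts, names, and the query are hypothetical.

```python
# Hypothetical symbolic scene graph, as a neural module might emit it:
# objects map region ids to labels; relations are (subject, predicate, object) triples.
objects = {("o1", "cat"), ("o2", "mat")}
relations = {("o1", "on", "o2")}

def answer_what_is_on(target_label, objects, relations):
    """Rule sketch, roughly: answer(N) :- rel(X, on, Y), obj(Y, target), obj(X, N)."""
    label_of = dict(objects)  # region id -> label
    for subj, pred, obj in relations:
        if pred == "on" and label_of.get(obj) == target_label:
            return label_of[subj]
    return None

# Hypothetical parsed question: "What is on the mat?"
print(answer_what_is_on("mat", objects, relations))  # -> cat
```

In the actual framework this derivation is performed by an ASP solver over the question's logical form, which is what makes each answer traceable to the rules and facts that produced it.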
Paper Type: Long Paper
Software: https://github.com/pudumagico/nesy25
Submission Number: 34