Self-Enhancing Programming-Driven Reasoning for Visual Question Answering

ACL ARR 2024 December Submission 2014 Authors

16 Dec 2024 (modified: 05 Feb 2025), ACL ARR 2024 December Submission, CC BY 4.0

Abstract: In Visual Question Answering (VQA), program-driven reasoning methods have advanced by transforming solutions into executable code. However, existing approaches often rely on a single code generation pass, which leaves them unable to adapt to unforeseen errors. To address this challenge, we introduce the Self-Enhancing Programming-driven Reasoning framework for VQA (Seper). Seper employs large language models (LLMs) to decompose questions into multistep instructions and dynamically generates Python code using a code generator. It also incorporates a code evaluator that performs both forward and backward evaluations and triggers iterative code regeneration for continuous optimization. Additionally, we introduce prompt tuning to enhance the quality of the generated code. Our experiments on the GQA and OK-VQA datasets show that Seper outperforms existing methods, demonstrating its potential to advance program-driven approaches to VQA. Code: https://anonymous.4open.science/r/Seper-5540/
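
The loop described in the abstract (decompose, generate, evaluate, regenerate) might be sketched as follows. This is a minimal illustration only, not the authors' implementation: `refine_program`, `Evaluation`, the `max_rounds` cap, and the three toy callables are all hypothetical names introduced here. The abstract does not specify how the forward and backward evaluations work, so a single `evaluate` callable stands in for both, and the toy evaluator merely checks that the candidate code parses.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Evaluation:
    """Verdict of the (hypothetical) code evaluator for one candidate program."""
    ok: bool
    feedback: str = ""


def refine_program(
    question: str,
    decompose: Callable[[str], List[str]],
    generate: Callable[[List[str], Optional[str]], str],
    evaluate: Callable[[List[str], str], Evaluation],
    max_rounds: int = 3,  # assumed cap; the abstract gives no iteration limit
) -> Tuple[str, bool, int]:
    """Regenerate code until the evaluator accepts it or the round cap is hit.

    Returns (last candidate code, whether it was accepted, rounds used).
    """
    steps = decompose(question)
    feedback: Optional[str] = None
    code = ""
    for round_idx in range(1, max_rounds + 1):
        code = generate(steps, feedback)
        verdict = evaluate(steps, code)
        if verdict.ok:
            return code, True, round_idx
        # Pass the evaluator's feedback into the next generation attempt.
        feedback = verdict.feedback
    return code, False, max_rounds


# Toy stand-ins for the LLM-backed components (assumptions for illustration).
def toy_decompose(question: str) -> List[str]:
    return ["locate the object", "read its attribute"]


def toy_generate(steps: List[str], feedback: Optional[str]) -> str:
    # The first attempt has an unclosed parenthesis; any feedback "repairs" it.
    return "answer = find('cup'" if feedback is None else "answer = find('cup')"


def toy_evaluate(steps: List[str], code: str) -> Evaluation:
    # Stand-in check: accept the candidate only if it is syntactically valid.
    try:
        compile(code, "<candidate>", "exec")
    except SyntaxError as err:
        return Evaluation(ok=False, feedback=str(err))
    return Evaluation(ok=True)


code, accepted, rounds = refine_program(
    "What color is the cup?", toy_decompose, toy_generate, toy_evaluate
)
```

In this toy run the first candidate fails the syntax check and the second is accepted; a real evaluator would presumably also inspect execution results, which is beyond what the abstract states.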

Paper Type: Long

Research Area: Question Answering

Research Area Keywords: code generation and understanding; multimodal applications

Contribution Types: NLP engineering experiment, Reproduction study, Data analysis

Languages Studied: English

Submission Number: 2014
