Reason-and-Execute Prompting: Enhancing MultiModal Large Language Models for Solving Geometry Questions
Abstract: MultiModal Large Language Models (MM-LLMs) have demonstrated exceptional reasoning abilities in various visual question-answering tasks. However, they encounter significant challenges when answering geometry questions, which require rigorous reasoning and precise arithmetic. To enhance the ability of MM-LLMs to solve multimodal geometric questions, we propose Reason-and-Execute (RaE) prompting: a new prompting method specifically designed to enhance MM-LLMs' ability to solve geometric questions. Specifically, we first design a rigorous reasoning process based on domain knowledge of geometry, using a reverse-thinking approach, and obtain the precise arithmetic steps required to solve the question. Second, based on this analysis of the reasoning process, we design code blocks in a programming language that implement the arithmetic functions. Finally, by executing the contents of the code blocks with an interpreter, we obtain the answers to the geometric questions. We evaluate the answer accuracy of 9 models on 6 datasets (four geometry datasets and two science datasets) using different prompting templates. In the main experimental results, RaE achieves a maximum improvement of 12.8% over other prompting methods, demonstrating our method's strong reasoning and arithmetic abilities on geometric questions. Moreover, we analyze the factors that affect answer quality from the perspective of geometric problem solving, including domain knowledge, geometric shapes, understanding of the question text, and language. This further confirms that our method withstands comprehensive testing on geometry questions. The source code and data will be published in a GitHub repository.
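The reason-then-execute pipeline described above (prompt the model for reasoning plus a code block, then run that block with an interpreter) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the response format, the `answer` variable convention, and the stubbed model output are assumptions for demonstration.

```python
import re

# Fence marker built programmatically so the stub below stays self-contained.
FENCE = "`" * 3

def extract_code_block(response: str) -> str:
    """Pull the first fenced Python code block out of a model response."""
    match = re.search(FENCE + r"python\n(.*?)" + FENCE, response, re.DOTALL)
    if match is None:
        raise ValueError("no code block found in response")
    return match.group(1)

def execute_code_block(code: str) -> object:
    """Execute the code block and return the variable named `answer`.

    The `answer` convention is an assumption; a production system would
    also sandbox this exec call rather than run model output directly.
    """
    namespace: dict = {}
    exec(code, namespace)
    return namespace["answer"]

# Stubbed model response: reasoning text followed by an executable code block,
# standing in for what an MM-LLM would return under an RaE-style prompt.
model_response = (
    "Reasoning: the triangle's area is base * height / 2.\n"
    + FENCE + "python\n"
    + "base, height = 6.0, 4.0\n"
    + "answer = base * height / 2\n"
    + FENCE
)

print(execute_code_block(extract_code_block(model_response)))  # 12.0
```

Separating the reasoning step from execution means the arithmetic is done by the interpreter rather than the language model, which avoids the model's numerical errors.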
Primary Subject Area: [Generation] Multimedia Foundation Models
Secondary Subject Area: [Content] Vision and Language
Relevance To Conference: First, we propose Reason-and-Execute (RaE) prompting: the first prompting method specifically designed to enhance MM-LLMs' ability to solve geometric questions.
Second, we design a new prompt template that combines rigorous reasoning with precise arithmetic.
Finally, we analyze geometric problems from different perspectives and test the RaE prompting method, achieving strong results.
Supplementary Material: zip
Submission Number: 4244