Reason-and-Execute Prompting: Enhancing MultiModal Large Language Models for Solving Geometry Questions
Abstract: MultiModal Large Language Models (MM-LLMs) have demonstrated exceptional reasoning abilities in various visual question-answering tasks. However, they encounter significant challenges when answering geometry questions, which require rigorous reasoning and precise arithmetic. To enhance the ability of MM-LLMs to solve multimodal geometric questions, we propose Reason-and-Execute (RaE) prompting: a new prompting method specifically designed to enhance MM-LLMs' ability to solve geometric questions. Specifically, we first design a rigorous reasoning process based on domain knowledge of geometry, using a reverse-thinking approach, and obtain the precise arithmetic steps required to solve the question. Second, based on this analysis of the reasoning process, we design code blocks in a programming language that implement the arithmetic functions. Finally, by executing the contents of the code blocks with an interpreter, we obtain the answers to the geometric questions. We evaluate the answer accuracy of 9 models on 6 datasets (four geometry datasets and two science datasets) using different prompting templates. In the main experimental results, RaE achieves a maximum improvement of 12.8% over other prompting methods, demonstrating our method's strong reasoning and arithmetic abilities on geometric questions. Moreover, we analyze the factors that affect answer quality from the perspective of geometric problem solving, including domain knowledge, geometric shapes, understanding of the question text, and language. This further confirms that our method withstands comprehensive testing on geometry questions. The source code and data will be published in a GitHub repository.
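The reason-then-execute pipeline described above (prompt the model for reasoning plus a code block, then run that block with an interpreter) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the response format, the `answer` variable convention, and the stubbed model output are assumptions for demonstration.

```python
import re

# Fence marker built programmatically so the stub below stays self-contained.
FENCE = "`" * 3

def extract_code_block(response: str) -> str:
    """Pull the first fenced Python code block out of a model response."""
    match = re.search(FENCE + r"python\n(.*?)" + FENCE, response, re.DOTALL)
    if match is None:
        raise ValueError("no code block found in response")
    return match.group(1)

def execute_code_block(code: str) -> object:
    """Execute the code block and return the variable named `answer`.

    The `answer` convention is an assumption; a production system would
    also sandbox this exec call rather than run model output directly.
    """
    namespace: dict = {}
    exec(code, namespace)
    return namespace["answer"]

# Stubbed model response: reasoning text followed by an executable code block,
# standing in for what an MM-LLM would return under an RaE-style prompt.
model_response = (
    "Reasoning: the triangle's area is base * height / 2.\n"
    + FENCE + "python\n"
    + "base, height = 6.0, 4.0\n"
    + "answer = base * height / 2\n"
    + FENCE
)

print(execute_code_block(extract_code_block(model_response)))  # 12.0
```

Separating the reasoning step from execution means the arithmetic is done by the interpreter rather than the language model, which avoids the model's numerical errors.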
Primary Subject Area: [Generation] Multimedia Foundation Models
Secondary Subject Area: [Content] Vision and Language
Relevance To Conference: First, we propose Reason-and-Execute (RaE) prompting: the first prompting method specifically designed to enhance MM-LLMs' ability to solve geometric questions.
Second, we design a new prompt template that combines rigorous reasoning with precise arithmetic.
Finally, we analyze geometric problems from different perspectives and test the RaE prompting method, achieving strong results.
Supplementary Material: zip
Submission Number: 4244