Seeing and Solving: An Interpreter-Solver Framework for Geometric Reasoning with Large Vision and Language Models

ACL ARR 2025 July Submission825 Authors

28 Jul 2025 (modified: 03 Sept 2025) | ACL ARR 2025 July Submission | CC BY 4.0

Abstract: Geometrical Problem Solving (GPS), which requires interpreting diagrams and text and then applying logical reasoning and mathematical principles, has gained significant attention with the advancement of Multimodal Large Language Models (MLLMs). However, solving these problems in a zero-shot setting has received comparatively little attention, despite steady improvements in AI reasoning over visual mathematics. In this study, we propose Interpreter-Solver, a two-stage pipeline that combines Vision Language Models (VLMs) and Large Language Models (LLMs) to address this gap. The VLM uses its visual understanding to extract formal textual descriptions of the geometric relationships in a diagram, and the LLM then reasons over these descriptions to solve the problem. Both stages use zero-shot prompting. Without any fine-tuning, the pipeline sets a new state of the art, achieving accuracies of 83.19% on the Geometry3K dataset and 69.67% on the MathVerse dataset. It surpasses leading methods such as InterGPS, GeoDRL, and AutoGPS while requiring 5x and 2.8x fewer parameters, respectively, than the top models on these benchmarks. https://anonymous.4open.science/r/Interpreter-Solver/
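The two-stage flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the function names (`interpret`, `solve`, `interpreter_solver`), the callable signatures, and the predicate syntax returned by the stub models are all assumptions made for the example. Any real VLM or LLM client would be wrapped to match the two callable types.

```python
from typing import Callable, List, Sequence

# Hypothetical model interfaces (assumptions, not from the paper):
VisionModel = Callable[[str, str], str]  # (prompt, image_path) -> text
LanguageModel = Callable[[str], str]     # (prompt) -> text

# Hypothetical zero-shot prompt templates; the paper's prompts may differ.
INTERPRETER_PROMPT = (
    "Describe the geometric relationships in the diagram as formal "
    "statements, one per line.\nQuestion: {question}"
)
SOLVER_PROMPT = (
    "Diagram facts:\n{facts}\n\nQuestion: {question}\n"
    "Choices: {choices}\nAnswer with one choice."
)


def interpret(vlm: VisionModel, image_path: str, question: str) -> List[str]:
    """Stage 1: ask the VLM for formal textual descriptions of the diagram."""
    raw = vlm(INTERPRETER_PROMPT.format(question=question), image_path)
    # Keep one statement per non-empty line.
    return [line.strip() for line in raw.splitlines() if line.strip()]


def solve(llm: LanguageModel, facts: Sequence[str], question: str,
          choices: Sequence[str]) -> str:
    """Stage 2: let the text-only LLM reason over the extracted statements."""
    prompt = SOLVER_PROMPT.format(
        facts="\n".join(facts),
        question=question,
        choices=", ".join(choices),
    )
    return llm(prompt).strip()


def interpreter_solver(vlm: VisionModel, llm: LanguageModel, image_path: str,
                       question: str, choices: Sequence[str]) -> str:
    """Chain both stages; no fine-tuning or in-context examples are used."""
    facts = interpret(vlm, image_path, question)
    return solve(llm, facts, question, choices)


# Stub models standing in for real VLM/LLM calls (illustrative output only).
def fake_vlm(prompt: str, image_path: str) -> str:
    return "Triangle(A,B,C)\n\nEquals(LengthOf(Line(A,B)), 3)\n"


def fake_llm(prompt: str) -> str:
    return " B \n"
```

Keeping the two stages behind plain callables means the interpreter and the solver can be swapped independently, which mirrors the separation of visual extraction and textual reasoning that the abstract describes.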

Paper Type: Long

Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond

Research Area Keywords: math QA, mathematical NLP, LLM/AI agents, zero/few-shot extraction, multimodal QA, logical reasoning

Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches low compute settings-efficiency

Languages Studied: English

Submission Number: 825
