MaRVL-QA: A Benchmark for Mathematical Reasoning over Visual Landscapes

Nilay Pande; Sahiti Yerramilli; Jayant Sravan Tamarapalli; Rynaa Grover

19 Sept 2025 (modified: 12 Feb 2026) | ICLR 2026 Conference Desk Rejected Submission | Readers: Everyone | CC BY 4.0

Keywords: Multimodal Reasoning, Spatial Reasoning, Visual Reasoning, Mathematical Reasoning, Multimodal Large Language Models

Abstract: A key frontier for Multimodal Large Language Models (MLLMs) is the ability to move beyond semantic description and perform structured spatial analysis directly from images. Mathematical surface plots provide a rigorous testbed for this capability, as they isolate systematic visual reasoning from the semantic noise of natural images. To measure progress on this frontier, we introduce MaRVL-QA (Mathematical Reasoning over Visual Landscapes), a new benchmark designed to quantitatively evaluate these foundational skills. The benchmark comprises two novel tasks: Topological Counting, which requires models to identify and enumerate local extrema; and Transformation Recognition, which tests their ability to detect applied geometric transformations. The benchmark is generated from a curated library of functions with rigorous ambiguity filtering. Our evaluation on MaRVL-QA reveals that even state-of-the-art MLLMs struggle significantly, often resorting to superficial heuristics instead of robust strategies. We present MaRVL-QA as a challenging diagnostic tool to expose current limitations and to guide the development of MLLMs with stronger and more systematic visual-mathematical abilities.
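To make the Topological Counting task concrete, the following is a minimal sketch of how a ground-truth count of local extrema could be computed for a sampled surface. It is an illustration only, not the authors' generation pipeline: the function names `sample_surface` and `count_local_extrema`, the grid resolution, and the 8-neighbour test are all assumptions, and the ambiguity filtering mentioned in the abstract is not reproduced here.

```python
import math

def sample_surface(f, x_range, y_range, n):
    """Sample f(x, y) on an n-by-n grid (hypothetical helper); returns rows of z values."""
    x0, x1 = x_range
    y0, y1 = y_range
    xs = [x0 + (x1 - x0) * i / (n - 1) for i in range(n)]
    ys = [y0 + (y1 - y0) * j / (n - 1) for j in range(n)]
    return [[f(x, y) for x in xs] for y in ys]

def count_local_extrema(z):
    """Count strict local maxima and minima among interior grid points.

    A point counts as an extremum only if it is strictly above (or below)
    all 8 of its neighbours. Boundary points are ignored (an assumption).
    """
    rows, cols = len(z), len(z[0])
    n_max = n_min = 0
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            neighbours = [
                z[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            ]
            if all(z[r][c] > v for v in neighbours):
                n_max += 1
            elif all(z[r][c] < v for v in neighbours):
                n_min += 1
    return n_max, n_min

# Example surface: sin(x) * sin(y) over [0, 2*pi] x [0, 2*pi]
z = sample_surface(
    lambda x, y: math.sin(x) * math.sin(y),
    (0.0, 2 * math.pi),
    (0.0, 2 * math.pi),
    41,
)
n_max, n_min = count_local_extrema(z)
print(f"maxima: {n_max}, minima: {n_min}")
```

On this example surface the interior contains two local maxima and two local minima, so a model shown the corresponding plot would be expected to answer accordingly. A grid-based count like this is sensitive to resolution and to flat or nearly flat regions, which is one plausible reason a benchmark would need the kind of ambiguity filtering the abstract describes.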

Supplementary Material: zip

Primary Area: datasets and benchmarks

Submission Number: 20396
