Keywords: Benchmark, Vision-language Models, Multimodal Mathematical Reasoning, Remote Sensing
TL;DR: A novel benchmark designed to evaluate the mathematical reasoning capabilities of VLMs in the context of remote sensing imagery.
Abstract: Vision-language models (VLMs) have demonstrated impressive performance on a variety of Earth observation tasks, particularly in zero-shot settings. However, their mathematical reasoning skills in remote sensing (RS) remain unexplored due to the lack of relevant data. To close this gap, we introduce GeoMath, a multimodal mathematical reasoning benchmark meticulously designed for the RS domain. It comprises 3773 high-quality vehicle-related questions from aerial perspectives, spanning 6 mathematical subjects and 20 topics. All data used in this benchmark were collected by our drones from various altitudes and perspectives. Despite the limited geographical coverage, full access to all parameters of the RS images and detailed vehicle information ensure that the constructed mathematical problems are rigorous and diverse. With GeoMath, we conduct a comprehensive and quantitative evaluation of 14 prominent VLMs. Solving these math problems requires high-resolution visual perception and domain-specific mathematical knowledge, which poses a challenge even for state-of-the-art VLMs. We further explore the impact of image resolution and the zero-shot prompting strategy on the scores, and analyze the reasons behind GPT-4o's reasoning errors. By comparing the gap between InternVL2 and GPT-4o, we find that the latter exhibits some level of cross-view knowledge transfer capability.
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 991