MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 spotlight poster, License: CC BY 4.0
TL;DR: MapEval is a benchmark evaluating foundation models' geo-spatial reasoning across textual, API-based, and visual tasks using diverse map-based queries.
Abstract:

Recent advancements in foundation models have improved autonomous tool usage and reasoning, but their capabilities in map-based reasoning remain underexplored. To address this, we introduce MapEval, a benchmark designed to assess foundation models across three distinct tasks (textual, API-based, and visual reasoning) through 700 multiple-choice questions spanning 180 cities and 54 countries, covering spatial relationships, navigation, travel planning, and real-world map interactions. Unlike prior benchmarks that focus on simple location queries, MapEval requires models to handle long-context reasoning, API interactions, and visual map analysis, making it the most comprehensive evaluation framework for geospatial AI. In an evaluation of 30 foundation models, including Claude-3.5-Sonnet, GPT-4o, and Gemini-1.5-Pro, none surpasses 67% accuracy; open-source models perform significantly worse, and all models lag more than 20% behind human performance. These results expose critical gaps in spatial inference: models struggle with distances, directions, route planning, and place-specific reasoning, highlighting the need for better geospatial AI to bridge the gap between foundation models and real-world navigation.
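The headline results above are multiple-choice accuracies reported per task family. A minimal sketch of that kind of scoring follows, assuming a hypothetical item schema (`MCQItem` with a task label and a correct-option index) and hypothetical helpers `accuracy_by_task` and `overall_accuracy`. This is not MapEval's released data format or evaluation script. In this sketch a missing or unparseable prediction (`None`) is counted as wrong, which is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MCQItem:
    """One multiple-choice question (hypothetical schema for illustration only)."""
    task: str    # task family label, e.g. "textual", "api", or "visual"
    answer: int  # index of the correct option


def accuracy_by_task(items: Sequence[MCQItem],
                     predictions: Sequence[Optional[int]]) -> dict:
    """Accuracy in percent for each task; a None prediction counts as wrong."""
    if len(items) != len(predictions):
        raise ValueError("need exactly one prediction per item")
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        total[item.task] += 1
        if pred is not None and pred == item.answer:
            correct[item.task] += 1
    return {task: 100.0 * correct[task] / total[task] for task in total}


def overall_accuracy(items: Sequence[MCQItem],
                     predictions: Sequence[Optional[int]]) -> float:
    """Micro-averaged accuracy in percent over all questions."""
    if not items or len(items) != len(predictions):
        raise ValueError("need a non-empty list with one prediction per item")
    hits = sum(1 for item, pred in zip(items, predictions)
               if pred is not None and pred == item.answer)
    return 100.0 * hits / len(items)


if __name__ == "__main__":
    # Toy data for illustration, not MapEval results.
    items = [MCQItem("textual", 2), MCQItem("textual", 0),
             MCQItem("api", 1), MCQItem("visual", 3)]
    preds = [2, 1, 1, None]
    print(accuracy_by_task(items, preds))  # {'textual': 50.0, 'api': 100.0, 'visual': 0.0}
    print(overall_accuracy(items, preds))  # 50.0
```

The overall score here is a micro-average over all questions. Averaging the per-task scores instead would weight each task equally, and the two can differ when task families have different numbers of questions.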

Lay Summary:

Imagine asking Siri or ChatGPT questions like: "What’s the best-rated restaurant on the left side of my driving path from my home to my office?" Today’s smart AIs are great with words, but they still struggle with map-based questions about places, distances, directions, or travel. This research presents MapEval, which tests how well AI models can understand and reason about real-world maps, places, and navigation tasks. It poses complex, practical questions, just like the ones we ask in everyday life when using apps like Google Maps or travel assistants. If we want AI models to truly help us with navigation, travel planning, or even urban logistics, they must get better at spatial and map-related reasoning. MapEval helps shine a light on where current models fail and how we can improve them.

Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: Applications->Everything Else
Keywords: MapLLM, Map Query, Geo-Spatial Question Answering, Geo-Spatial Reasoning, Location Based Service, Google Maps
Submission Number: 7152