Evaluating VLMs' General Ability on Next Location Prediction

10 Jan 2025 (modified: 18 Jun 2025) · Submitted to ICML 2025 · CC BY 4.0
TL;DR: We propose a framework that enables VLMs to predict the next trajectory location and use it to benchmark VLMs.
Abstract: Predicting the next location is a hallmark of spatial intelligence. In real-world scenarios, humans often rely on visual estimation to perform next-location prediction, such as anticipating movement to avoid collisions with others. With the emergence of large models demonstrating general visual capabilities, we explore whether vision-language models (VLMs) can perform next-location prediction in a similar way to humans. We present \textbf{VLMLocPredictor}, a benchmark for evaluating VLMs on next-location prediction tasks, contributing: (1) the Visual Guided Location Search (VGLS) module, a recursive refinement strategy that leverages visual guidance to iteratively narrow the search space for predictions; (2) a comprehensive vision-based dataset integrating open-source maps with taxi trajectories; (3) a human benchmark established via a large-scale social experiment. Through over 1000 queries on 14 VLMs, our findings indicate that, with our method, VLMs exhibit promising potential for next-location prediction, yet their performance does not currently reach human-level accuracy. While some VLMs show the potential to outperform humans in 24\% of scenarios, we believe that in the near future VLMs will surpass average human performance on next-location prediction tasks. The benchmark and resources are available at \url{https://ihhh.cn}.
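The abstract describes VGLS only at a high level (recursive refinement that uses visual guidance to shrink the search space), so the sketch below is purely illustrative of how such a loop might look: a VLM is shown a rendered map crop with the past trajectory, asked which quadrant the next location falls in, and the chosen quadrant becomes the new search region until it is small enough. The names `query_vlm` and `render_map_crop`, the quadrant-based split, and the stopping threshold are all assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a recursive, visually guided location search.
# NOTE: query_vlm(), render_map_crop(), and the quadrant scheme are
# illustrative assumptions; they are not the authors' VGLS implementation.

from dataclasses import dataclass


@dataclass
class Region:
    """Axis-aligned bounding box in (lat, lon) degrees."""
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float

    def quadrants(self):
        """Split the region into four equal sub-regions."""
        lat_mid = (self.lat_min + self.lat_max) / 2
        lon_mid = (self.lon_min + self.lon_max) / 2
        return {
            "NW": Region(lat_mid, self.lat_max, self.lon_min, lon_mid),
            "NE": Region(lat_mid, self.lat_max, lon_mid, self.lon_max),
            "SW": Region(self.lat_min, lat_mid, self.lon_min, lon_mid),
            "SE": Region(self.lat_min, lat_mid, lon_mid, self.lon_max),
        }

    def center(self):
        return ((self.lat_min + self.lat_max) / 2,
                (self.lon_min + self.lon_max) / 2)

    def span(self):
        return max(self.lat_max - self.lat_min, self.lon_max - self.lon_min)


def visual_guided_location_search(trajectory, region, query_vlm, render_map_crop,
                                  min_span_deg=1e-3, max_steps=8):
    """Iteratively ask a VLM which quadrant of a rendered map the next location
    falls in, then recurse into that quadrant until the region is small enough.
    Returns the center of the final region as the predicted next location."""
    for _ in range(max_steps):
        if region.span() <= min_span_deg:
            break
        image = render_map_crop(region, trajectory)      # map tile + past trajectory points
        choice = query_vlm(image, options=("NW", "NE", "SW", "SE"))
        region = region.quadrants().get(choice, region)  # keep current region if answer is invalid
    return region.center()
```

Under these assumptions, each step quarters the search area, so roughly eight VLM queries shrink a city-scale region to block-scale resolution.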
Primary Area: Deep Learning->Large Language Models
Keywords: Next Location Prediction, Vision-Language Model, Machine Learning
Submission Number: 740