Keywords: VLM, Geolocation
Abstract: Geolocation, the task of identifying where an image was taken, requires complex reasoning and plays an important role in navigation, monitoring, and cultural preservation. However, existing methods often yield coarse and non-interpretable predictions. A key challenge is the limited quality and scale of current geolocation datasets, which are typically small, automatically constructed, noisy, and inconsistent in difficulty.
To address these challenges, we introduce a comprehensive geolocation framework with three key components: Geocomp, a large-scale dataset; GeoCoT, a novel reasoning method; and GeoEval, designed to evaluate the correctness of the geolocation reasoning process.
At the core of this framework is Geocomp, a large-scale dataset collected from a geolocation game platform involving 740K users over two years.
It comprises 25 million entries of metadata and 2.7 million geo-tagged locations spanning much of the globe, with each location annotated thousands to tens of thousands of times by human users.
Building on this dataset, we propose Geographical Chain-of-Thought (GeoCoT), a multi-step reasoning framework designed to enhance the reasoning capabilities of Large Vision Models (LVMs) in geolocation tasks.
Finally, we demonstrate that GeoCoT significantly boosts performance, by up to 25% on classic geolocation metrics and by 9% in reasoning quality as measured by GeoEval.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Language resources, Benchmarking
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: English
Submission Number: 2754