G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models

Published: 25 Sept 2024 · Last Modified: 17 Jan 2025 · NeurIPS 2024 poster · CC BY-NC 4.0
Keywords: Image Geolocalization, Image-to-GPS retrieval, Large Multi-Modal Models, CLIP
Abstract: Worldwide geolocalization aims to predict, at the coordinate level, the precise location of photos taken anywhere on Earth. It is very challenging due to 1) the difficulty of capturing subtle location-aware visual semantics, and 2) the heterogeneous geographical distribution of image data. As a result, existing studies have clear limitations when scaled to a worldwide context: they may easily confuse distant images with similar visual content, or fail to adapt to locations worldwide with varying amounts of relevant data. To resolve these limitations, we propose **G3**, a novel framework based on Retrieval-Augmented Generation (RAG). In particular, G3 consists of three steps, i.e., **G**eo-alignment, **G**eo-diversification, and **G**eo-verification, which optimize both the retrieval and generation phases of worldwide geolocalization. During Geo-alignment, our solution jointly learns expressive multi-modal representations for images, GPS coordinates, and textual descriptions, which allows us to capture location-aware semantics for retrieving nearby images for a given query. During Geo-diversification, we leverage a prompt ensembling method that is robust to inconsistent retrieval performance across different image queries. Finally, we combine both retrieved and generated GPS candidates in Geo-verification for location prediction. Experiments on two well-established datasets, IM2GPS3k and YFCC4k, verify the superiority of G3 over other state-of-the-art methods. Our code is available online at [https://github.com/Applied-Machine-Learning-Lab/G3](https://github.com/Applied-Machine-Learning-Lab/G3) for reproduction.
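The three-step pipeline described in the abstract can be sketched as follows. This is a minimal, illustrative sketch only: the embeddings, database entries, scoring rules, and function names (`retrieve`, `diversify`, `verify`) are assumptions for exposition, not the authors' implementation, and the LMM generation step of Geo-diversification is stubbed out with a retrieval call.

```python
# Hypothetical minimal sketch of the G3 pipeline (Geo-alignment /
# Geo-diversification / Geo-verification). All data and scoring rules
# here are toy assumptions, not the paper's actual method.
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

# Geo-alignment (assumed done offline): a gallery of images whose learned
# multi-modal embeddings are aligned with their GPS coordinates.
DATABASE = [
    ([0.9, 0.1], (48.8566, 2.3522)),     # Paris-like embedding (toy)
    ([0.8, 0.2], (48.8606, 2.3376)),     # nearby Paris-like embedding (toy)
    ([0.1, 0.9], (-33.8688, 151.2093)),  # Sydney-like embedding (toy)
]

def retrieve(query_emb, k=2):
    """Return the GPS coordinates of the k most similar gallery images."""
    ranked = sorted(DATABASE, key=lambda e: cosine(query_emb, e[0]),
                    reverse=True)
    return [gps for _, gps in ranked[:k]]

def diversify(query_emb, prompts):
    """Stand-in for prompt ensembling: in G3 each prompt would elicit one
    LMM-generated GPS candidate; here we fake it with the top retrieval."""
    return [retrieve(query_emb, k=1)[0] for _ in prompts]

def verify(query_emb, candidates):
    """Toy Geo-verification: keep the candidate whose matching gallery
    entry is most similar to the query embedding."""
    def score(gps):
        sims = [cosine(query_emb, emb) for emb, g in DATABASE if g == gps]
        return max(sims, default=0.0)
    return max(candidates, key=score)

query = [0.85, 0.15]                       # toy query-image embedding
candidates = retrieve(query, k=2) + diversify(query, prompts=["p1", "p2"])
prediction = verify(query, candidates)      # final GPS prediction
```

Running this toy pipeline selects the gallery coordinate whose embedding best matches the query, mirroring how G3 fuses retrieved and generated candidates before verification.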
Primary Area: Machine learning for other sciences and fields
Submission Number: 5376
