Keywords: Geo-localization, Safety, Vision-Language Models
Abstract: Large Vision-Language Models (VLMs) have recently achieved remarkable progress in image-based geo-localization, yet face a critical safety vulnerability: they confidently predict locations for synthetic or manipulated images with no real-world correspondence. Such “refusal failures” threaten navigation, emergency response, and geographic information integrity. To address this gap, we introduce GeoSafety-Bench, the first benchmark specifically designed to evaluate geo-safety awareness in localization systems. It contains 5,997 images spanning authentic photos and four synthesis paradigms—3D rendering, text-to-image generation, image-to-image modification, and instruction-guided viewpoint synthesis. We define two key evaluation metrics, refusal failure and over-safety, to quantify the trade-off between utility and safety. Extensive experiments across retrieval-based methods, domain-specific models, and state-of-the-art VLMs reveal that while models achieve strong accuracy on authentic images, they almost universally fail to reject synthetic ones, particularly under instruction-guided generation. We also provide an illustrative baseline to show that safety-aware training can improve refusal robustness. GeoSafety-Bench thus provides a rigorous foundation for developing and evaluating trustworthy geo-localization models.
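The abstract names two evaluation metrics, refusal failure and over-safety, without spelling out their computation. A minimal sketch of one plausible reading is below; the exact definitions, field names, and data layout are assumptions for illustration, not the paper's specification.

```python
# Hypothetical sketch of the two metrics named in the abstract.
# "refusal failure": the model localizes a synthetic image instead of refusing.
# "over-safety": the model refuses an authentic image it should localize.
# The per-image record format here is assumed, not taken from the paper.

def refusal_failure_rate(records):
    """Fraction of synthetic images that were NOT refused."""
    synth = [r for r in records if r["synthetic"]]
    return sum(not r["refused"] for r in synth) / len(synth)

def over_safety_rate(records):
    """Fraction of authentic images that WERE refused."""
    real = [r for r in records if not r["synthetic"]]
    return sum(r["refused"] for r in real) / len(real)

# Toy evaluation log: two synthetic and two authentic images.
records = [
    {"synthetic": True,  "refused": False},  # refusal failure
    {"synthetic": True,  "refused": True},   # correct refusal
    {"synthetic": False, "refused": True},   # over-safety
    {"synthetic": False, "refused": False},  # correct localization
]
```

On this toy log both rates are 0.5, illustrating the utility-safety trade-off the benchmark quantifies: lowering one rate by shifting the refusal threshold tends to raise the other.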
Supplementary Material: pdf
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 4175