Disaster Damage Visualization by VLM-Based Interactive Image Retrieval and Cross-View Image Geo-Localization

Published: 01 Jan 2024 · Last Modified: 08 May 2025 · IGARSS 2024 · CC BY-SA 4.0
Abstract: We propose a framework for quickly selecting images that show a disaster situation from a large image collection, estimating their locations with high accuracy, and displaying them on a map. The framework introduces interactive image retrieval based on a Vision and Language Model (VLM), which retrieves images depicting the disaster situation from the collection according to the user’s intention. Combining the language–image correlation provided by the VLM with the similarity between interactively selected images enables more accurate retrieval. Next, for retrieved images whose locations are unknown, the framework estimates each image’s location with street-address-level accuracy by matching it against an overhead image covering a large area of the city and against map data, and then displays it on the map. We confirmed the effectiveness of the proposed method on publicly available datasets such as CrisisNLP.
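The interactive retrieval step can be illustrated with a minimal sketch. The idea is to score each gallery image by its similarity to the text query, then blend in similarity to images the user has already selected as relevant. The function below is a hypothetical illustration, not the paper's implementation: `rank_gallery`, the weight `alpha`, and the toy embeddings are all assumptions; in practice the embeddings would come from a CLIP-style VLM.

```python
import numpy as np

def rank_gallery(gallery_emb, text_emb, selected_embs=None, alpha=0.5):
    """Rank gallery images by a blend of text-query similarity and
    similarity to interactively selected exemplar images.

    All embeddings are assumed L2-normalized, so dot products are
    cosine similarities. Returns gallery indices, best match first.
    """
    # Language-image correlation: cosine similarity to the text query.
    scores = gallery_emb @ text_emb
    if selected_embs is not None and len(selected_embs) > 0:
        # Prototype of the user's selections, re-normalized after averaging.
        proto = np.mean(selected_embs, axis=0)
        proto = proto / np.linalg.norm(proto)
        # Blend text similarity with image-image similarity.
        scores = alpha * scores + (1.0 - alpha) * (gallery_emb @ proto)
    return np.argsort(-scores)

# Toy 2-D embeddings standing in for VLM outputs (assumed, for illustration).
gallery = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
text_query = np.array([1.0, 0.0])

# Text-only ranking, then refined ranking after the user selects an exemplar.
order_text_only = rank_gallery(gallery, text_query)
order_refined = rank_gallery(gallery, text_query,
                             selected_embs=np.array([[0.0, 1.0]]))
```

In this toy example the exemplar selection pulls images resembling the selected one toward the top of the ranking, which is the behavior the interactive refinement relies on.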