LISAt: Language-Instructed Segmentation Assistant for Satellite Imagery

Jerome Quenum; Wen-Han Hsieh; Tsung-Han Wu; Ritwik Gupta; Trevor Darrell; David M. Chan

Published: 18 Sept 2025, Last Modified: 16 Jan 2026

Venue: NeurIPS 2025 Datasets and Benchmarks Track (poster)

License: CC BY-NC-SA 4.0

Keywords: Geospatial Artificial Intelligence, Multi-Modal Artificial Intelligence, Reasoning Segmentation with Satellite Images

TL;DR: A new multimodal dataset and an initial benchmark model for geospatial artificial intelligence.

Abstract: Segmentation models can recognize a pre-defined set of objects in images. However, segmentation models capable of "reasoning" over complex user queries that implicitly refer to multiple objects of interest remain underexplored, especially in the geospatial domain. Recent advances in "reasoning segmentation" (generating segmentation masks from complex, implicit query text) demonstrate the potential of vision-language models (VLMs) to reason across an open domain of objects. Yet our experiments reveal that these models struggle with the unique challenges of remote-sensing imagery. To address this gap, we introduce a new dataset consisting of two parts: GRES, a curated geospatial reasoning-segmentation dataset with 27,615 annotations across 9,205 images, and PreGRES, a collection of existing datasets assembled into a large-scale multimodal pretraining corpus with over 1M question-answer pairs across 119,279 images. We also propose an initial benchmark model, LISAt, a VLM for geospatial analysis that can describe complex remote-sensing scenes, answer detailed queries, and segment objects based on natural-language prompts. LISAt establishes a strong initial geospatial benchmark, outperforming prior foundation models such as RS-GPT4V by 10.04% (BLEU-4) on visual description tasks and surpassing open-domain models on geospatial reasoning segmentation by 143.36% (gIoU). Our model, dataset, and code are available on our project page: https://lisat-bair.github.io/LISAt/.

Croissant File: json

Dataset URL: https://lisat-bair.github.io/LISAt/

Code URL: https://lisat-bair.github.io/LISAt/

Primary Area: Datasets & Benchmarks for applications in language modeling and vision language modeling

Flagged For Ethics Review: true

Submission Number: 2118
