OVerSeeC – Open-Vocabulary CostMap Generation from Satellite Images and Natural Language

Published: 30 May 2025 (Last Modified: 09 Jun 2025)
Venue: RSS 2025 Workshop ROAR (Oral)
License: CC BY 4.0
Keywords: Costmaps from Satellite Images, Natural Language-Aligned Costmap Generation, Semantic Segmentation, Robot Navigation, Program Synthesis
Abstract: Autonomous ground vehicles deployed in off-road settings need to reason about multiple mission-specific criteria (terrains, buildings, hazards, etc.) when planning long-range global routes. Advanced aerial imagery, captured from drones or satellite views, has the potential to provide rich prior information for such global planning. A key challenge in leveraging such aerial imagery, however, lies in accommodating operator preferences without prior knowledge of the factors, terrains, or other entities that may matter during deployment. To address this challenge, we propose OVerSeeC, a neuro-symbolic, modular, open-vocabulary, zero-shot costmap generation pipeline that produces costmaps directly from aerial imagery --- guided by natural-language user preferences --- without requiring any domain-specific prior training. Our approach leverages: (1) a natural language-grounded semantic segmentation module (CLIPSeg) to produce coarse masks from text prompts; (2) a mask refiner (SAM with a lightweight rectifier) to complete and sharpen these semantic masks; and (3) a large language model (LLM) to interpret semantic entities from user preferences and generate a Python function that fuses the semantic masks into a scalar, preference-aligned costmap. We empirically show that OVerSeeC (1) generates costmaps that align more closely with user preferences for global planning, achieving significantly lower rank-regret path integral scores than state-of-the-art baselines; (2) generalizes to novel terrain classes specified in natural-language preferences, even when these classes are absent from any supervised training ontology, effectively handling previously unseen object categories; (3) exhibits robust segmentation accuracy and planning performance under distribution shifts compared to supervised methods; and (4) produces trajectories that most closely match operator-drawn paths in a human evaluation on unseen maps.
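To make the third stage concrete, the following is a minimal sketch of the kind of mask-fusion function the LLM could synthesize from a preference such as "avoid water and buildings, prefer dirt roads". The function name, class names, and cost values are illustrative assumptions, not the paper's actual generated code.

```python
import numpy as np


def fuse_masks(masks: dict[str, np.ndarray]) -> np.ndarray:
    """Fuse per-class semantic masks into a scalar costmap in [0, 1].

    Each mask is an HxW array in [0, 1] from the segmentation stage.
    Class names and cost weights here are hypothetical, chosen to
    reflect a preference like "avoid water and buildings, prefer
    dirt roads".
    """
    h, w = next(iter(masks.values())).shape
    zeros = np.zeros((h, w))

    # Start from a neutral traversal cost for unlabeled terrain.
    cost = np.full((h, w), 0.5)

    # Hazards dominate: take the elementwise max with weighted masks.
    cost = np.maximum(cost, masks.get("water", zeros) * 1.0)
    cost = np.maximum(cost, masks.get("building", zeros) * 0.9)

    # Preferred terrain overrides the default with a low cost.
    road = masks.get("dirt road", zeros)
    cost = np.where(road > 0.5, 0.1, cost)

    return np.clip(cost, 0.0, 1.0)
```

A global planner (e.g. A* over the grid) can then consume this costmap directly, which is presumably how the preference-aligned routes in the evaluation are produced.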
Submission Number: 13