TPOV-Seg: Textually Enhanced Prompt Tuning of Vision-Language Models for Open-Vocabulary Remote Sensing Semantic Segmentation
Abstract: Remote sensing semantic segmentation faces significant challenges in open-world scenarios due to domain gaps and the presence of unseen categories in test datasets. Open-vocabulary semantic segmentation (OVSS) based on vision-language models (VLMs) has emerged as a promising paradigm for remote sensing imagery interpretation, as it enables adaptation to new datasets with arbitrary semantic categories. However, current OVSS approaches often struggle to achieve fine-grained pixel-level localization and classification for unseen categories when relying solely on fixed textual prompts and pretrained VLM encoders. The model's generalization capability is further hindered by insufficiently fine-grained and adaptive textual representations. To address these limitations, we propose TPOV-Seg, a textually enhanced prompt-tuning framework for OVSS. Specifically, a remote sensing-specific Text TempLator (TTL) is introduced to enrich textual prompts and semantic representations for land cover categories by incorporating synonymous vocabulary combinations. To efficiently align the text encoder with remote sensing characteristics, a lightweight text-aware prompt tuning (LTP-Tuning) strategy is proposed for contextual modeling and word-embedding adaptation. Furthermore, a textual-guided channel-aware aggregator (TGCA) is developed to promote inter-channel feature interaction and facilitate semantic modeling, leveraging grouped cross-channel Transformers and linear Transformers under the guidance of the enhanced textual features from TTL. Extensive experiments on five large-scale remote sensing segmentation datasets demonstrate that TPOV-Seg outperforms existing methods on OVSS tasks, showing strong discriminative ability for unseen categories while maintaining robust cross-domain generalization. The source code will be available at: https://github.com/zxk688/TPOVSeg
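The TTL idea of enriching class prompts with synonymous vocabulary combinations resembles CLIP-style prompt ensembling. A minimal sketch is shown below; the templates and synonym lists are illustrative assumptions, not the paper's actual vocabulary or implementation:

```python
# Sketch of synonym-based prompt enrichment for land-cover categories.
# Templates and synonym lists are illustrative placeholders, NOT the
# vocabulary used by TPOV-Seg's Text TempLator (TTL).

TEMPLATES = [
    "a remote sensing image of {}.",
    "an aerial photo of {}.",
    "a satellite view of {}.",
]

SYNONYMS = {
    "building": ["building", "house", "rooftop"],
    "water": ["water", "river", "lake"],
}


def enrich_prompts(category: str) -> list[str]:
    """Expand one category into all template x synonym prompt combinations."""
    words = SYNONYMS.get(category, [category])
    return [t.format(w) for t in TEMPLATES for w in words]


# 3 templates x 3 synonyms -> 9 textual prompts for "building"
prompts = enrich_prompts("building")
```

In a VLM pipeline, each enriched prompt would be encoded by the text encoder and the resulting embeddings averaged (or otherwise aggregated) into a richer per-category text representation than a single fixed prompt provides.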
DOI: 10.1109/TGRS.2025.3624767