TPOV-Seg: Textually Enhanced Prompt Tuning of Vision-Language Models for Open-Vocabulary Remote Sensing Semantic Segmentation
Abstract: Remote sensing semantic segmentation faces significant challenges in open-world scenarios due to domain gaps and the presence of unseen categories in test datasets. Open-vocabulary semantic segmentation (OVSS) based on vision-language models (VLMs) has emerged as a promising paradigm for remote sensing imagery interpretation, as it enables adaptation to new datasets with arbitrary semantic categories. However, current OVSS approaches often struggle to achieve fine-grained pixel-level localization and classification for unseen categories when relying solely on fixed textual prompts and pretrained VLM encoders. The model's generalization capability is further hindered by insufficiently fine-grained and adaptive textual representations. To address these limitations, we propose TPOV-Seg, a textually enhanced prompt-tuning framework for OVSS. Specifically, a remote sensing-specific Text TempLator (TTL) is introduced to enrich textual prompts and semantic representations for land cover categories by incorporating synonymous vocabulary combinations. To efficiently align the text encoder with remote sensing characteristics, a lightweight text-aware prompt tuning (LTP-Tuning) strategy is proposed for contextual modeling and word-embedding adaptation. Furthermore, a textual-guided channel-aware aggregator (TGCA) is developed to promote inter-channel feature interaction and facilitate semantic modeling, leveraging grouped cross-channel Transformers and linear Transformers under the guidance of the enhanced textual features from TTL. Extensive experiments on five large-scale remote sensing segmentation datasets demonstrate that TPOV-Seg outperforms existing methods on OVSS tasks, showing strong discriminative ability for unseen categories while maintaining robust cross-domain generalization. The source code will be available at: https://github.com/zxk688/TPOVSeg
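The TTL idea of enriching class prompts with synonymous vocabulary combinations resembles CLIP-style prompt ensembling. A minimal sketch is shown below; the templates and synonym lists are illustrative assumptions, not the paper's actual vocabulary or implementation:

```python
# Sketch of synonym-based prompt enrichment for land-cover categories.
# Templates and synonym lists are illustrative placeholders, NOT the
# vocabulary used by TPOV-Seg's Text TempLator (TTL).

TEMPLATES = [
    "a remote sensing image of {}.",
    "an aerial photo of {}.",
    "a satellite view of {}.",
]

SYNONYMS = {
    "building": ["building", "house", "rooftop"],
    "water": ["water", "river", "lake"],
}


def enrich_prompts(category: str) -> list[str]:
    """Expand one category into all template x synonym prompt combinations."""
    words = SYNONYMS.get(category, [category])
    return [t.format(w) for t in TEMPLATES for w in words]


# 3 templates x 3 synonyms -> 9 textual prompts for "building"
prompts = enrich_prompts("building")
```

In a VLM pipeline, each enriched prompt would be encoded by the text encoder and the resulting embeddings averaged (or otherwise aggregated) into a richer per-category text representation than a single fixed prompt provides.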
DOI: 10.1109/TGRS.2025.3624767