Fine-Grained Prompt-Driven Stylization with Context-Aware Reasoning for Zero-Shot Domain Adaptation

16 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · CC BY 4.0
Keywords: Zero-Shot, Domain Adaptation
Abstract: Zero-shot domain adaptive semantic segmentation (ZSDA) aims to generalize models to unseen target domains without accessing target data during training. Recent methods commonly use vision-language models (VLMs) to simulate target-domain features by guiding stylization with textual prompts. However, these approaches often suffer from two key issues: description mismatch, where generic prompts fail to reflect scene-specific semantics, and prompt-induced discrepancy, where normalization guided by coarse prompts cannot capture spatial variations. Together, these problems lead to a noticeable simulated-vs-real feature gap, reducing adaptation effectiveness. To address this, we propose FineDA, a framework designed to reduce this gap through image-specific prompt reasoning and fine-grained feature stylization. FineDA introduces a scene graph-guided chain-of-thought module that generates contextual, semantically rich target descriptions for each source image. It also incorporates a prompt-guided local and global stylization module, enabling patch-wise class-specific adaptation while maintaining scene-level consistency. Extensive experiments on standard ZSDA benchmarks and a challenging in-house surgical dataset with adverse visual conditions such as smoke, blood, and low lighting demonstrate the effectiveness and generalization capability of our approach. Code will be released upon publication.
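To make the patch-wise stylization idea concrete, the following is a minimal NumPy sketch of how prompt-driven feature stylization is commonly realized in ZSDA work (AdaIN-style statistic replacement applied per spatial patch). This is an illustration of the general technique only, not the paper's actual module: the function names, the use of AdaIN statistics, and the assumption that target mean/std are predicted from a prompt embedding are all hypothetical.

```python
import numpy as np

def adain(feat, mu_t, sigma_t, eps=1e-5):
    """Shift a (C, h, w) feature block to target statistics (AdaIN-style).

    mu_t, sigma_t: (C, 1, 1) target mean/std, assumed to be predicted
    from a text-prompt embedding by some learned mapping (not shown).
    """
    mu_s = feat.mean(axis=(1, 2), keepdims=True)
    sigma_s = feat.std(axis=(1, 2), keepdims=True)
    return sigma_t * (feat - mu_s) / (sigma_s + eps) + mu_t

def patchwise_stylize(feat, mu_t, sigma_t, patch=4):
    """Apply target statistics independently to each spatial patch.

    Per-patch application is what distinguishes fine-grained (local)
    stylization from a single global normalization of the whole map.
    """
    C, H, W = feat.shape
    out = np.empty_like(feat)
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            out[:, i:i + patch, j:j + patch] = adain(
                feat[:, i:i + patch, j:j + patch], mu_t, sigma_t
            )
    return out
```

In a full system, `mu_t`/`sigma_t` would differ per patch (e.g., conditioned on the class or region named in the prompt); here a single target statistic is reused for brevity.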
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 6551