Probabilistic Prototype Generation Network for Cross-Domain Few-Shot Semantic Segmentation

16 Sept 2025 (modified: 21 Nov 2025) | ICLR 2026 Conference Withdrawn Submission | Readers: Everyone | CC BY 4.0
Keywords: DINOv2; Probabilistic modeling
TL;DR: PPGN combines CNN-based local feature extraction with the global contextual priors of DINOv2 to generate discriminative and robust prototypes under probabilistic learning, offering a new paradigm for future research in the CD-FSS community.
Abstract: Cross-domain few-shot semantic segmentation (CD-FSS) aims to adapt models trained on labeled source domains to unseen target domains with novel classes and limited annotations. Existing methods predominantly rely on straightforward support-query feature matching, which makes them vulnerable to domain shifts and limits their generalization. In contrast, vision foundation models (VFMs) based on Transformer architectures demonstrate exceptional cross-domain transferability by offering powerful off-the-shelf global contextual priors. Motivated by this, we propose a novel probabilistic prototype generation network (PPGN), which integrates global contextual priors from VFMs to enhance prototype representation learning with probabilistic modeling for CD-FSS. Specifically, PPGN adopts a dual-encoder architecture that combines DINOv2’s global contextual modeling with conventional CNN-based local feature extraction, yielding more comprehensive visual representations. We first design a dynamic prototype generator (DPG), which exploits high-confidence response maps from both branches to guide the generation of discriminative query prototypes, mitigating the inherent support-query divergence. Next, we propose a mixed-probabilistic prototype generator (MPG) that performs probabilistic modeling on the hybrid prototype integrated from heterogeneous feature spaces to enhance prototype generalization. Finally, an adaptive prediction aggregator (APG) refines the segmentation by recalibrating and integrating multi-stage predictions. Extensive experiments demonstrate that PPGN achieves state-of-the-art performance on four CD-FSS benchmarks.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 7023
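
Illustrative sketch (not the authors' implementation): the abstract gives no equations, so the snippet below only shows the generic building blocks it alludes to. It uses masked average pooling and cosine matching, which are standard few-shot segmentation tools, plus a reparameterized diagonal-Gaussian sample as one possible reading of "probabilistic modeling" of a prototype. The function names, the channel-wise concatenation of the two branches, the fixed log-variance, and the random arrays standing in for CNN and DINOv2 features are all assumptions made for illustration.

```python
import numpy as np


def masked_average_pooling(features, mask):
    """Average the (C, H, W) features over the pixels where the (H, W) mask is 1."""
    mask = mask.astype(features.dtype)
    return (features * mask[None]).sum(axis=(1, 2)) / (mask.sum() + 1e-8)


def cosine_similarity_map(features, prototype):
    """Cosine similarity between every pixel feature of (C, H, W) and a (C,) prototype."""
    f = features / (np.linalg.norm(features, axis=0, keepdims=True) + 1e-8)
    p = prototype / (np.linalg.norm(prototype) + 1e-8)
    return np.einsum("chw,c->hw", f, p)


def sample_prototype(mean, log_var, rng):
    """Reparameterized sample from a diagonal Gaussian N(mean, exp(log_var))."""
    eps = rng.standard_normal(mean.shape)
    return mean + np.exp(0.5 * log_var) * eps


rng = np.random.default_rng(0)

# Stand-ins for the two encoder branches (assumed already resized to 8x8).
support_cnn = rng.standard_normal((64, 8, 8))
support_vfm = rng.standard_normal((32, 8, 8))
query_cnn = rng.standard_normal((64, 8, 8))
query_vfm = rng.standard_normal((32, 8, 8))

# Binary support mask marking the annotated foreground region.
support_mask = np.zeros((8, 8))
support_mask[2:6, 2:6] = 1

# Assumption: the hybrid representation is a plain channel-wise concatenation.
support_hybrid = np.concatenate([support_cnn, support_vfm], axis=0)
query_hybrid = np.concatenate([query_cnn, query_vfm], axis=0)

# Deterministic prototype, used here as the mean of a Gaussian.
proto_mean = masked_average_pooling(support_hybrid, support_mask)

# Assumption: a fixed log-variance; a trained model would predict it.
proto_log_var = np.full_like(proto_mean, -2.0)

# Average the query similarity maps over a few sampled prototypes.
samples = [sample_prototype(proto_mean, proto_log_var, rng) for _ in range(5)]
similarity = np.mean([cosine_similarity_map(query_hybrid, s) for s in samples], axis=0)
print(similarity.shape)
```

In this sketch the averaged similarity map plays the role of a coarse foreground score for the query image. The response-map-guided query prototypes (DPG) and the multi-stage aggregation (APG) described in the abstract are not modeled here.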