CGViT: Cross-image GroupViT for zero-shot semantic segmentation

Published: 2025, Last Modified: 09 Jan 2026. Pattern Recognit. 2025. CC BY-SA 4.0
Abstract:
Highlights
• A Cross-image GroupViT (CGViT) is proposed for learning semantically consistent feature representations.
• A momentum-based updating method is used to learn semantically consistent features.
• Image-level and token-level supervision is proposed for learning global and local information.
• The proposed CGViT shows superior performance on zero-shot semantic segmentation.
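The highlights do not say which quantities the momentum-based method updates. One common pattern, assumed here purely for illustration, is an exponential moving average (EMA) over class-level prototype features that are shared across images, so that the same concept seen in different images pulls a single representation toward consistency. The sketch below shows that EMA step. The names `momentum_update`, `l2_normalize`, the prototype dictionary, and the default momentum of 0.9 are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (returns a copy unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def momentum_update(
    prototypes: Dict[str, Vector],
    label: str,
    feature: Vector,
    momentum: float = 0.9,
) -> Dict[str, Vector]:
    """Hypothetical EMA update of a cross-image class prototype (an assumption, not CGViT's code).

    prototype <- momentum * prototype + (1 - momentum) * feature, then L2-normalized.
    A label seen for the first time is initialized with the normalized feature.
    Returns a new dictionary; the input dictionary is not modified.
    """
    if not 0.0 <= momentum < 1.0:
        raise ValueError("momentum must be in [0, 1)")
    feature = l2_normalize(feature)
    old = prototypes.get(label)
    if old is None:
        new = feature
    else:
        if len(old) != len(feature):
            raise ValueError("feature dimension does not match the stored prototype")
        # Blend the stored prototype with the new feature, then re-normalize.
        new = l2_normalize([momentum * p + (1.0 - momentum) * f for p, f in zip(old, feature)])
    updated = dict(prototypes)
    updated[label] = new
    return updated


# Usage: features of "cat" regions from two different images update one shared prototype.
bank: Dict[str, Vector] = {}
bank = momentum_update(bank, "cat", [1.0, 0.0])
bank = momentum_update(bank, "cat", [0.0, 1.0], momentum=0.9)
```

A high momentum makes the prototype change slowly, so a single atypical image cannot swing it far; this stability is the usual motivation for momentum updates, though whether CGViT uses this exact form is not stated in the highlights.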