CGViT: Cross-image GroupViT for zero-shot semantic segmentation

Published: 01 Jan 2025, Last Modified: 13 Nov 2025. Pattern Recognit. 2025. License: CC BY-SA 4.0
Abstract: Highlights
• A Cross-image GroupViT is proposed for learning semantically consistent feature representation.
• A momentum-based updating method is used to learn semantically consistent features.
• Image-level and token-level supervisions are proposed for learning global and local information.
• The proposed CGViT shows superior performance on zero-shot semantic segmentation.