ESDA: Zero-shot semantic segmentation based on an embedding semantic space distribution adjustment strategy

Jiaguang Li, Ying Wei, Wei Zhang, Chuyuan Wang

Published: 2025, Last Modified: 14 May 2025. Image Vis. Comput. 2025. License: CC BY-SA 4.0

Abstract: Highlights
• The CLIP model cannot effectively perceive pixel-level regions.
• A semantic space adjustment strategy enables CLIP to effectively perceive regions.
• A single text [CLS] token is insufficient to guide the segmentation task.
• The vision-language embedding interactor can obtain richer semantic support.