ESDA: Zero-shot semantic segmentation based on an embedding semantic space distribution adjustment strategy

Published: 01 Jan 2025, Last Modified: 14 May 2025, Image Vis. Comput. 2025, CC BY-SA 4.0
Abstract: Highlights
• The CLIP model cannot effectively perceive pixel-level regions.
• A semantic space adjustment strategy enables CLIP to perceive regions effectively.
• A single text [CLS] token is insufficient to guide the segmentation task.
• The vision-language embedding interactor can obtain richer semantic support.