Abstract: Highlights
• Propose a novel two-stage pipeline that uses an image tagger and a class-specific decoder.
• Set a new benchmark for Vocabulary-Free Semantic Segmentation.
• Show the impact of enriched text inputs on the encoder, assuming a perfect tagger.
• Analyze the influence of undetected objects and false detections on the segmentation.
External IDs: dblp:journals/prl/ReichardRGHZNT25