Abstract: The potential of Open-Vocabulary Semantic Segmentation (OVSS) in few-shot scenarios has not been fully explored, owing to the complexity of extending few-shot concepts to semantic segmentation tasks. To address this challenge, we propose Training-Free Mask Matching (TFM2), an efficient, mask-based adapter method that enhances OVSS models for the few-shot open-vocabulary semantic segmentation task. TFM2 is a key-value cache that is explicitly designed for image masks. We introduce three modules to construct and refine the mask cache, which in turn improves the mask classification performance of OVSS models. Comprehensive experiments demonstrate that TFM2 improves the performance of state-of-the-art OVSS methods by a margin of 1% to 5% across different settings. Moreover, TFM2 is not limited to any specific method or backbone. This work underscores the importance and potential of few-shot data in OVSS and presents a significant step toward leveraging this potential.