Excluding the Impossible for Open Vocabulary Semantic Segmentation

Shiyuan Zhao, Baodi Liu, Yu Bai, Weifeng Liu, Shuai Shao

Published: 01 Jan 2025, Last Modified: 04 Jun 2025, AAAI 2025, CC BY-SA 4.0

Abstract: Open vocabulary semantic segmentation is an active research topic that aims to segment and recognize a diverse array of categories in varied environments, including previously unknown categories, and therefore holds significant practical value. Mainstream studies apply the CLIP model directly to semantic segmentation (denoted as “forward methods”), which often struggle to represent underrepresented categories effectively. To address this issue, this paper introduces a novel approach based on reverse thinking, the Excluding the ImpossibLe Semantic Segmentation Network (ELSE-Net). By excluding improbable categories, ELSE-Net narrows the selection range for forward methods, significantly reducing the risk of misclassification. In implementation, we first draw on leading research to design the General Processing Block (GP-Block), which generates inclusion probabilities (the likelihood of belonging to a category) by combining the CLIP model with a Mask Proposal Network (MPN). We then present the EXcluding the ImPossible Block (EXP-Block), which computes exclusion probabilities (the likelihood of not belonging to a category) through the CLIPN model and a custom-designed Reverse Retrieval Adapter (R2-Adapter). These exclusion probabilities are then used to refine the inclusion probabilities, and the refined probabilities are used to label class-agnostic masks. Moreover, the core component of our EXP-Block is model-agnostic, so it can be added to existing frameworks to enhance their capabilities. Experimental results on four benchmark datasets validate the effectiveness of ELSE-Net and confirm the model-agnostic functionality of the EXP-Block.
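The refinement step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual fusion rule, so the rule used here (multiply each inclusion probability by one minus the exclusion probability, then renormalize) is an assumption chosen for illustration, and the names `refine_inclusion` and `label_masks` are hypothetical rather than taken from ELSE-Net.

```python
def refine_inclusion(inclusion, exclusion, eps=1e-12):
    """Down-weight inclusion probabilities by exclusion probabilities.

    Assumed rule (not from the paper): refined_k is proportional to
    inclusion_k * (1 - exclusion_k), renormalized to sum to 1.
    """
    if len(inclusion) != len(exclusion):
        raise ValueError("inclusion and exclusion must cover the same classes")
    weighted = [p * (1.0 - q) for p, q in zip(inclusion, exclusion)]
    total = sum(weighted)
    if total <= eps:
        # Every class was excluded: fall back to the forward prediction.
        return list(inclusion)
    return [w / total for w in weighted]


def label_masks(class_names, inclusion_per_mask, exclusion_per_mask):
    """Assign each class-agnostic mask the class with the highest refined score."""
    labels = []
    for inc, exc in zip(inclusion_per_mask, exclusion_per_mask):
        refined = refine_inclusion(inc, exc)
        best = max(range(len(refined)), key=refined.__getitem__)
        labels.append(class_names[best])
    return labels


# Toy example with one mask: the forward scores slightly favor "cat",
# but the exclusion branch says the mask is very unlikely to be a cat.
classes = ["cat", "dog", "sofa"]
inclusion = [[0.45, 0.40, 0.15]]
exclusion = [[0.90, 0.10, 0.20]]
labels = label_masks(classes, inclusion, exclusion)
```

In this toy case the forward scores alone would select "cat", while the refined scores select "dog", which is the kind of correction the exclusion branch is meant to provide. Because the sketch only consumes per-mask probabilities, it is independent of how those probabilities are produced, in the same spirit as the model-agnostic claim above.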