AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation

Published: 21 Sept 2023, Last Modified: 06 Jan 2024, NeurIPS 2023 poster
Keywords: Open-Vocabulary Semantic Segmentation, Attributes, Decomposition and Aggregation
TL;DR: Exploring an attribute decomposition-aggregation framework to address open-vocabulary semantic segmentation
Abstract: Open-vocabulary semantic segmentation is a challenging task that requires segmenting novel object categories at inference time. Recent works explore vision-language pre-training to handle this task, but rely on assumptions that are unrealistic in practical scenarios, where textual category names are often of low quality. For example, this paradigm assumes that new textual categories will be accurately and completely provided, and that they exist in the lexicons used during pre-training. However, exceptions often arise: brief or incomplete names are ambiguous, new words are absent from the pre-trained lexicons, and some categories are difficult for users to describe. To address these issues, this work proposes a novel *attribute decomposition-aggregation* framework, **AttrSeg**, inspired by human cognition in understanding new concepts. Specifically, in the *decomposition* stage, we decouple class names into diverse attribute descriptions to complement semantic contexts from multiple perspectives. Two attribute construction strategies are designed: using large language models for common categories, and manual labelling for human-invented categories. In the *aggregation* stage, we group diverse attributes into an integrated global description, to form a discriminative classifier that distinguishes the target object from others. A hierarchical aggregation architecture is further proposed to achieve multi-level aggregation, leveraging a carefully designed clustering module. The final result is obtained by computing the similarity between the aggregated attributes and the image embeddings. To evaluate the effectiveness of the approach, we annotate three datasets with attribute descriptions, and conduct extensive experiments and ablation studies. The results show the superior performance of attribute decomposition-aggregation. We refer readers to the latest arXiv version at
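The decomposition-aggregation idea in the abstract can be illustrated with a minimal sketch. It assumes that attribute descriptions and image pixels have already been embedded into a shared space by some pre-trained vision-language encoder (not shown here). The function names `aggregate_attributes` and `segment` are hypothetical, and plain mean pooling stands in for the learned hierarchical aggregation and clustering module, which the abstract does not specify in detail. It is not the authors' implementation.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def aggregate_attributes(attr_emb):
    """Collapse attribute embeddings (n_attrs, dim) into one class embedding (dim,).

    ASSUMPTION: mean pooling is a simple stand-in for the paper's learned
    hierarchical aggregation; it is not the published architecture.
    """
    return l2_normalize(l2_normalize(attr_emb).mean(axis=0))


def segment(pixel_emb, class_attr_embs):
    """Assign each pixel to the class whose aggregated attributes it matches best.

    pixel_emb:       (H, W, dim) per-pixel image embeddings
    class_attr_embs: list of (n_attrs_k, dim) arrays, one per class
    returns:         (H, W) integer mask of class indices
    """
    classifiers = np.stack([aggregate_attributes(a) for a in class_attr_embs])
    sims = l2_normalize(pixel_emb) @ classifiers.T  # cosine similarity, (H, W, K)
    return sims.argmax(axis=-1)


# Toy data: two classes, each described by a few attribute embeddings.
class_attr_embs = [
    np.array([[1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0]]),
    np.array([[0.0, 1.0, 0.0, 0.0], [0.1, 0.9, 0.0, 0.0], [0.0, 0.8, 0.2, 0.0]]),
]
# A 2x2 "image" whose pixels align with class 0 or class 1.
pixel_emb = np.array([
    [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]],
])
mask = segment(pixel_emb, class_attr_embs)
```

Because each class is represented by several attribute embeddings rather than a single class-name embedding, a category with an ambiguous or unseen name can still receive a usable classifier as long as its attributes are describable.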
Supplementary Material: pdf
Submission Number: 99