A Closer Look at Novel Class Discovery from the Labeled SetDownload PDF

Published: 21 Oct 2022, Last Modified: 05 May 2023NeurIPS 2022 Workshop DistShift PosterReaders: Everyone
Keywords: novel class discovery, semantic similarity
Abstract: Novel class discovery (NCD) is to infer novel categories in an unlabeled set using prior knowledge of a labeled set comprising diverse but related classes. Existing research focuses on using the labeled set methodologically and little on analyzing it. In this study, we closer look at NCD from the labeled set and focus on two questions: (i) Given an unlabeled set, \textit{what labeled set best supports novel class discovery?} (ii) A fundamental premise of NCD is that the labeled set must be related to the unlabeled set, but \textit{how can we measure this relation?} For (i), we propose and substantiate the hypothesis that NCD could benefit from a labeled set with high semantic similarity to the unlabeled set. Using ImageNet's hierarchical class structure, we create a large-scale benchmark with variable semantic similarity across labeled/unlabeled datasets. In contrast, existing NCD benchmarks ignore the semantic relation. For (ii), we introduce a mathematical definition for quantifying the semantic similarity between labeled and unlabeled sets. We utilize this metric to validate our established benchmark and demonstrate it highly corresponds with NCD performance. Furthermore, without quantitative analysis, previous works commonly believe that label information is always beneficial. However, counterintuitively, our experimental results show that using labels may lead to sub-optimal outcomes in low-similarity settings.
1 Reply