OW-Class: Open-world Semi-supervised Text Classification

Anonymous

16 Oct 2022 (modified: 05 May 2023) | ACL ARR 2022 October Blind Submission | Readers: Everyone
Keywords: open world, text classification, clustering
Abstract: Open-world semi-supervised classification is the problem in which unlabeled samples come from both seen and unseen classes. Existing methods mainly regularize the representation space of all unlabeled samples and rely solely on clustering to identify the new classes. We introduce this task in the text domain and argue that class-indicative words may occur in the unlabeled samples, offering a unique opportunity to discover the unseen classes. To this end, we propose OW-Class, a novel method that jointly performs class name prediction and document clustering, with the two tasks enhancing each other iteratively. Specifically, we first construct an overestimated number of classes through clustering. We then extract a list of class-indicative words from the clusters and use them to identify similar clusters and nominate class names. The refined class names in turn guide adjustments to the document representations, and the loop repeats. We conduct experiments on four popular text classification datasets, setting the most infrequent half of the classes as unseen to emphasize the imbalanced and emerging nature of real-world scenarios. Results demonstrate the effectiveness of OW-Class in both classifying the unlabeled samples and identifying the names of unseen classes.
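The cluster-then-name step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes the over-clustered documents are already given as token lists, and it swaps in a simple TF-IDF-style score (term frequency times log inverse cluster frequency) as a stand-in for whatever scoring OW-Class actually uses. The function names `top_word` and `merge_by_indicative_word` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def top_word(cluster_counts, cluster_freq, n_clusters):
    """Return the highest-scoring word of one cluster (assumed scoring: tf * idf)."""
    scored = {
        word: tf * math.log(n_clusters / cluster_freq[word])
        for word, tf in cluster_counts.items()
    }
    # Highest score first; ties broken alphabetically for determinism.
    return min(scored, key=lambda w: (-scored[w], w))


def merge_by_indicative_word(clusters):
    """Group cluster indices that nominate the same class-indicative word."""
    # Word counts per cluster, pooled over all documents in that cluster.
    counts = [Counter(tok for doc in cluster for tok in doc) for cluster in clusters]
    # Number of clusters in which each word occurs at least once.
    cluster_freq = Counter(word for c in counts for word in c)
    merged = defaultdict(list)
    for idx, c in enumerate(counts):
        merged[top_word(c, cluster_freq, len(clusters))].append(idx)
    return dict(merged)


# Toy input: three clusters from a deliberately overestimated clustering,
# where clusters 0 and 1 both cover the same underlying topic.
clusters = [
    [["the", "sports", "game", "sports"], ["sports", "team", "the"]],
    [["the", "sports", "match", "sports"], ["the", "sports", "score"]],
    [["the", "market", "stocks", "market"], ["market", "bank", "the"]],
]
merged = merge_by_indicative_word(clusters)
```

In this toy setting, clusters 0 and 1 both nominate "sports" and are merged under that name, while cluster 2 is named "market". The full method, as the abstract describes it, would then use the nominated names to adjust document representations and repeat the loop, which this sketch does not attempt.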
Paper Type: long
Research Area: Information Retrieval and Text Mining
