MIST: Mutual Information Maximization for Short Text Clustering

Anonymous

16 Oct 2022 (modified: 05 May 2023) · ACL ARR 2022 October Blind Submission
Keywords: short text clustering, contrastive learning, mutual information maximization
Abstract: Short text clustering poses substantial challenges because each sample provides only a limited amount of information. Previous efforts based on dense representations remain inadequate, since texts from different clusters are not sufficiently separated in the embedding space prior to the clustering step. Although the state-of-the-art approach integrates contrastive learning with a soft clustering objective to address this issue, the step in which all local tokens are summarized into a single sequence representation for the whole text may introduce noise that obscures the key information. We propose MIST: Mutual Information Maximization for Short Text Clustering, a framework that overcomes the information limitation by maximizing the mutual information between text samples at both the sequence and token levels. We evaluate our method on eight standard short text datasets. Experimental results show that MIST outperforms state-of-the-art methods in terms of Accuracy or Normalized Mutual Information in most cases.
Paper Type: long
Research Area: Information Retrieval and Text Mining
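
The abstract describes maximizing mutual information between text samples at both the sequence and token levels. The page does not spell out the objective, so the sketch below is only a minimal illustration of one common way such an objective is realized: an InfoNCE-style contrastive bound applied once to pooled sequence embeddings and once to token embeddings. All names here (`info_nce`, `tau`, the two-view setup, and the random tensors standing in for encoder outputs) are our own assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of InfoNCE-style mutual information maximization at the
# sequence and token levels. Hypothetical illustration; not the MIST code.
import torch
import torch.nn.functional as F

def info_nce(queries: torch.Tensor, keys: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """InfoNCE lower bound on mutual information between two views.

    queries, keys: (batch, dim) representations of two views of the same
    samples; row i of `keys` is the positive for row i of `queries`, and
    all other rows in the batch serve as negatives.
    """
    q = F.normalize(queries, dim=-1)
    k = F.normalize(keys, dim=-1)
    logits = q @ k.t() / tau                       # (batch, batch) similarities
    labels = torch.arange(q.size(0), device=q.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)

# Sequence level: one pooled embedding per text (e.g., mean over tokens).
seq_a = torch.randn(32, 128)    # view 1 of 32 texts (stand-in for encoder output)
seq_b = torch.randn(32, 128)    # view 2 of the same 32 texts
loss_seq = info_nce(seq_a, seq_b)

# Token level: align token embeddings of the two views position-wise by
# flattening (batch, seq_len, dim) into one large batch of token pairs.
tok_a = torch.randn(32, 20, 128).reshape(-1, 128)
tok_b = torch.randn(32, 20, 128).reshape(-1, 128)
loss_tok = info_nce(tok_a, tok_b)

loss = loss_seq + loss_tok      # joint objective; relative weighting omitted
```

The design point this sketch is meant to convey: the token-level term keeps local information in play rather than relying solely on a pooled sequence representation, which is the failure mode the abstract attributes to prior work.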