MIST: Mutual Information Maximization for Short Text Clustering

Anonymous

16 Dec 2022 (modified: 05 May 2023) · ACL ARR 2022 December Blind Submission
Abstract: Short text clustering poses substantial challenges because each text sample provides only a limited amount of information. Previous efforts based on dense representations remain inadequate, since texts from different clusters are not sufficiently segregated in the embedding space prior to the clustering stage. Although the state-of-the-art method integrates contrastive learning with a soft clustering objective to address this issue, summarizing all local tokens into a single sequence representation for the whole text may introduce noise that obscures the key information. We propose a framework called MIST: Mutual Information Maximization for Short Text Clustering, which overcomes the information limitation by maximizing the mutual information between texts at both the sequence and token levels. We assess the performance of our proposed method on eight standard short text datasets. Experimental results show that MIST outperforms the state-of-the-art methods in terms of Accuracy or Normalized Mutual Information in most cases.
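The abstract does not give the paper's exact objective, but mutual information between two representations of the same text is commonly maximized via the InfoNCE lower bound. The sketch below is an illustrative assumption, not the authors' implementation: it estimates a sequence-level MI lower bound from paired embeddings (two views of the same batch of texts), with positives on the diagonal of a cosine-similarity matrix; the function name and temperature value are hypothetical.

```python
import numpy as np

def infonce_mi_lower_bound(z1, z2, tau=0.1):
    """InfoNCE estimate: MI(z1; z2) >= log N - loss.

    z1, z2: (N, d) arrays of paired representations, where row i of z1
    and row i of z2 are two views (e.g. augmentations) of the same text.
    tau: temperature scaling for the similarity logits (assumed value).
    """
    # L2-normalize so the dot product is cosine similarity.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                       # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss = -np.mean(np.diag(log_probs))            # positives on the diagonal
    return np.log(len(z1)) - loss                  # MI lower bound in nats
```

When the two views are identical, the diagonal dominates the softmax and the bound approaches its maximum of log N; for unrelated views it drops toward zero. A token-level variant would apply the same contrast between token and sequence embeddings, but the abstract does not specify those details.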
Paper Type: long
Research Area: Information Retrieval and Text Mining