A Multi-View Clustering Algorithm for Short Text

Published: 01 Jan 2024, Last Modified: 11 Apr 2025, Venue: ICDE 2024, License: CC BY-SA 4.0
Abstract: Short text clustering aims to group semantically similar short texts into the same cluster and to separate semantically different ones. Although existing topic-model-based short text clustering algorithms and deep clustering models perform well, they share a fundamental limitation: each relies on a single view of the text, which constrains clustering performance. Specifically, topic-model-based algorithms represent short texts as bags of words, while deep clustering models represent them as document embeddings. To address this limitation, we propose a Multi-View Clustering (MVC) model that considers both views of the text. We model the bag-of-words view with the Dirichlet Multinomial Mixture (DMM) model and the document embedding view with the Gaussian Mixture Model (GMM). A Bernoulli random variable controls these two models, enabling MVC to use the semantic information in short text embeddings while retaining the bag-of-words information. Extensive experiments on four datasets demonstrate MVC's effectiveness. The code for MVC is available at https://github.com/jhyin12/MVC.
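To make the two-view idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation (their code is at the URL above). It scores one short text against each cluster twice: once with a collapsed DMM over its words, and once with a GMM over its embedding. The way the two views are combined here is an assumption made for illustration: the Bernoulli variable is marginalized, so the result is a weighted average of the two per-view posteriors with weight `p_embedding`. All function names, the spherical covariance, and the toy data are hypothetical.

```python
import math
from collections import Counter


def dmm_log_scores(doc, cluster_docs, cluster_word_counts, vocab_size,
                   alpha=0.1, beta=0.1):
    """Unnormalized log-score of each cluster for `doc` under a collapsed DMM.

    Assumes `doc` is not yet counted in any cluster's statistics.
    """
    n_clusters = len(cluster_docs)
    total_docs = sum(cluster_docs)
    doc_counts = Counter(doc)
    scores = []
    for k in range(n_clusters):
        words_in_k = sum(cluster_word_counts[k].values())
        # Cluster popularity term.
        log_p = (math.log(cluster_docs[k] + alpha)
                 - math.log(total_docs + n_clusters * alpha))
        # Word-similarity term, handling repeated words in the document.
        for word, freq in doc_counts.items():
            for j in range(freq):
                log_p += math.log(cluster_word_counts[k].get(word, 0) + beta + j)
        for i in range(len(doc)):
            log_p -= math.log(words_in_k + vocab_size * beta + i)
        scores.append(log_p)
    return scores


def gmm_log_scores(embedding, means, variance, weights):
    """Log of weight times Gaussian density per cluster.

    Assumption: one shared spherical covariance `variance * I`.
    """
    dim = len(embedding)
    scores = []
    for mean, weight in zip(means, weights):
        sq_dist = sum((x - m) ** 2 for x, m in zip(embedding, mean))
        log_density = (-0.5 * dim * math.log(2 * math.pi * variance)
                       - sq_dist / (2 * variance))
        scores.append(math.log(weight) + log_density)
    return scores


def softmax(log_scores):
    """Normalize log-scores into probabilities in a numerically stable way."""
    top = max(log_scores)
    exps = [math.exp(s - top) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


def combined_posterior(bow_log_scores, emb_log_scores, p_embedding):
    """Average the two per-view posteriors (illustrative assumption).

    `p_embedding` plays the role of the Bernoulli parameter: the probability
    that the embedding view, rather than the bag-of-words view, is used.
    """
    bow = softmax(bow_log_scores)
    emb = softmax(emb_log_scores)
    return [p_embedding * e + (1 - p_embedding) * b for b, e in zip(bow, emb)]


# Toy data (hypothetical): two clusters, a five-word vocabulary, 2-d embeddings.
cluster_docs = [2, 2]
cluster_word_counts = [Counter({"cat": 3, "dog": 2}),
                       Counter({"stock": 3, "market": 2})]
bow_scores = dmm_log_scores(["cat", "pet"], cluster_docs,
                            cluster_word_counts, vocab_size=5)
emb_scores = gmm_log_scores([0.2, -0.1], means=[[0.0, 0.0], [3.0, 3.0]],
                            variance=1.0, weights=[0.5, 0.5])
posterior = combined_posterior(bow_scores, emb_scores, p_embedding=0.5)
print(posterior)
```

In this toy setup both views favor the first cluster, so the combined posterior does as well. In a sampler, one could instead draw the Bernoulli variable per document and use only the selected view; which of these the paper does is not stated in the abstract.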