Abstract: Recently, many studies have combined contrastive learning with clustering and achieved excellent clustering results. However, traditional contrastive learning methods suffer from class conflict: instances of the same class can be wrongly treated as negative pairs and pushed apart. To address this issue, we propose a new framework for text clustering called Matrix Contrastive Learning (MCL). First, data augmentation techniques are used to generate positive and negative instances for every anchor example. These instances are mapped into a feature matrix whose rows represent soft labels of individual instances and whose columns represent cluster representations; we then perform contrastive learning at both the instance level (over rows) and the cluster level (over columns). To further improve cluster assignment in unsupervised clustering tasks and alleviate the class conflict caused by instance-level contrastive learning under unsupervised conditions, the K-Nearest Neighbors algorithm is used to filter out negative instances. We conducted extensive experiments on eight challenging text datasets and compared MCL with six existing clustering methods. The results show that MCL significantly outperforms the competing methods. The code is available at https://github.com/2251821381/MCL.
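The row/column contrast described above can be illustrated with a minimal NumPy sketch. This is a toy, not the authors' implementation: the batch size, temperature values, and the KNN filter size `k` are all illustrative assumptions. Rows of the soft-label matrix are contrasted as instances, columns as clusters, and each anchor's `k` most similar candidates are dropped from its negative set to mimic the KNN-based filtering.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

# Two augmented views of a batch of N instances projected to K cluster logits
# (illustrative stand-in for the encoder + projection head).
N, K = 8, 3
logits_a = rng.normal(size=(N, K))
logits_b = logits_a + 0.1 * rng.normal(size=(N, K))  # view B: positive pairs

# Feature matrix: row i = soft label of instance i, column j = cluster j's
# representation across the batch.
P_a, P_b = softmax(logits_a), softmax(logits_b)

def instance_loss(Pa, Pb, tau=0.5, k=2):
    """Instance-level contrast over rows, with KNN-filtered negatives."""
    n = len(Pa)
    losses = []
    for i in range(n):
        sims = np.array([cosine(Pa[i], Pb[j]) for j in range(n)])
        pos = np.exp(sims[i] / tau)
        # Drop the k most similar non-positive candidates: they are the
        # likely false negatives (same-class instances) causing class conflict.
        order = [j for j in np.argsort(-sims) if j != i]
        neg = np.exp(sims[order[k:]] / tau).sum()
        losses.append(-np.log(pos / (pos + neg)))
    return float(np.mean(losses))

def cluster_loss(Pa, Pb, tau=1.0):
    """Cluster-level contrast over columns of the two views."""
    Ca, Cb = Pa.T, Pb.T  # shape (K, N): each row is one cluster's representation
    losses = []
    for j in range(len(Ca)):
        pos = np.exp(cosine(Ca[j], Cb[j]) / tau)
        neg = sum(np.exp(cosine(Ca[j], Cb[m]) / tau)
                  for m in range(len(Cb)) if m != j)
        losses.append(-np.log(pos / (pos + neg)))
    return float(np.mean(losses))

loss = instance_loss(P_a, P_b) + cluster_loss(P_a, P_b)
```

Both terms are standard NT-Xent-style contrastive losses; the only MCL-specific ingredients sketched here are the shared matrix (rows vs. columns) and the removal of an anchor's nearest neighbors from its negative set.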