Dual-Path Contrastive Short Text Clustering with High-order Random Walk

Published: 2025, Last Modified: 29 Jan 2026, ICASSP 2025, CC BY-SA 4.0
Abstract: In recent years, several robust contrastive text clustering methods have been proposed. While these methods have achieved strong performance, two issues remain. First, the false negative problem is still not fully resolved, and a false positive problem also arises, because all in-neighborhood samples are simply treated as positive pairs and all out-of-neighborhood samples as negative pairs. Second, these methods treat text representation learning and clustering as independent processes, leading to a performance gap. We propose a novel robust method called Dual-Path Contrastive Short Text Clustering (DCTC) to address these two issues. At the representation learning level, DCTC employs instance-level contrastive learning based on random walks to progressively identify data pairs in a global rather than local manner, thereby detecting in-neighborhood negatives and out-of-neighborhood positives. At the clustering level, DCTC performs cluster-level contrastive learning, jointly optimizing representation learning and cluster assignments, thereby enhancing clustering performance. DCTC achieves state-of-the-art results on 8 datasets, demonstrating its effectiveness. The code is available at https://github.com/2251821381/DCTC.
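To make the random-walk idea concrete, the following is a minimal illustrative sketch, not the DCTC implementation. It assumes a cosine-similarity k-nearest-neighbor graph, an average of the first few powers of the row-normalized transition matrix as the "high-order" walk, and a fixed probability threshold for relabeling pairs. The function names (`knn_transition`, `high_order_walk`, `refine_pairs`) and the parameters `k`, `steps`, and `threshold` are hypothetical choices made for illustration; the abstract does not specify any of them.

```python
import numpy as np


def knn_transition(X, k):
    """Build a k-NN adjacency matrix A and its row-stochastic transition matrix P.

    Assumption: neighbors are chosen by cosine similarity, excluding self.
    """
    X = np.asarray(X, dtype=float)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)  # a sample is never its own neighbor
    idx = np.argsort(-S, axis=1)[:, :k]  # indices of the k most similar samples
    A = np.zeros(S.shape)
    A[np.arange(X.shape[0])[:, None], idx] = 1.0
    P = A / A.sum(axis=1, keepdims=True)  # each row sums to 1
    return A, P


def high_order_walk(P, steps):
    """Average the 1-step to `steps`-step transition probabilities."""
    W = np.zeros_like(P)
    Pt = np.eye(P.shape[0])
    for _ in range(steps):
        Pt = Pt @ P  # probabilities after one more step
        W += Pt
    return W / steps


def refine_pairs(A, W, threshold):
    """Relabel pairs using global walk probabilities instead of local neighborhoods.

    Returns two boolean masks:
      out_pos: not k-NN neighbors, but reachable with probability >= threshold
      in_neg:  k-NN neighbors, but reachable with probability < threshold
    """
    neighbor = A > 0
    reachable = W >= threshold
    off_diag = ~np.eye(A.shape[0], dtype=bool)
    out_pos = reachable & ~neighbor & off_diag
    in_neg = neighbor & ~reachable
    return out_pos, in_neg
```

In such a sketch, `out_pos` would supply additional positive pairs and `in_neg` additional negative pairs to an instance-level contrastive loss; how DCTC actually selects and weights these pairs is defined in the paper and the linked repository, not here.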