DCTM: Dual Contrastive Topic Model for identifiable topic extraction

Rui Wang; Peng Ren; Xing Liu; Shuyu Chang; Haiping Huang

Published: 01 Jan 2024, Last Modified: 17 Apr 2025, Inf. Process. Manag. 2024, CC BY-SA 4.0

Abstract: The recently proposed Contrastive Neural Topic Model (CNTM) tackles topic collapse through document-level contrastive learning. However, because it relies on a Logistic-Normal prior in topic space and on document-level contrastive learning, it is less capable of disentangling semantically similar topics. To address this limitation, we propose a novel Dual Contrastive Topic Model (DCTM) that uses a Dirichlet prior to capture interpretable patterns. In addition, it applies dual (document-level and topic-level) contrastive learning to the topic distribution matrix, which helps generate discriminative topic representations and mine identifiable topics. Extensive experiments on three publicly available text corpora show that DCTM outperforms state-of-the-art neural topic models in terms of topic coherence and diversity. In detail, DCTM surpasses the baselines on almost all of the topic coherence metrics used ($C_P$, $C_A$, and NPMI for 20Newsgroups; $C_P$, $C_A$, NPMI, and UCI for Grolier and DBPedia), and it also obtains higher topic diversity on 1 dataset. Moreover, in text clustering, DCTM achieves significant improvements, with accuracy gains of more than 1% (20Newsgroups) and 6% (DBPedia).
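To make the idea of dual contrastive learning on a topic distribution matrix concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the loss used in the paper: the abstract does not give the objective, so the InfoNCE form, the cosine similarity, the temperature value, the augmentation by mixing in noise, and the names `info_nce`, `dual_contrastive_loss`, `theta`, and `theta_aug` are all hypothetical. Rows of the matrix are treated as document representations and columns as topic representations, and the same contrastive loss is applied along both axes.

```python
import numpy as np

def info_nce(a, b, temperature=0.5):
    """InfoNCE loss where row i of `a` and row i of `b` form the positive pair.

    Assumption: cosine similarity and a softmax over all rows of `b`;
    the paper's actual similarity and loss may differ.
    """
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-12)
    logits = a @ b.T / temperature                        # (n, n) similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def dual_contrastive_loss(theta, theta_aug, temperature=0.5):
    """Sum of a document-level (row) and a topic-level (column) contrastive loss.

    theta, theta_aug: (num_docs, num_topics) topic distribution matrices
    for two views of the same batch of documents.
    """
    doc_loss = info_nce(theta, theta_aug, temperature)        # rows = documents
    topic_loss = info_nce(theta.T, theta_aug.T, temperature)  # columns = topics
    return doc_loss + topic_loss

# Toy batch: 8 documents, 5 topics, rows drawn from a Dirichlet distribution.
rng = np.random.default_rng(0)
theta = rng.dirichlet(np.full(5, 0.5), size=8)
# Hypothetical augmented view: mix each row with a little Dirichlet noise.
noise = rng.dirichlet(np.full(5, 0.5), size=8)
theta_aug = 0.9 * theta + 0.1 * noise

loss = dual_contrastive_loss(theta, theta_aug)
print(float(loss))
```

The document-level term pulls two views of the same document together and pushes different documents apart, while the topic-level term does the same for topic columns, which is one plausible way to encourage topics that are distinguishable from one another.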
