TopiCOT: Neural topic model aligning with pre-trained clustering and optimal transport

Published: 01 Jan 2025, Last Modified: 07 Oct 2025. Venue: Neurocomputing 2025. License: CC BY-SA 4.0
Abstract: Recent research on VAE-based neural topic models has focused on enhancing the encoder network by incorporating pre-trained language models (PLMs) and on refining topic–word relationships within the generative process. Despite these improvements, integrating PLMs often increases inference cost, and the resulting document-topic distributions can still represent documents suboptimally. Additionally, existing neural topic models have not addressed topic–cluster relationships. In this study, we present TopiCOT (Neural Topic Model Aligning with Pre-trained Clustering and Optimal Transport), a novel VAE-based topic model designed to overcome these limitations. TopiCOT bridges the gap between the document clustering capabilities of PLMs and the core topic model, avoiding the need for direct PLM integration. Moreover, we model the correlation between topics and pre-trained clusters as an Optimal Transport (OT) problem, which also enhances document representation and efficiently captures topic associations. Experimental results on popular benchmark datasets demonstrate that our method improves document-topic distributions while preserving a level of topic coherence comparable to other state-of-the-art baselines. Notably, our approach is about 600 times faster at inference than UTopic, a leading VAE-based method that leverages pre-trained language models.
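The abstract describes aligning topics with pre-trained clusters through an OT problem but does not state the solver, cost function, or marginals. As a rough illustration of what such an alignment could look like, the sketch below computes an entropic-regularized transport plan between topics and clusters with Sinkhorn iterations. Everything specific here is an assumption for illustration and not taken from the paper: the random `topic_emb` and `cluster_emb` embeddings, the cosine-distance cost, the uniform marginals, the `epsilon` value, and the helper name `sinkhorn_plan`.

```python
import numpy as np


def sinkhorn_plan(cost, a, b, epsilon=0.5, n_iters=200):
    """Entropic-regularized OT plan between marginals a (rows) and b (columns).

    Hypothetical helper for illustration; not TopiCOT's actual solver.
    """
    K = np.exp(-cost / epsilon)  # Gibbs kernel derived from the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)      # rescale rows toward marginal a
        v = b / (K.T @ u)    # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]


rng = np.random.default_rng(0)
topic_emb = rng.normal(size=(5, 16))    # assumed: 5 topic embeddings
cluster_emb = rng.normal(size=(4, 16))  # assumed: 4 pre-trained cluster centroids

# Assumed cost: cosine distance between each topic and each cluster centroid.
t = topic_emb / np.linalg.norm(topic_emb, axis=1, keepdims=True)
c = cluster_emb / np.linalg.norm(cluster_emb, axis=1, keepdims=True)
cost = 1.0 - t @ c.T

a = np.full(5, 1.0 / 5)  # assumed uniform mass over topics
b = np.full(4, 1.0 / 4)  # assumed uniform mass over clusters

plan = sinkhorn_plan(cost, a, b)
ot_cost = float((plan * cost).sum())  # transport cost under the computed plan
```

Each entry `plan[i, j]` can be read as how much of topic `i` is matched to cluster `j`. In a training setting, a quantity like `ot_cost` would typically be added to the model's loss as an alignment term, though how TopiCOT does this is not specified in the abstract.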