Distributed Keyword-guided Topic Model with Lexical Knowledge Supervision

Published: 2025, Last Modified: 08 Jan 2026. Venue: ACM Trans. Knowl. Discov. Data 2025. License: CC BY-SA 4.0
Abstract: Topic models are often used to discover latent semantic patterns in document collections. However, existing unsupervised approaches have the following drawbacks: (1) the mined topics may not match user interests; (2) they tend to extract semantically similar topics, sacrificing diversity; (3) the mined topics often have low interpretability and do not agree with common-sense knowledge. To address these limitations, we propose the Distributed Keyword-guided Topic Model (DiskTM), which incorporates Gaussian-distributed keyword prior knowledge into the modeling process to mine topics of interest to the user. Furthermore, to inject common-sense knowledge and improve topic interpretability, we extend DiskTM and propose the Distributed Keyword-guided Topic Model with Lexical Knowledge (DiskTM-LK). Experimental results on three publicly available text corpora show that the proposed approaches extract topics that match user interests (expressed as keywords). Moreover, DiskTM and DiskTM-LK obtain more coherent and diverse topics than state-of-the-art baseline approaches.