Enhancing LDA Method by the Use of Feature Maximization

Published: 01 Jan 2024, Last Modified: 12 Jul 2025 | WSOM+ 2024 | CC BY-SA 4.0
Abstract: Topic modeling is a key technique for understanding the content of collections of scientific papers. However, commonly used methods like LDA (Latent Dirichlet Allocation) have significant drawbacks, including complex parameter settings. Additionally, these methods often yield low-quality results. Therefore, improving the outcomes of topic modeling is a crucial goal. In this paper, we compare the performance of LDA with a recent topic modeling approach we have developed, which relies on a combination of neural clustering and feature maximization (abbreviated as CFMf). Subsequently, we demonstrate how the feature ranking component of the CFMf method can be used to substantially enhance the performance of LDA, regardless of the number of topics. We also highlight the benefits of post-processing the clustering results before modeling topics in the CFMf approach. Our reference dataset consists of 16,917 full-text articles on the philosophy of science.
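As a rough illustration of the feature ranking idea mentioned in the abstract, the sketch below computes a per-cluster feature F-measure, following the commonly cited definition of feature maximization: the harmonic mean of feature recall (the share of a feature's total weight that falls in a cluster) and feature precision (the share of a cluster's total weight carried by that feature). This is an assumption about the general technique, not the paper's implementation. The function names `feature_f_measure` and `rank_features`, the toy weight matrix, and the idea of deriving `labels` from a hard partition (for example, each document's dominant LDA topic) are all hypothetical.

```python
import numpy as np

def feature_f_measure(weights, labels, n_clusters):
    """Per-cluster feature F-measure (illustrative sketch, not the paper's code).

    weights: (n_docs, n_features) array of non-negative feature weights.
    labels:  (n_docs,) array of integer cluster (or dominant-topic) ids.
    Returns an (n_clusters, n_features) array.
    """
    weights = np.asarray(weights, dtype=float)
    labels = np.asarray(labels)

    # Total weight of each feature inside each cluster.
    cluster_sums = np.zeros((n_clusters, weights.shape[1]))
    for c in range(n_clusters):
        cluster_sums[c] = weights[labels == c].sum(axis=0)

    feature_totals = cluster_sums.sum(axis=0, keepdims=True)  # per feature
    cluster_totals = cluster_sums.sum(axis=1, keepdims=True)  # per cluster

    zeros = np.zeros_like(cluster_sums)
    # Feature recall: fraction of the feature's weight located in the cluster.
    recall = np.divide(cluster_sums, feature_totals,
                       out=zeros.copy(), where=feature_totals > 0)
    # Feature precision: fraction of the cluster's weight due to the feature.
    precision = np.divide(cluster_sums, cluster_totals,
                          out=zeros.copy(), where=cluster_totals > 0)

    # Harmonic mean of recall and precision, with 0 where both are 0.
    denom = recall + precision
    return np.divide(2.0 * recall * precision, denom,
                     out=zeros.copy(), where=denom > 0)

def rank_features(ff, cluster, top_k):
    """Indices of the top_k features of one cluster, best first."""
    return np.argsort(-ff[cluster])[:top_k]

# Toy data (made up): three documents, two features, two clusters.
toy_weights = [[2, 0], [2, 0], [0, 3]]
toy_labels = [0, 0, 1]
toy_ff = feature_f_measure(toy_weights, toy_labels, n_clusters=2)
toy_top = rank_features(toy_ff, cluster=0, top_k=1)
```

In a setup like this, the ranked features of each cluster could replace or re-order the top words that a topic model reports, which is one plausible reading of how feature ranking might be combined with LDA; the abstract does not specify the exact procedure.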