Encouraging Sparsity in Neural Topic Modeling with Non-Mean-Field Inference

Jiayao Chen, Rui Wang, Jueying He, Mark Junjie Li

Published: 2023, Last Modified: 27 May 2026. Venue: ECML/PKDD (4) 2023. License: CC BY-SA 4.0

Abstract: Topic modeling is a popular method for discovering semantic information from textual data, with latent Dirichlet allocation (LDA) being a representative model. Recently, researchers have explored the use of variational autoencoders (VAE) to improve the performance of LDA. However, there remain two major limitations: (1) the Dirichlet prior is inadequate to extract precise semantic information in VAE-LDA models, as it introduces a trade-off between topic quality and the sparsity of representations; (2) new variants of VAE-LDA models with auxiliary variables generally ignore the correlation between latent variables in the inference process due to the mean-field assumption. To address these issues, we propose a Sparsity Reinforced and Non-Mean-Field Topic Model (SpareNTM), which adds a bank of auxiliary Bernoulli variables to the generative process of LDA to further model the sparsity of document representations. Thus, each document is forced to focus on a subset of topics by a corresponding Bernoulli topic selector. Then, instead of applying the mean-field assumption for the posterior approximation, we take full advantage of VAE to realize a non-mean-field approximation, which preserves the connections between latent variables. Experimental results on three datasets (20NewsGroup, Wikitext-103, and SearchSnippets) show that our model outperforms recent topic models in terms of both topic quality and sparsity.
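The Bernoulli topic selector described in the abstract can be illustrated with a minimal NumPy sketch of one possible generative process. This is not the authors' implementation: the abstract does not give the exact parameterization, so the function names, the `select_prob` parameter, the fallback used when no topic is selected, and the choice to draw a Dirichlet over only the selected topics are all assumptions made for illustration. Inference (the non-mean-field VAE posterior) is not covered here.

```python
import numpy as np


def sample_sparse_topic_proportions(alpha, select_prob, rng):
    """Draw topic proportions that are nonzero only on selected topics.

    alpha       : Dirichlet concentration per topic (hypothetical parameter)
    select_prob : Bernoulli selection probability per topic (hypothetical)
    """
    alpha = np.asarray(alpha, dtype=float)
    select_prob = np.asarray(select_prob, dtype=float)

    # Bernoulli topic selector: b[k] is True when topic k is kept.
    b = rng.random(alpha.shape[0]) < select_prob
    if not b.any():
        # Assumption: if nothing is selected, keep the most probable topic
        # so the proportions remain a valid distribution.
        b[np.argmax(select_prob)] = True

    # Dirichlet draw restricted to the selected topics, built from
    # normalized Gamma variates (the standard construction).
    gammas = rng.gamma(shape=alpha[b])
    theta = np.zeros_like(alpha)
    theta[b] = gammas / gammas.sum()
    return b.astype(int), theta


def generate_document(theta, beta, n_words, rng):
    """Sample word ids from the mixture of topic-word distributions.

    beta : array of shape (n_topics, vocab_size), each row summing to 1
    """
    word_dist = theta @ beta  # document-level word distribution
    return rng.choice(beta.shape[1], size=n_words, p=word_dist)
```

Under these assumptions, topics with `b[k] = 0` receive exactly zero weight, so a document's representation is sparse by construction rather than only approximately sparse, which is the effect the abstract attributes to the topic selector.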

External IDs: dblp:conf/pkdd/ChenWHL23