Abstract: Although latent factor models (e.g., matrix factorization) achieve strong predictive accuracy, they suffer from cold-start problems, lack of transparency, and suboptimal recommendations. In this paper, we leverage text together with side data to address these issues. We propose a hybrid generative probabilistic model that integrates a neural network with a latent topic model within a four-level hierarchical Bayesian framework: each document is a finite mixture over topics, each topic is an infinite mixture over topic probabilities, and each topic probability is a finite mixture over the side data. The neural network produces an overview distribution of the side data, which serves as the LDA prior to improve topic grouping. Experiments on several datasets show that the model outperforms standard LDA and Dirichlet-multinomial regression (DMR) in topic grouping, model perplexity, classification, and comment generation.
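The core mechanism the abstract describes, conditioning the LDA prior on side data, can be illustrated with a minimal NumPy sketch. This is a hedged illustration, not the paper's implementation: it uses a single linear layer in place of the paper's neural network, and follows the DMR-style parameterization where each document's Dirichlet prior is an exponentiated linear function of its side-data features. All dimensions and variable names (`X`, `W`, `alpha`, `theta`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 5 documents, 3 side-data features, 4 topics.
n_docs, n_features, n_topics = 5, 3, 4

# Side data for each document (e.g., metadata features).
X = rng.normal(size=(n_docs, n_features))

# A one-layer stand-in for the neural network, mapping side data to
# Dirichlet parameters (DMR-style): alpha_d = exp(X_d @ W) > 0.
W = rng.normal(scale=0.1, size=(n_features, n_topics))
alpha = np.exp(X @ W)  # document-specific LDA priors, shape (n_docs, n_topics)

# Each document's topic proportions are drawn from its own side-data-
# conditioned prior, rather than from one shared symmetric alpha.
theta = np.vstack([rng.dirichlet(a) for a in alpha])
```

A deeper network would simply replace the single matrix `W`; the key point is that documents with similar side data receive similar priors, which is what steers the topic grouping.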
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: Latent Dirichlet Allocation, Neural Network
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Theory
Languages Studied: English
Submission Number: 4903