Abstract: Although latent factor models (e.g., matrix factorization) achieve strong predictive accuracy, they suffer from cold-start problems, lack of transparency, and suboptimal recommendations. In this paper, we leverage text together with side data to address these issues. We propose a hybrid generative probabilistic model that integrates a neural network with a latent topic model within a four-level hierarchical Bayesian framework: each document is a finite mixture over topics, each topic is an infinite mixture over topic probabilities, and each topic probability is a finite mixture over the side data. The neural network produces an overview distribution of the side data, which serves as the LDA prior to improve topic grouping. Experiments on several datasets show that the model outperforms standard LDA and Dirichlet-multinomial regression (DMR) in topic grouping, model perplexity, classification, and comment generation.
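The core mechanism the abstract describes, conditioning the LDA prior on side data, can be illustrated with a minimal NumPy sketch. This is a hedged illustration, not the paper's implementation: it uses a single linear layer in place of the paper's neural network, and follows the DMR-style parameterization where each document's Dirichlet prior is an exponentiated linear function of its side-data features. All dimensions and variable names (`X`, `W`, `alpha`, `theta`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 5 documents, 3 side-data features, 4 topics.
n_docs, n_features, n_topics = 5, 3, 4

# Side data for each document (e.g., metadata features).
X = rng.normal(size=(n_docs, n_features))

# A one-layer stand-in for the neural network, mapping side data to
# Dirichlet parameters (DMR-style): alpha_d = exp(X_d @ W) > 0.
W = rng.normal(scale=0.1, size=(n_features, n_topics))
alpha = np.exp(X @ W)  # document-specific LDA priors, shape (n_docs, n_topics)

# Each document's topic proportions are drawn from its own side-data-
# conditioned prior, rather than from one shared symmetric alpha.
theta = np.vstack([rng.dirichlet(a) for a in alpha])
```

A deeper network would simply replace the single matrix `W`; the key point is that documents with similar side data receive similar priors, which is what steers the topic grouping.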
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: Latent Dirichlet Allocation, Neural Network
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Theory
Languages Studied: English
Submission Number: 4903