Abstract: Probabilistic topic models are powerful techniques for analyzing and understanding large collections of text documents by learning meaningful patterns of words. Their supervised extensions also capture topics conditioned on the response metadata associated with each document, such as user ratings. However, inferring such information from data often comes at the expense of topic quality, leading to uninterpretable and meaningless topics. In this paper, we propose a novel Supervised-Embedded Spherical Topic Model (S-ESTM) that balances two goals: interpretable and coherent topics explaining the data, and accurate prediction of the associated response values. Our model combines word embeddings and knowledge graph embeddings to effectively encode the semantic information of text and the related background knowledge, guiding the inference of supervised topics. In S-ESTM, document constituents and topics are drawn as points on spherical manifolds using the von Mises-Fisher distribution. Efficient variational inference methods for posterior approximation and latent parameter estimation are derived, and empirical studies on several real-world datasets are provided. Our experiments demonstrate that the model discovers discriminative and coherent topical patterns associated with regression tasks while achieving improved prediction quality.
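For reference, the abstract's central modeling ingredient is the von Mises-Fisher distribution over the unit sphere; its standard density is reproduced below as background (the paper's exact parameterization of topics and document constituents is not given in the abstract):

$$
f(\mathbf{x};\,\boldsymbol{\mu},\kappa) \;=\; C_d(\kappa)\,\exp\!\big(\kappa\,\boldsymbol{\mu}^{\top}\mathbf{x}\big),
\qquad
C_d(\kappa) \;=\; \frac{\kappa^{d/2-1}}{(2\pi)^{d/2}\, I_{d/2-1}(\kappa)},
$$

where $\mathbf{x}, \boldsymbol{\mu} \in \mathbb{S}^{d-1}$ (unit norm), $\kappa \ge 0$ is a concentration parameter, and $I_{\nu}$ is the modified Bessel function of the first kind. Intuitively, embeddings and topic directions lying on the sphere are scored by their cosine alignment with the mean direction $\boldsymbol{\mu}$, scaled by $\kappa$.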