Topic model tutorial: A basic introduction to Latent Dirichlet Allocation and extensions for web scientists

Published: 01 Jan 2016 · Last Modified: 26 Apr 2024 · WebSci 2016 · CC BY-SA 4.0
Abstract: In this tutorial, we teach the intuition and the assumptions behind topic models. Topic models explain the co-occurrences of words in documents by extracting sets of semantically related words, called topics. These topics are semantically coherent and can be interpreted by humans. Starting with the most popular topic model, Latent Dirichlet Allocation (LDA), we explain the fundamental concepts of probabilistic topic modeling. We organise our tutorial as follows: after a general introduction, we enable participants to develop an intuition for the underlying concepts of probabilistic topic models. Building on this intuition, we cover the technical foundations of topic models, including graphical models and Gibbs sampling. We conclude the tutorial with an overview of the most relevant adaptations and extensions of LDA.
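To make the Gibbs sampling foundation mentioned above concrete, here is a minimal collapsed Gibbs sampler for LDA in Python. It is an illustrative sketch, not code from the tutorial: the toy corpus, the number of topics `K`, the vocabulary size `V`, and the symmetric Dirichlet priors `alpha` and `beta` are all assumed values chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: each document is a list of integer token ids (assumed data).
docs = [[0, 1, 2, 0, 1], [2, 3, 4, 4, 3], [0, 2, 4, 1, 3]]
K, V = 2, 5              # number of topics, vocabulary size (assumed)
alpha, beta = 0.1, 0.01  # symmetric Dirichlet priors (assumed)

# Count matrices: document-topic counts, topic-word counts, topic totals.
ndk = np.zeros((len(docs), K))
nkw = np.zeros((K, V))
nk = np.zeros(K)

# Randomly initialise a topic assignment z for every token.
z = [[rng.integers(K) for _ in doc] for doc in docs]
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]
        ndk[d, k] += 1
        nkw[k, w] += 1
        nk[k] += 1

for _ in range(200):  # Gibbs sweeps over all tokens
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            # Remove the token's current assignment from the counts.
            ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            # Full conditional (up to a constant in k):
            # p(z = k | rest) ∝ (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

# Point estimates of the topic-word distributions after sampling.
phi = (nkw + beta) / (nk[:, None] + V * beta)
print(np.round(phi, 2))
```

The sampler resamples each token's topic from its full conditional given all other assignments; the per-topic word distributions `phi` read off at the end are the "sets of semantically related words" the abstract describes, here over a toy vocabulary.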
