Reducing the sampling complexity of topic models

Aaron Q. Li, Amr Ahmed, Sujith Ravi, Alexander J. Smola

KDD 2014 (modified: 16 Jul 2019)

Abstract: Inference in topic models typically involves a sampling step to associate latent variables with observations. Unfortunately, the generative model loses sparsity as the amount of data increases, requiring O(k) operations per word for k topics. In this paper we propose an algorithm whose per-word cost scales linearly with the number k_d of topics actually instantiated in document d. For large document collections and in structured hierarchical models, k_d ≪ k, which yields an order-of-magnitude speedup. Our method applies to a wide variety of statistical models such as PDP [16,4] and HDP [19]. At its core is the idea that dense, slowly changing distributions can be approximated efficiently by combining a Metropolis-Hastings step, the use of sparsity, and amortized constant-time sampling via Walker's alias method.
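The two building blocks named in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It is a simplified example that assumes strictly positive weights, and it uses only a dense, stale proposal (the sparse, document-specific part of the paper's proposal is omitted). `AliasTable` is Walker's alias method in Vose's construction: O(k) to build, O(1) per draw. `mh_step` is a hypothetical helper performing one independence Metropolis-Hastings step. It draws from an alias table built from *stale* weights q and corrects toward the *current* weights p with acceptance probability min(1, p(t) q(s) / (p(s) q(t))), so the chain still targets p exactly.

```python
import random
from typing import List, Sequence


class AliasTable:
    """Walker's alias method (Vose's construction): O(n) build, O(1) draw.

    Assumes strictly positive weights (e.g. Dirichlet-smoothed topic weights).
    """

    def __init__(self, weights: Sequence[float]) -> None:
        n = len(weights)
        total = float(sum(weights))
        # Rescale so that the average bucket height is exactly 1.
        scaled = [w * n / total for w in weights]
        self.n = n
        self.prob: List[float] = [0.0] * n
        self.alias: List[int] = [0] * n
        small = [i for i, s in enumerate(scaled) if s < 1.0]
        large = [i for i, s in enumerate(scaled) if s >= 1.0]
        while small and large:
            s, big = small.pop(), large.pop()
            self.prob[s] = scaled[s]
            self.alias[s] = big
            # Bucket s is topped up with (1 - scaled[s]) of mass taken from `big`.
            scaled[big] -= 1.0 - scaled[s]
            (small if scaled[big] < 1.0 else large).append(big)
        # Any leftovers have height 1 up to floating-point rounding.
        for i in small + large:
            self.prob[i] = 1.0
        # Normalized proposal probabilities, needed for the MH correction.
        self.q = [w / total for w in weights]

    def sample(self, rng: random.Random) -> int:
        # Pick a bucket uniformly, then keep it or jump to its alias.
        i = rng.randrange(self.n)
        return i if rng.random() < self.prob[i] else self.alias[i]


def mh_step(current: int, p: Sequence[float], table: AliasTable,
            rng: random.Random) -> int:
    """One independence Metropolis-Hastings step (hypothetical helper).

    p: current (unnormalized, strictly positive) target weights.
    table: alias table built from an older, stale version of the weights.
    """
    proposal = table.sample(rng)
    # Acceptance ratio p(t) q(s) / (p(s) q(t)); normalizing constants cancel.
    ratio = (p[proposal] * table.q[current]) / (p[current] * table.q[proposal])
    return proposal if rng.random() < min(1.0, ratio) else current


if __name__ == "__main__":
    rng = random.Random(0)
    stale = AliasTable([1.0, 2.0, 3.0, 4.0])  # built once, reused for many draws
    target = [4.0, 3.0, 2.0, 1.0]             # the distribution has since drifted
    state, counts, steps = stale.sample(rng), [0] * 4, 100_000
    for _ in range(steps):
        state = mh_step(state, target, stale, rng)
        counts[state] += 1
    print([c / steps for c in counts])  # should approach [0.4, 0.3, 0.2, 0.1]
```

The design point is amortization: rebuilding the table costs O(k), but if the same stale table serves on the order of k draws before it is rebuilt, the build cost per draw is constant, and the MH acceptance test removes the bias introduced by staleness.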
