Sparse Word Graphs: A Scalable Algorithm for Capturing Word Correlations in Topic Models

2007 (modified: 09 Sept 2021), ICDM Workshops 2007. Readers: Everyone
Abstract: Statistical topic models such as Latent Dirichlet Allocation (LDA) have emerged as an attractive framework to model, visualize and summarize large document collections in a completely unsupervised fashion. One of the limitations of this family of models is their assumption of exchangeability of words within documents, which results in a 'bag-of-words' representation for documents as well as topics. As a consequence, precious information that exists in the form of correlations between words is lost in these models. In this work, we adapt recent advances in sparse modeling techniques to the problem of modeling word correlations within topics and present a new algorithm called Sparse Word Graphs. Our experiments on the AP corpus reveal both long-distance and short-distance word correlations within topics that are semantically very meaningful. In addition, the new algorithm is highly scalable to large collections, as it captures only the most important correlations in a sparse manner.