PPLSA: Parallel Probabilistic Latent Semantic Analysis Based on MapReduce

Published: 2012, Last Modified: 15 Jan 2026, Venue: IIP 2012, License: CC BY-SA 4.0
Abstract: PLSA (Probabilistic Latent Semantic Analysis) is a popular topic modeling technique for exploring document collections. As large datasets become increasingly prevalent, there is a need to improve the scalability of PLSA computation. In this paper, we propose a parallel PLSA algorithm, called PPLSA, that accommodates large corpora within the MapReduce framework. Our solution distributes computation efficiently and is relatively simple to implement.
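
The abstract does not spell out how PPLSA partitions the work, so the following is only a minimal single-process sketch of one common way to phrase a PLSA EM iteration in map/reduce terms, not the paper's algorithm. It assumes each document is one map task that emits expected counts, and a reduce step that sums them by key before normalization. All names (`init_params`, `map_document`, `reduce_counts`, `em_iteration`) and the key layout are hypothetical.

```python
import random
from collections import defaultdict


def init_params(docs, n_topics, seed=0):
    """Random P(w|z) per topic and uniform P(z|d) per document (assumed init)."""
    rng = random.Random(seed)
    vocab = sorted({w for counts in docs.values() for w in counts})
    p_w_z = []
    for _ in range(n_topics):
        weights = {w: rng.random() + 0.1 for w in vocab}
        total = sum(weights.values())
        p_w_z.append({w: v / total for w, v in weights.items()})
    p_z_d = {d: [1.0 / n_topics] * n_topics for d in docs}
    return p_w_z, p_z_d


def map_document(doc_id, counts, p_w_z, p_z_d):
    """Map step: E-step for one document, emitting (key, expected count) pairs."""
    n_topics = len(p_w_z)
    for word, n in counts.items():
        # Posterior P(z | d, w) is proportional to P(w | z) * P(z | d).
        joint = [p_w_z[z][word] * p_z_d[doc_id][z] for z in range(n_topics)]
        total = sum(joint) or 1.0
        for z in range(n_topics):
            expected = n * joint[z] / total
            yield ("wz", z, word), expected   # contributes to P(w | z)
            yield ("zd", doc_id, z), expected  # contributes to P(z | d)


def reduce_counts(pairs):
    """Reduce step: sum the expected counts that share a key."""
    sums = defaultdict(float)
    for key, value in pairs:
        sums[key] += value
    return sums


def em_iteration(docs, p_w_z, p_z_d):
    """One EM iteration: map over documents, reduce, then normalize (M-step)."""
    n_topics = len(p_w_z)
    pairs = (
        pair
        for doc_id, counts in docs.items()
        for pair in map_document(doc_id, counts, p_w_z, p_z_d)
    )
    sums = reduce_counts(pairs)
    new_p_w_z = []
    for z in range(n_topics):
        row = {w: sums[("wz", z, w)] for w in p_w_z[z]}
        total = sum(row.values()) or 1.0
        new_p_w_z.append({w: v / total for w, v in row.items()})
    new_p_z_d = {}
    for d in docs:
        row = [sums[("zd", d, z)] for z in range(n_topics)]
        total = sum(row) or 1.0
        new_p_z_d[d] = [v / total for v in row]
    return new_p_w_z, new_p_z_d
```

In this sketch the map calls are independent given the current parameters, which is what would let a real MapReduce job spread documents across workers; the parameters themselves would have to be shared with every mapper at the start of each iteration.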