Abstract: We introduce Topic Grouper as a complementary approach in the field of probabilistic topic modeling. Topic Grouper creates a disjoint partitioning of the training vocabulary in a stepwise manner, such that the resulting partitions represent topics. Topic generation is based on a simple probabilistic model and agglomerative clustering, where clusters are formed as sets of words from the vocabulary. The resulting binary tree of topics can act as a containment hierarchy, typically with more general topics towards the root of the tree and more specific topics towards the leaves. In contrast to other topic modeling approaches, Topic Grouper avoids the need for hyperparameter optimization. As part of an evaluation, we show that Topic Grouper offers reasonable predictive power at reasonable computational complexity. It deals well with stop words and function words, and it can handle skewed topic distributions, where some topics are more frequent than others. We present examples of computed topics that appear conclusive and coherent.
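The core idea of agglomerative word clustering can be illustrated with a minimal sketch. Note that this is not the paper's probabilistic merge criterion: it uses plain cosine similarity over document-occurrence vectors as a stand-in merge score, and the toy corpus, variable names, and `vec`/`cosine` helpers are all hypothetical. It only shows the mechanics of starting with one cluster per word and greedily merging pairs into a binary tree of word sets.

```python
from math import sqrt

# Hypothetical toy corpus; each document is a list of tokens.
docs = [
    ["cat", "dog", "pet"],
    ["dog", "pet", "leash"],
    ["stock", "market", "trade"],
    ["market", "trade", "price"],
]

vocab = sorted({w for d in docs for w in d})

def vec(word):
    # Binary document-occurrence vector for a word.
    return [1.0 if word in d else 0.0 for d in docs]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Each cluster is (set of words, summed occurrence vector);
# start with one singleton cluster per vocabulary word.
clusters = [({w}, vec(w)) for w in vocab]
tree = []  # record of merges -> binary topic tree, leaves first

while len(clusters) > 1:
    # Greedy agglomeration: merge the most similar pair of clusters.
    i, j = max(
        ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
        key=lambda ij: cosine(clusters[ij[0]][1], clusters[ij[1]][1]),
    )
    (wi, vi), (wj, vj) = clusters[i], clusters[j]
    merged = (wi | wj, [x + y for x, y in zip(vi, vj)])
    tree.append((frozenset(wi), frozenset(wj)))
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

print(tree[0])  # first merge: a pair of strongly co-occurring words
```

Replacing the cosine merge score with a likelihood-based criterion derived from a probabilistic model, as the paper does, changes which pairs are merged but not the overall tree-building mechanics.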