Learning Dirichlet Processes from Partially Observed Groups
Abstract: Motivated by the task of vernacular news analysis
using known news topics from national newspapers, we study
the task of topic analysis, where given source datasets with
observed topics, data items from a target dataset need to be
assigned to observed source topics or to new ones. Using
Hierarchical Dirichlet Processes for addressing this task imposes
unnecessary and often inappropriate generative assumptions on
the observed source topics. In this paper, we explore Dirichlet
Processes with partially observed groups (POG-DP). POG-DP
avoids modeling the given source topics. Instead, it directly
models the conditional distribution of the target data as a
mixture of a Dirichlet Process and the posterior distribution
of a Hierarchical Dirichlet Process with known groups and
topics. This introduces coupling between selection probabilities
of all topics within a source, leading to effective identification of
source topics. We further improve on this with a Combinatorial
Dirichlet Process with partially observed groups (POG-CDP)
that captures finer-grained coupling between related topics
by choosing intersections between sources. We propose novel
inference algorithms for these models using collapsed Gibbs
sampling. We evaluate our models in three different real-world
applications. Using extensive experimentation, we compare
against several baselines to show that our models perform
significantly better in all three applications.
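
As a rough illustration of the task setup only, the sketch below computes Chinese-restaurant-process-style prior probabilities for assigning the next target item to an observed source topic, to a previously created new topic, or to a brand-new topic. It is not the POG-DP or POG-CDP model: it omits the likelihood term and the coupling between topics within a source that the abstract describes. The function name, the `source_weight` pseudo-count, and the `"<new>"` label are assumptions made for this example.

```python
from collections import Counter


def assignment_probabilities(counts, source_topics, alpha=1.0, source_weight=1.0):
    """Prior probability of each topic choice for the next target item.

    counts        -- Counter: number of target items currently in each topic
    source_topics -- labels of the observed (given) source topics
    alpha         -- concentration parameter: weight of opening a new topic
    source_weight -- hypothetical pseudo-count given to every source topic,
                     so that unused source topics remain selectable
    """
    weights = {}
    # Observed source topics: items already assigned plus the pseudo-count.
    for topic in source_topics:
        weights[topic] = counts.get(topic, 0) + source_weight
    # Topics created earlier for the target data: weight is their item count.
    for topic, n in counts.items():
        if topic not in weights:
            weights[topic] = n
    # Option of opening a brand-new topic.
    weights["<new>"] = alpha
    total = sum(weights.values())
    return {topic: w / total for topic, w in weights.items()}


counts = Counter({"sports": 3, "new_0": 1})
probs = assignment_probabilities(counts, ["sports", "politics"])
for topic, p in probs.items():
    print(f"{topic}: {p:.3f}")
```

In a collapsed Gibbs sampler, weights of this kind would be multiplied by the likelihood of the item under each topic before normalizing; that step is left out here.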