TSDPMM: Incorporating Prior Topic Knowledge into Dirichlet Process Mixture Models for Text Clustering
Abstract: Dirichlet process mixture model (DPMM) has great potential for detecting the underlying structure of data. Extensive studies have applied it for text clustering in terms of topics. However, due to the unsupervised nature, the topic clusters are always less satisfactory. Considering that people often have some prior knowledge about which potential topics should exist in given data, we aim to incorporate such knowledge into the DPMM to improve text clustering. We propose a novel model TSDPMM based on a new seeded P´ olya urn scheme. Experimental results on document clustering across three datasets demonstrate our proposed TSDPMM significantly outperforms stateof-the-art DPMM model and can be applied in a lifelong learning framework.
0 Replies
Loading