TSDPMM: Incorporating Prior Topic Knowledge into Dirichlet Process Mixture Models for Text Clustering

2015 (modified: 04 Oct 2023), EMNLP 2015
Abstract: The Dirichlet process mixture model (DPMM) has great potential for uncovering the underlying structure of data, and many studies have applied it to topic-based text clustering. However, because of its unsupervised nature, the resulting topic clusters tend to be unsatisfactory. Since people often have some prior knowledge about which topics should exist in given data, we aim to incorporate such knowledge into the DPMM to improve text clustering. We propose a novel model, TSDPMM, based on a new seeded Pólya urn scheme. Experimental results on document clustering across three datasets demonstrate that TSDPMM significantly outperforms the state-of-the-art DPMM and can be applied within a lifelong learning framework.
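The abstract does not spell out the seeded scheme, so for orientation here is a minimal sketch of the standard Pólya urn (Chinese restaurant process) prior that a DPMM uses to assign documents to clusters. Given counts n_k over n items already in the urn and concentration alpha, the next item joins cluster k with probability n_k / (n + alpha) and opens a new cluster with probability alpha / (n + alpha). The `seed_counts` argument, which starts the urn with pseudo-counts for prior topics, is a hypothetical illustration of how prior knowledge could enter an urn. It is an assumption, not the TSDPMM scheme from the paper, and all function names are illustrative.

```python
import random


def predictive_probs(counts, alpha):
    """Pólya urn predictive distribution: existing clusters, then a new one."""
    total = sum(counts) + alpha
    return [c / total for c in counts] + [alpha / total]


def polya_urn_assign(n_items, alpha, seed_counts=None, rng=None):
    """Sequentially draw cluster assignments from a Pólya urn (CRP) prior.

    seed_counts: HYPOTHETICAL pseudo-counts for pre-seeded clusters
    (an illustrative assumption, not the TSDPMM method).
    """
    if rng is None:
        rng = random.Random(0)
    counts = list(seed_counts) if seed_counts else []
    assignments = []
    for _ in range(n_items):
        probs = predictive_probs(counts, alpha)
        r = rng.random()
        cumulative = 0.0
        chosen = len(counts)  # fall through to "new cluster"
        for k, p in enumerate(probs[:-1]):
            cumulative += p
            if r < cumulative:
                chosen = k
                break
        if chosen == len(counts):
            counts.append(0)  # open a new cluster
        counts[chosen] += 1
        assignments.append(chosen)
    return assignments, counts


# Usage: two hypothetical seeded topics with 5 pseudo-counts each.
assignments, counts = polya_urn_assign(20, alpha=1.0, seed_counts=[5, 5])
print(assignments, counts)
```

This sketch covers only the prior. In a full DPMM Gibbs sampler, each of these probabilities would also be multiplied by the likelihood of the document's words under the corresponding cluster before sampling.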