Abstract: Latent Dirichlet Allocation (LDA) is a generative model that uses a symmetric Dirichlet distribution as the prior over the topic-word distributions to smooth the model. When LDA is applied to text classification, smoothing is essential to classification performance. In this paper, we propose a feature-enhanced smoothing method based on the idea that words which do not appear in the training corpus can help improve classification performance. The key point is to fully consider the relationship between a new document and the training corpus, and to enhance the document's class features by taking into account the words that do not appear in the training corpus. Evaluations on 20newsgroups show that feature-enhanced smoothing can significantly improve performance in binary (two-class) text classification.
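For context, the baseline smoothing that a symmetric Dirichlet prior provides can be sketched as the standard posterior-mean estimate of one topic's word distribution, (n_w + beta) / (n + V * beta). This is a minimal illustration of conventional LDA smoothing, not the feature-enhanced method proposed here; the function name `smoothed_topic_word` and the value of `beta` are illustrative assumptions. Note that the smoothing only covers words in the vocabulary `vocab`; a word that never occurred in the training corpus has no entry and is typically discarded at inference, which is the gap the proposed method addresses.

```python
from collections import Counter


def smoothed_topic_word(counts, vocab, beta=0.01):
    """Hypothetical helper: posterior-mean estimate of one topic's word
    distribution under a symmetric Dirichlet(beta) prior.

    counts: mapping word -> number of tokens assigned to this topic
    vocab:  list of all words known to the model (size V)
    Returns a dict word -> probability, (n_w + beta) / (n + V * beta).
    """
    # Total tokens assigned to this topic, restricted to the vocabulary.
    total = sum(counts.get(w, 0) for w in vocab)
    # Denominator adds beta once per vocabulary word so probabilities sum to 1.
    denom = total + len(vocab) * beta
    # Every vocabulary word, even one with zero count, gets mass beta / denom.
    return {w: (counts.get(w, 0) + beta) / denom for w in vocab}


# Usage: "launch" was never assigned to this topic but still gets nonzero mass.
probs = smoothed_topic_word(Counter({"game": 3, "team": 1}),
                            ["game", "team", "launch"], beta=0.5)
print(probs)
```

A larger `beta` shifts more probability mass toward rare and zero-count words; a smaller one keeps the estimate closer to the raw relative frequencies.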