Learning semantic topics for domain-adapted textual knowledge transfer

Published: 01 Jan 2018 · Last Modified: 11 Apr 2025 · ICIMCS 2018 · CC BY-SA 4.0
Abstract: Traditional text classification methods rest on a basic assumption: the training and test data are drawn from the same distribution. This naive assumption may not hold in the real world. Hence, this paper studies domain-adapted news text classification, whereby a model is trained on labeled data from one source domain and can be deployed on another. To realize cross-domain text classification, we propose a domain-adapted method, named TextLDACNN, that combines the LDA topic model with the TextCNN model. Specifically, our method computes the topic similarity between the source and target domains, which serves as an effective constraint to regularize the training process and thereby improve the generalization of the source model to the target domain. A text classifier trained with this unsupervised topic feature representation clearly outperforms the baseline TextCNN model, and our method achieves an approximately 4.0% improvement over the state-of-the-art method.
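The abstract does not include implementation details, but its core idea (fitting LDA topics over both domains and scoring their similarity, which then weights a regularization term during TextCNN training) can be sketched roughly as follows. The toy corpora, topic count, and the use of cosine similarity are illustrative assumptions, not taken from the paper:

```python
# Hedged sketch: estimate LDA topic distributions for a source and a
# target corpus, then score the similarity of the two domain-level
# topic mixtures. In TextLDACNN a score of this kind would constrain
# training of the classifier; everything concrete here is assumed.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

source_docs = ["stocks rally as markets rise", "bank profits grow this quarter"]
target_docs = ["team wins the final match", "players celebrate championship victory"]

# Shared vocabulary so topic vectors are comparable across domains.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(source_docs + target_docs)

lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda.fit_transform(X)  # one topic-mixture row per document

# Mean topic mixture per domain.
src = doc_topics[: len(source_docs)].mean(axis=0)
tgt = doc_topics[len(source_docs):].mean(axis=0)

# Cosine similarity between the two domain-level topic distributions;
# higher values indicate the domains share topical structure.
similarity = float(src @ tgt / (np.linalg.norm(src) * np.linalg.norm(tgt)))
print(round(similarity, 3))
```

In a full pipeline this scalar could scale a penalty on the divergence between source- and target-side feature representations inside the TextCNN, but the exact form of that constraint is not specified in the abstract.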