Abstract: Data scarcity is a common problem for topic classification in many real-world applications. Zero-shot classification addresses this problem by performing classification without any previously labelled data. However, only a few studies address zero-shot topic classification on Chinese text. In this paper, we present an automatic tagging framework for zero-shot topic classification, which trains a transformer-based model on labelled data from external corpora. Moreover, we show the effectiveness of fine-tuning on a large dataset for a downstream task in which the training-data labels are not aligned with the test-data labels in advance. Our experiments show that our approach outperforms the benchmark approaches on two standard Chinese text datasets in the zero-shot setting.