Abstract: Chinese classical poetry, which carries thousands of years of Chinese civilization, reflects the social ethos and culture of its times. In recent years, researchers have increasingly used artificial intelligence to analyze Chinese classical poetry, and many of these studies rely on pre-trained language models. However, Chinese classical poetry has a unique form, so applying a general pre-trained language model directly is ineffective. Its long time span, frequent shifts in word meaning, and small amount of training data limit the development of pre-trained models for Chinese classical poetry. To address these challenges, we construct a dynamic pre-trained model for Chinese classical poetry, based on SikuBERT and trained with contrastive learning and a multi-task training strategy. During training, we search for hard negative and positive examples and use them for data augmentation. We also introduce a sliding window to learn poetry information dynamically. Compared with the encodings provided by the baseline model, our model’s encodings achieve better performance on the downstream tasks of classification, translation, and poem-poet matching.