Abstract: Language evolves gradually. Grammar, vocabulary, and lexical semantics shift over time, producing a diachronic linguistic gap. Because a considerable number of texts are written in the language of different eras, this gap poses obstacles to natural language processing tasks such as word segmentation and machine translation. Chinese has a long history, yet previous Chinese natural language processing work has mainly focused on tasks within a single era. Therefore, in this paper, we propose CROSSWISE, a cross-era learning framework for Chinese word segmentation (CWS) that uses a Switch-memory (SM) module to incorporate era-specific linguistic knowledge. Experiments on four corpora from different eras show a significant performance improvement on each corpus. Further analyses demonstrate that the SM module effectively integrates era knowledge into the neural network.
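The abstract does not describe how the SM module works internally. Purely as an illustration of the general idea of switching between era-specific memories, here is a minimal sketch under our own assumptions: each era has a small lexicon of word vectors, the era label selects which lexicon to read, and a scalar gate blends the retrieved memory with a character's hidden state. The names `switch_memory` and `memories`, the averaging step, and the sigmoid gate are all hypothetical and are not the authors' design.

```python
import math
from typing import Dict, List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def switch_memory(
    hidden: Vector,
    char: str,
    era: str,
    memories: Dict[str, Dict[str, Vector]],
) -> Vector:
    """Hypothetical era-switched memory read for one character (illustrative only)."""
    # The "switch": choose the memory (lexicon) belonging to the given era.
    lexicon = memories[era]
    # Collect vectors of era-specific words that contain this character.
    hits = [vec for word, vec in lexicon.items() if char in word]
    if not hits:
        # No era knowledge for this character: pass the hidden state through.
        return list(hidden)
    dim = len(hidden)
    # Average the matched word vectors into a single memory vector.
    mem = [sum(v[i] for v in hits) / len(hits) for i in range(dim)]
    # Assumed scalar gate computed from hidden/memory agreement (dot product).
    gate = sigmoid(sum(h * m for h, m in zip(hidden, mem)))
    # Convex combination of the hidden state and the era memory.
    return [gate * h + (1.0 - gate) * m for h, m in zip(hidden, mem)]
```

For example, with `memories = {"ancient": {"天下": [1.0, 0.0]}, "modern": {"电脑": [0.0, 1.0]}}`, reading the character `天` with a zero hidden state yields `[0.5, 0.0]` under the `"ancient"` era but leaves the state unchanged under `"modern"`, showing how the same character can receive different knowledge depending on the era.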