Lexicon-matched Word Injection for Chinese NER

Qingyu Wang, Guokai Sun, Jianguo Sun, Yuan Zhuang, Lu Li, Tianyi Gao

Published: 01 Jan 2022, Last Modified: 16 May 2023, MLMI 2022, Readers: Everyone

Abstract: Chinese named entity recognition (NER) has recently attracted considerable attention. Most existing work matches text against a lexicon and integrates the resulting potential word information through a lattice or graph structure. Although these approaches have proven effective at exploiting abundant word boundary information, they struggle to model global semantic interactions because of the inherently one-way, sequential nature of the directed acyclic graph (DAG) structure. They also introduce more interfering lexicon words, which leads to word boundary conflicts. To address these issues, this paper proposes a knowledge fusion method based on lexicon-matched word injection, which captures sentence context features through a pre-trained model and then injects lexicon knowledge into each character. With the Transformer and a well-designed encoding, accurate word information becomes easier to obtain from the character encoding vectors of the Transformer encoder. In addition, a self-attention module that integrates each character with its different matched words is fully leveraged to improve recognition accuracy. Experiments on three public Chinese datasets show that the proposed method surpasses other lexicon-based methods in both performance and efficiency.
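The lexicon-matching step that the abstract builds on can be illustrated with a small sketch. The abstract does not specify the matching procedure, so the function name `match_lexicon`, the `max_word_len` parameter, the toy lexicon, and the example sentence below are all assumptions made for illustration; the fusion of matched words with Transformer character encodings is not shown.

```python
from typing import List, Set


def match_lexicon(sentence: str, lexicon: Set[str], max_word_len: int = 4) -> List[List[str]]:
    """For each character, collect every lexicon word that covers it.

    Hypothetical helper: a brute-force substring scan, not the paper's
    actual matching procedure.
    """
    matched: List[List[str]] = [[] for _ in sentence]
    n = len(sentence)
    for start in range(n):
        # Only multi-character words are considered (length >= 2).
        for length in range(2, max_word_len + 1):
            end = start + length
            if end > n:
                break
            word = sentence[start:end]
            if word in lexicon:
                # Attach the word to every character position it spans.
                for i in range(start, end):
                    matched[i].append(word)
    return matched


# Toy lexicon and sentence (assumed for illustration only).
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
sentence = "南京市长江大桥"

words_per_char = match_lexicon(sentence, lexicon)
for ch, words in zip(sentence, words_per_char):
    print(ch, words)
```

In this toy example the character 长 is covered by both 市长 and 长江, which is the kind of word boundary conflict the abstract attributes to interfering lexicon words. A per-character word list of this shape is one plausible input to a self-attention module that weighs each character against its candidate words.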
