CLGP: Multi-Feature Embedding based Cross-Attention for Chinese NER


16 Nov 2021 (modified: 05 May 2023)
ACL ARR 2021 November Blind Submission
Readers: Everyone
Abstract: Previous work has fused lexicon information into Chinese NER models while ignoring two important characteristics of the Chinese language: glyph and pinyin, both of which carry significant syntactic and semantic information for sequence tagging tasks. This paper proposes CLGP, which uses three dedicated extractors to obtain glyph, pinyin, and lexicon embeddings, and then applies a cross-attention-based network to fuse these multi-feature embeddings. Specifically, we introduce an embedding scheme that preserves the lexicon matching results, and we design two CNN architectures to extract the glyph and pinyin embeddings. We then fuse the four embeddings with the cross-attention-based network to improve Chinese NER. Experimental results on four widely used datasets show that CLGP achieves state-of-the-art (SOTA) performance.
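As a rough illustration of the fusion step described in the abstract, the following is a minimal sketch of cross-attention over several feature sequences. It is not the authors' implementation: the abstract does not specify the architecture, so the function names (`cross_attention`, `fuse`), the use of a character embedding as the query (and as the assumed fourth embedding), the residual sum, and the absence of learned projections are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention without learned projections.

    Each query vector attends over the rows of keys_values, which serve
    as both keys and values (an assumption made to keep the sketch short).
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            for k in keys_values
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(d)]
        )
    return out


def fuse(char_emb, feature_embs):
    """Hypothetical fusion: character embeddings query each feature
    sequence (e.g. glyph, pinyin, lexicon), and the attended outputs
    are added to the character embeddings as a residual."""
    fused = [row[:] for row in char_emb]
    for feat in feature_embs:
        attended = cross_attention(char_emb, feat)
        for i, row in enumerate(attended):
            for j, val in enumerate(row):
                fused[i][j] += val
    return fused


# Toy usage: 2 characters, embedding size 2, three feature sequences.
char_emb = [[1.0, 0.0], [0.0, 1.0]]
glyph = [[0.5, 0.1], [0.2, 0.9]]
pinyin = [[0.3, 0.3], [0.7, 0.1]]
lexicon = [[0.0, 1.0], [1.0, 0.0]]
fused = fuse(char_emb, [glyph, pinyin, lexicon])
```

In a trained model, the queries, keys, and values would normally pass through learned linear projections and multiple heads; the sketch omits these so that the attention arithmetic itself stays visible.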
