DualFLAT: Dual Flat-Lattice Transformer for domain-specific Chinese named entity recognition

Published: 01 Jan 2025, Last Modified: 19 Feb 2025, Inf. Process. Manag. 2025, CC BY-SA 4.0
Abstract: Recently, lexicon-enhanced methods for Chinese Named Entity Recognition (NER) have achieved great success, but they rely on a high-quality lexicon. For domain-specific Chinese NER, obtaining such a lexicon is challenging because of the distribution mismatch between general lexicons and domain-specific data, and the high cost of constructing a domain lexicon. To address these challenges, we introduce dual-source lexicons (i.e., a general lexicon and a domain lexicon) to acquire enriched lexical knowledge. Considering that the general lexicon often contains more noise than its domain counterpart, we further propose a dual-stream model, the Dual Flat-Lattice Transformer (DualFLAT), designed to mitigate the impact of noise originating from the general lexicon while fully exploiting the knowledge contained in the dual-source lexicons. Experimental results on three public domain-specific Chinese NER datasets (i.e., News, Novel and E-commerce) demonstrate that our method consistently outperforms single-source lexicon-enhanced approaches, achieving state-of-the-art results. Specifically, the proposed DualFLAT model consistently outperforms the FLAT baseline, with F1 score increases of up to 1.52%, 4.84% and 1.34% on the News, Novel and E-commerce datasets, respectively.
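To make the dual-source idea concrete, below is a minimal sketch of how a sentence might be matched against both a general and a domain lexicon to build the flat-lattice input that FLAT-style models consume. The lexicon contents, the `source` tag, and all function names are illustrative assumptions for exposition; they are not the authors' released implementation.

```python
# Illustrative sketch (assumed, not the paper's code): match a character
# sequence against two lexicons and flatten characters plus matched words
# into one sequence of spans with head/tail positions, as in FLAT.
from dataclasses import dataclass
from typing import List, Set


@dataclass
class LatticeToken:
    text: str    # character or matched word
    head: int    # index of the first character covered in the sentence
    tail: int    # index of the last character covered in the sentence
    source: str  # "char", "general", or "domain" (hypothetical stream tag)


def build_dual_lattice(sentence: str,
                       general_lexicon: Set[str],
                       domain_lexicon: Set[str],
                       max_word_len: int = 5) -> List[LatticeToken]:
    """Flatten characters and dual-lexicon word matches into one token list.

    Each token keeps its head/tail span so a transformer can encode relative
    positions; the source tag marks which stream (general vs. domain) a word
    would feed in a dual-stream setup.
    """
    tokens = [LatticeToken(ch, i, i, "char") for i, ch in enumerate(sentence)]
    for start in range(len(sentence)):
        for end in range(start + 1, min(start + max_word_len, len(sentence))):
            word = sentence[start:end + 1]
            if word in domain_lexicon:
                tokens.append(LatticeToken(word, start, end, "domain"))
            if word in general_lexicon:
                tokens.append(LatticeToken(word, start, end, "general"))
    return tokens


if __name__ == "__main__":
    general = {"重庆", "人和", "药业"}          # toy general lexicon
    domain = {"人和药业"}                        # toy domain lexicon
    for tok in build_dual_lattice("重庆人和药业", general, domain):
        print(tok)
```

In a dual-stream model such as DualFLAT, the "general" and "domain" word spans would be routed to separate streams so that noise from the general lexicon can be contained rather than mixed directly with domain matches.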