Abstract: Chinese text is written as a continuous sequence of characters, so word boundaries are not explicitly marked. To attend to words within a sentence, and inspired by span-based NER and by boundary modules in NER, we compute the hidden state of each character from its context with a BiLSTM and pass it through a sigmoid gate to represent boundaries. These boundaries are added to the encoder to capture word-level information about Chinese named entities. The boundary values are soft, so they reflect the sentence structure learned from the labels. Experiments on four benchmark datasets, including a variant that incorporates pre-trained BERT, show that our method achieves the best recognition results in Chinese NER.
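The soft boundary gate described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact formulation, so the function names (`soft_boundaries`, `add_boundaries`), the single linear projection, and the additive `boundary_embedding` are all assumptions made for illustration. The contextual states are taken as given here; in the paper they would come from a BiLSTM.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def soft_boundaries(context_states: List[Vector],
                    weights: Vector,
                    bias: float) -> List[float]:
    """Return one soft boundary value in (0, 1) per character.

    context_states: per-character contextual hidden states
                    (assumed to be produced by a BiLSTM).
    weights, bias:  hypothetical parameters of a linear projection
                    applied before the sigmoid gate.
    """
    values = []
    for state in context_states:
        logit = sum(w * h for w, h in zip(weights, state)) + bias
        values.append(sigmoid(logit))
    return values


def add_boundaries(char_inputs: List[Vector],
                   boundaries: List[float],
                   boundary_embedding: Vector) -> List[Vector]:
    """Add a boundary embedding, scaled by the soft boundary value,
    to each character's encoder input (an assumed way of 'adding
    boundaries into the encoder')."""
    return [
        [x + b * e for x, e in zip(char_vec, boundary_embedding)]
        for char_vec, b in zip(char_inputs, boundaries)
    ]
```

Because the gate outputs values strictly between 0 and 1 rather than hard 0/1 decisions, each character carries a graded boundary signal that can be trained end to end from the entity labels, which matches the abstract's description of the boundaries as "soft".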
Paper Type: long
Research Area: Information Extraction