Abstract: Named entity recognition plays a crucial role in many Natural Language Processing tasks because entities carry semantic information. Recent efforts aim to reduce annotation labor, because state-of-the-art Named Entity Recognition systems still rely on supervised machine learning algorithms that require large amounts of training data, which are difficult and expensive to produce manually. In particular, Vietnamese is a resource-limited language that lacks high-quality named-entity-annotated corpora, and this limitation leads to low performance in Vietnamese Named Entity Recognition. In this paper, we therefore propose an approach that uses an existing unannotated English-Vietnamese bilingual corpus to improve Named Entity Recognition systems for both English and Vietnamese. Experimental results show improvements in both English and Vietnamese Named Entity Recognition over the strong baseline StanfordNER. In particular, Vietnamese Named Entity Recognition improves significantly, by 18.45% in terms of F1-score, while on the English side the F1-score improves from 92.44% to 95.05%. The proposed method can also be generalized to other resource-limited languages.