Improved Named Entity Recognition in Turkish News via Word Lookup Methods

Selim Firat Yilmaz, Ismail Balaban, Suleyman Serdar Kozat

Published: 2020, Last Modified: 02 Apr 2025, SIU 2020, CC BY-SA 4.0

Abstract: Named Entity Recognition (NER) is a natural language processing task for extracting and classifying important entities such as persons, locations, and organizations. In this paper, we propose a model that includes methods to look up word vectors in word embeddings for the Turkish language. We study the proposed methods comprehensively through extensive experiments and out-of-vocabulary analysis. The proposed model uses only word embeddings and character representations, via Conditional Random Fields followed by Bidirectional Long Short-Term Memory, to tag the sequence of words with named entity tags. The proposed model achieves a 94.05% F1 score on the most commonly used news dataset for Turkish NER, which is the state of the art in the literature.
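The abstract does not spell out the lookup methods themselves. As a rough illustration only, the sketch below shows one way a word lookup with fallbacks could reduce out-of-vocabulary misses for Turkish text: try the surface form, then a Turkish-aware lowercase form, then an ASCII-folded form, and finally an unknown-word vector. The toy `EMBEDDINGS` table, the helper names (`turkish_lower`, `ascii_fold`, `lookup_vector`), the `<unk>` key, and the fallback order are all assumptions made for this example, not the authors' procedure.

```python
from typing import Dict, List, Tuple

# Hypothetical toy embedding table; a real one would be loaded from
# pretrained Turkish word vectors.
EMBEDDINGS: Dict[str, List[float]] = {
    "ankara": [0.1, 0.2],
    "istanbul": [0.3, 0.4],
    "<unk>": [0.0, 0.0],  # assumed fallback vector for unknown words
}

# Maps Turkish-specific letters to their closest ASCII letters.
_ASCII_TABLE = str.maketrans("çğıöşüÇĞİÖŞÜ", "cgiosuCGIOSU")


def turkish_lower(word: str) -> str:
    """Lowercase using Turkish casing rules: I -> ı and İ -> i.

    Python's default str.lower() maps 'I' to 'i' and 'İ' to 'i' plus a
    combining dot, so both letters are replaced explicitly first.
    """
    return word.replace("I", "ı").replace("İ", "i").lower()


def ascii_fold(word: str) -> str:
    """Replace Turkish-specific letters with ASCII look-alikes."""
    return word.translate(_ASCII_TABLE)


def lookup_vector(
    word: str, embeddings: Dict[str, List[float]]
) -> Tuple[str, List[float]]:
    """Return (matched_key, vector) for the first candidate form found."""
    lowered = turkish_lower(word)
    # Assumed fallback order: exact form, lowercase, ASCII-folded lowercase.
    for candidate in (word, lowered, ascii_fold(lowered)):
        if candidate in embeddings:
            return candidate, embeddings[candidate]
    return "<unk>", embeddings["<unk>"]


if __name__ == "__main__":
    for token in ["Ankara", "İstanbul", "ISTANBUL", "Xyz"]:
        key, vec = lookup_vector(token, EMBEDDINGS)
        print(f"{token!r} matched {key!r}: {vec}")
```

In this sketch only tokens that miss every candidate form fall back to the unknown-word vector, which is the kind of effect an out-of-vocabulary analysis would measure.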