Automatic Discovery of Named Entity Variants: Grammar-driven Approaches to Non-Alphabetical Transliterations

Chu-Ren Huang, Petr Simon, Shu-Kai Hsieh

2007 (modified: 13 Nov 2022), ACL 2007

Abstract: Identification of transliterated names is a particularly difficult task in Named Entity Recognition (NER), especially in the Chinese context. Of all possible variations of transliterated named entities, the divergence between PRC and Taiwan usage is the most prevalent and the most challenging. In this paper, we introduce a novel approach to the automatic extraction of diverging transliterations of foreign named entities by bootstrapping co-occurrence statistics from a tagged and segmented Chinese corpus. A preliminary experiment yields promising results and shows the approach's potential in NLP applications.
