Zero-Shot Cross-Lingual NER Using Phonemic Representations for Low-Resource Languages

ACL ARR 2024 June Submission743 Authors

13 Jun 2024 (modified: 07 Aug 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: Existing zero-shot cross-lingual NER approaches require substantial prior knowledge of the target language, which is impractical for very low-resource languages. In this paper, we propose a novel approach to NER that uses phonemic representations based on the International Phonetic Alphabet (IPA) to bridge the gap between the representations of different languages. Our experiments show that our method significantly outperforms the baseline models on extremely low-resource languages and is particularly robust on languages written in non-Latin scripts.
Paper Type: Short
Research Area: Phonology, Morphology and Word Segmentation
Research Area Keywords: phonology, grapheme-to-phoneme conversion
Contribution Types: NLP engineering experiment, Approaches to low-resource settings
Languages Studied: Sinhala, Somali, Maori, Ayacucho Quechua/Quechua Chanka, Uyghur, Assyrian Neo-Aramaic, Kinyarwanda, Ilocano, Esperanto, Khmer, Turkmen, Amharic, Maltese, Oriya, Sanskrit, Interlingua, Guarani, Belarusian, Kurdish, Tajik, Yoruba, Marathi, Javanese, Urdu, Malay, Cebuano, Croatian, Malayalam, Telugu, Uzbek, Punjabi, Kyrgyz
Submission Number: 743