Cross-Lingual IPA Contrastive Learning for Zero-Shot NER

ACL ARR 2025 February Submission681 Authors

10 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0
Abstract: Existing approaches to zero-shot Named Entity Recognition (NER) for low-resource languages have primarily relied on machine translation, whereas more recent methods have shifted their focus to phonemic representation. Building on this line of work, we investigate how reducing the phonemic representation gap between the IPA transcriptions of languages with similar phonetic characteristics enables models trained on high-resource languages to perform effectively on low-resource languages. To this end, we propose the CONtrastive Learning with IPA (CONLIPA) dataset, which contains IPA transcription pairs between English and 10 high-resource languages drawn from 10 frequently used language families. Using CONLIPA, we also propose a cross-lingual IPA contrastive learning method (IPAC). Our proposed dataset and method achieve a substantial average gain over the best-performing baseline.
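The abstract does not state which contrastive objective IPAC uses. A common choice for paired data is a symmetric InfoNCE loss, and the sketch below assumes that objective purely for illustration. It also assumes each CONLIPA pair (an English IPA transcription and its counterpart in a high-resource language) has already been encoded into a vector by some model, and it treats the other pairs in a batch as negatives. The function names (`info_nce`, `symmetric_info_nce`) and the temperature value are hypothetical, not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors: List[Vector], positives: List[Vector],
             temperature: float = 0.1) -> float:
    """One-directional InfoNCE (assumed objective, not confirmed by the paper).

    Row i of `anchors` should match row i of `positives`; every other row
    in the batch serves as an in-batch negative.
    """
    if len(anchors) != len(positives):
        raise ValueError("anchors and positives must be paired row by row")
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-softmax probability of the true pair.
        total += log_denominator - logits[i]
    return total / len(anchors)


def symmetric_info_nce(english: List[Vector], other: List[Vector],
                       temperature: float = 0.1) -> float:
    """Average of both matching directions (English->other, other->English)."""
    return 0.5 * (info_nce(english, other, temperature)
                  + info_nce(other, english, temperature))
```

As a usage example with toy vectors (hypothetical values, not real IPA embeddings), correctly aligned pairs receive a lower loss than shuffled ones, which is the signal that pulls matching transcriptions together:

```python
eng = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
shuffled = [aligned[1], aligned[0]]
print(symmetric_info_nce(eng, aligned) < symmetric_info_nce(eng, shuffled))  # True
```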
Paper Type: Long
Research Area: Phonology, Morphology and Word Segmentation
Research Area Keywords: zero-shot, named entity recognition, cross-lingual, IPA, phonology
Contribution Types: Approaches to low-resource settings, Data resources
Languages Studied: Amharic, Assyrian, Quechua, Cebuano, Croatian, English, Esperanto, Ilocano, Javanese, Khmer, Kinyarwanda, Kyrgyz, Malay, Malayalam, Maltese, Maori, Marathi, Punjabi, Sinhala, Somali, Tajik, Telugu, Turkmen, Urdu, Uyghur, Uzbek, Yoruba, Swahili, Indonesian, Hindi, Mandarin, Arabic, Vietnamese, Thai, Tamil, Turkish, Korean
Submission Number: 681