Point of pivot: cross-lingual embeddings calibration for Southern Nguni and Niger-Congo low-resourced languages.

31 Jul 2023 (modified: 07 Dec 2023), DeepLearningIndaba 2023 Conference Submission
Keywords: Monolingual Embeddings, Cross-lingual Embeddings, Transfer learning, Intrinsic and Extrinsic evaluations, Cosine similarity, Named Entity Recognition and Part of Speech tagging
Abstract: Transfer-learning analyses for low-resourced languages are still anchored in the English-X setup, even though recent research shows that English is not always the best source language. This can be traced back to the accessibility, availability, and trends of Natural Language Processing resources and, most importantly, to expertise. With some constraints, such as the availability of digital data, gradually loosening, the need to explore alternative pivot languages becomes clear. Choosing the pivot language nevertheless remains a critical concern, even where linguistic insights on language intelligibility are available. In this paper, we create all possible pairs of cross-lingual embeddings for the Southern Nguni and Niger-Congo languages of South Africa and analyse all combinations on word similarity and downstream tasks relative to transfer learning. Our preliminary intrinsic evaluations indicate a counterintuitive outcome: mutually intelligible languages do not always generate the best entangled representations. That is, languages belonging to the same language family can generate sub-standard cross-lingual representations. Extrinsic evaluation will consider available annotated downstream tasks such as Named Entity Recognition, Part of Speech Tagging, and Machine Translation, with the intention of establishing point-of-pivot insights for South African languages.
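Illustrative sketch (not taken from the submission): the pairing and word-similarity steps described in the abstract could look roughly like the Python below. It assumes that "all possible pairs" means ordered source-to-target pairs, that each language's embeddings already live in a shared cross-lingual space, and that a small bilingual lexicon is available. The language labels, the toy vectors, and the helper names `pivot_pairs`, `cosine_similarity`, and `mean_pair_similarity` are hypothetical placeholders, not the authors' code or data.

```python
import math
from itertools import permutations


def pivot_pairs(languages):
    """Return every ordered (source, target) pair of distinct languages."""
    return list(permutations(languages, 2))


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_pair_similarity(src_vecs, tgt_vecs, lexicon):
    """Average cosine similarity over lexicon entries present in both vocabularies."""
    sims = [
        cosine_similarity(src_vecs[s], tgt_vecs[t])
        for s, t in lexicon
        if s in src_vecs and t in tgt_vecs
    ]
    return sum(sims) / len(sims) if sims else float("nan")


# Hypothetical language labels; the abstract does not list the languages used.
languages = ["lang_a", "lang_b", "lang_c"]
pairs = pivot_pairs(languages)

# Toy 2-dimensional vectors standing in for aligned cross-lingual embeddings.
src_vecs = {"w1": [1.0, 0.0], "w2": [0.0, 1.0]}
tgt_vecs = {"x1": [1.0, 0.0], "x2": [1.0, 1.0]}
lexicon = [("w1", "x1"), ("w2", "x2")]

score = mean_pair_similarity(src_vecs, tgt_vecs, lexicon)
print(len(pairs), round(score, 4))
```

Under these assumptions, computing such a score for every pair gives one simple way to compare candidate pivot languages on the intrinsic word-similarity side; the downstream (extrinsic) tasks would need their own annotated data and models.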
Submission Category: Machine learning algorithms
Submission Number: 76