LinguAlchemy: Fusing Typological and Geographical Elements for Unseen Language Generalization

Anonymous

16 Feb 2024
ACL ARR 2024 February Blind Submission
Readers: Everyone
Abstract: Pretrained language models (PLMs) have become remarkably adept at task and language generalization. Nonetheless, they often fail dramatically when faced with unseen languages, posing a significant problem for diversity and equal access to PLM technology. In this work, we present LinguAlchemy, a regularization technique that incorporates various aspects of languages, covering typological, geographical, and phylogenetic features, to constrain the resulting representations of PLMs so they better characterize the corresponding linguistic constraints. LinguAlchemy significantly improves the accuracy of mBERT and XLM-R on unseen languages by ~18% and ~2%, respectively, compared to fully fine-tuned models, displaying a high degree of unseen-language generalization. We further introduce AlchemyScale and AlchemyTune, extensions of LinguAlchemy that adjust the linguistic regularization weights automatically, alleviating the need for hyperparameter search. LinguAlchemy enables better cross-lingual generalization to unseen languages, which is vital for better inclusivity and accessibility of PLMs.
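The abstract describes the method only at a high level. The following is a minimal sketch of the kind of linguistic-vector regularization it suggests, assuming a HuggingFace-style encoder (e.g., mBERT or XLM-R), URIEL-like typological/geographical feature vectors passed in as `ling_vectors`, a learned projection `ling_proj`, and a fixed weight `reg_weight`; these names and the exact loss are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class LinguisticRegularizedClassifier(nn.Module):
    """Illustrative sketch: a task classifier over a PLM encoder whose pooled
    representation is additionally pushed toward a fixed linguistic vector
    (e.g., URIEL typological + geographical features) for the input's language.
    All names and the fixed `reg_weight` are assumptions, not the paper's exact setup."""

    def __init__(self, encoder, hidden_size, num_labels, ling_dim, reg_weight=0.1):
        super().__init__()
        self.encoder = encoder                              # e.g., mBERT or XLM-R backbone
        self.classifier = nn.Linear(hidden_size, num_labels)
        self.ling_proj = nn.Linear(hidden_size, ling_dim)   # maps pooled output to linguistic space
        self.reg_weight = reg_weight

    def forward(self, input_ids, attention_mask, labels, ling_vectors):
        # Pooled sentence representation from the encoder ([CLS]-style pooling).
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]

        # Standard task loss on the downstream labels.
        logits = self.classifier(pooled)
        task_loss = nn.functional.cross_entropy(logits, labels)

        # Linguistic regularization: align the projected representation with the
        # language's typological/geographical feature vector.
        pred_ling = self.ling_proj(pooled)
        ling_loss = nn.functional.mse_loss(pred_ling, ling_vectors)

        return task_loss + self.reg_weight * ling_loss, logits
```

Per the abstract, AlchemyScale and AlchemyTune would replace the fixed `reg_weight` above with an automatically adjusted weighting, removing the need to search for it by hand.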
Paper Type: long
Research Area: Multilinguality and Language Diversity
Contribution Types: Approaches to low-resource settings
Languages Studied: Afrikaans, Amharic, Arabic, Azeri, Bengali, Catalan, Welsh, Danish, German, Greek, English, Spanish, Farsi, Finnish, French, Hebrew, Hindi, Hungarian, Armenian, Indonesian, Icelandic, Italian, Japanese, Javanese, Georgian, Khmer, Kannada, Korean, Latvian, Malayalam, Mongolian, Malay, Burmese, Norwegian, Dutch, Polish, Portuguese, Romanian, Russian, Slovenian, Albanian, Swedish, Swahili, Tamil, Telugu, Thai, Tagalog, Turkish, Urdu, Vietnamese, Chinese