Multilingual Lottery Tickets to Pretrain Language Models

Published: 07 Oct 2023, Last Modified: 01 Dec 2023EMNLP 2023 FindingsEveryoneRevisionsBibTeX
Submission Type: Regular Long Paper
Submission Track: Multilinguality and Linguistic Diversity
Keywords: lottery ticket hypothesis, negative interference, zero-shot neural architecture search, multilingual pretrained language model
Abstract: The curse of multilinguality in training multilingual pretrained language models (mPLMs) refers to the negative interference between languages, especially when the capacity is limited. While increasing the capacity may appear intuitive for overcoming this curse, it negatively affects both training and inference costs. Our distinction is pursuing the competing goals of reducing negative interference, while keeping capacity per each language more or less the same. Specifically, we first scale the model to reduce interference, then search for a per-language subnetwork, or a lottery ticket, with comparable performance to the full model. According to lottery ticket hypothesis, this scale-then-find-ticket approach alleviates interfering signals as in the scaled model, but redistributes parameters to keep the parameters reduced. Finally, to avoid the cost of multiple retraining for searching multilingual tickets, we explore zero-shot neural architecture search (NAS) methods. We investigate the most appropriate zero-shot NAS method to find multilingual tickets. Our proposed multilingual tickets reduce the inference cost of models for each languages, while boosting the performances. The ticket search cost is negligible and tickets found qualitatively preserve linguistic similarity. Our code is publicly available.
Submission Number: 5627
Loading