Abstract: As multilingual large language models become more widely used, ensuring their safety and fairness across diverse linguistic contexts presents unique challenges. While existing research on machine unlearning has mainly focused on monolingual settings, typically English, multilingual environments introduce additional complexities due to cross-lingual knowledge transfer and biases embedded in both pretraining and fine-tuning data.
In this work, we address the problem of multilingual unlearning under two settings: (1) $\textit{data unlearning}$ and (2) $\textit{concept unlearning}$.
Using the TOFU and SeeGULL datasets translated into English, French, Hindi, Arabic, and Farsi, we demonstrate that unlearning targeted content in one language generally causes minimal performance degradation in the others, although unlearning in high-resource languages tends to be more stable. Moreover, we observe partial, asymmetric transfer of unlearning, particularly between typologically similar or high-resource languages such as English and French. Our findings suggest that, while some cross-lingual effects are observable, unlearning in a single language is not sufficient to fully remove the targeted knowledge from the model.
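To make the data-unlearning setting concrete, the following is a minimal toy sketch, not the paper's method: the paper unlearns from multilingual LLMs on translated TOFU/SeeGULL data, whereas here a small logistic-regression model is trained on synthetic data and then a common baseline, gradient ascent on the forget subset, is applied to raise the model's loss on the examples to be forgotten. All names and hyperparameters below are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_nll(w, X, y):
    # Gradient of the mean negative log-likelihood for logistic regression.
    return X.T @ (sigmoid(X @ w) - y) / len(y)

def train(w, X, y, lr=0.5, steps=200):
    # Standard gradient descent: fit the full (retain + forget) data.
    for _ in range(steps):
        w = w - lr * grad_nll(w, X, y)
    return w

def unlearn(w, X_forget, y_forget, lr=0.1, steps=20):
    # Gradient-ascent unlearning baseline: increase loss on the forget set.
    for _ in range(steps):
        w = w + lr * grad_nll(w, X_forget, y_forget)
    return w

def forget_loss(w, X_f, y_f):
    p = sigmoid(X_f @ w)
    return -np.mean(y_f * np.log(p + 1e-9) + (1 - y_f) * np.log(1 - p + 1e-9))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)

w = train(np.zeros(3), X, y)
X_f, y_f = X[:10], y[:10]          # the "forget" subset to be unlearned
before = forget_loss(w, X_f, y_f)
w_u = unlearn(w, X_f, y_f)
after = forget_loss(w_u, X_f, y_f)
```

After the ascent steps, `after` exceeds `before`: the model's confidence on the forgotten examples has been degraded, which is the basic effect the paper measures (per language) at LLM scale.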
Paper Type: Long
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: Unlearning, Multilingual LLMs
Contribution Types: Model analysis & interpretability
Languages Studied: English, French, Hindi, Arabic, Farsi
Submission Number: 5948