UNLEARNING GEO-CULTURAL STEREOTYPES IN MULTILINGUAL LLMS

Published: 05 Mar 2025, Last Modified: 06 Mar 2025 · CC BY 4.0
Track: Tiny Paper Track (between 2 and 4 pages)
Keywords: Machine Unlearning, Multilingual Large Language Models, Fairness, Geo-Cultural Stereotypes
Abstract:

As multilingual generative models become more widely used, most safety and fairness evaluation techniques still focus on English-language resources, overlooking important cross-cultural factors. This limitation raises concerns about fairness and safety, particularly regarding geo-culturally situated stereotypes that hinder models' global inclusivity. In this work, we present preliminary findings on the impact of stereotype unlearning across languages, specifically English, French, and Hindi. Using an adapted version of the SeeGULL dataset, we analyze how unlearning stereotypes in one language influences other languages within multilingual large language models. Our study evaluates two models, Llama-3.1-8B and Aya-Expanse-8B, to assess whether unlearning in one linguistic context transfers across languages, potentially mitigating or exacerbating biases in multilingual settings.

Submission Number: 97