Abstract: Adapting cultural values in Large Language Models (LLMs) presents significant challenges, particularly due to biases and data limitations. Previous work aligns LLMs with different cultures using survey data, primarily from the World Values Survey (WVS). However, it remains unclear whether this approach effectively captures cultural nuances or produces distinct cultural representations for downstream tasks such as offensiveness classification. In this paper, we systematically investigate WVS-based training for cultural value adaptation and find that relying solely on survey data can homogenize cultural norms and interfere with factual knowledge. To address these issues, we propose augmenting WVS with encyclopedic and scenario-based cultural narratives from Wikipedia and NormAd. Our experiments across multiple cultures show that this approach captures more differentiated cultural values and improves downstream classification performance.
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: language/cultural bias analysis
Contribution Types: NLP engineering experiment, Data analysis
Languages Studied: arabic, bengali, chinese, english, german, greek, korean, portuguese, spanish, turkish
Submission Number: 4446