Exploring Multilingual Human Value Concepts in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages?
Abstract: Prior research in representation engineering has revealed that LLMs encode concepts within their representation spaces, predominantly centered around English. In this study, we extend this line of work to a multilingual setting, delving into multilingual human value concepts in LLMs. Through a comprehensive exploration covering 7 types of human values, 16 languages, and 3 LLM series with distinct degrees of multilinguality, we empirically substantiate the existence of multilingual human value concepts in LLMs. Further cross-lingual analysis of these concepts reveals 3 traits arising from language resource disparities: cross-lingual inconsistency of value concepts, distorted linguistic relationships, and unidirectional cross-lingual transfer between high- and low-resource languages. Additionally, we demonstrate the feasibility of cross-lingual control over the value alignment capabilities of LLMs, using the dominant language as the source language. Drawing on these findings, we offer suggestions on the composition of multilingual pre-training data for LLMs: include a limited number of dominant languages to enable cross-lingual alignment transfer while avoiding their excessive prevalence, and keep a more balanced distribution among non-dominant languages. We hope that our findings will contribute to enhancing the safety and utility of multilingual AI.
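To make the representation-engineering setup concrete, below is a minimal sketch (not the authors' code) of how a human value concept direction might be probed per language and compared across languages, which is one way to quantify the cross-lingual inconsistency the abstract describes. The model name, layer index, and paired stimuli are illustrative assumptions; the paper's exact extraction procedure is not specified on this page.

```python
# Hedged sketch: mean-difference probing of a "value concept" direction per language,
# then cross-lingual comparison via cosine similarity. All names below are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-hf"   # assumption: any multilingual causal LM
LAYER = 20                            # assumption: a middle-to-late decoder layer

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

def hidden_at_layer(text: str) -> torch.Tensor:
    """Hidden state of the last token at LAYER for a single prompt."""
    inputs = tok(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[LAYER][0, -1].float()

def concept_direction(pos_prompts, neg_prompts) -> torch.Tensor:
    """Unit vector separating value-consistent from value-violating prompts."""
    pos = torch.stack([hidden_at_layer(p) for p in pos_prompts]).mean(0)
    neg = torch.stack([hidden_at_layer(p) for p in neg_prompts]).mean(0)
    d = pos - neg
    return d / d.norm()

# Hypothetical paired stimuli for the same value (e.g., fairness) in two languages.
stimuli = {
    "en": (["Treat everyone equally."], ["Favor your friends unfairly."]),
    "zh": (["平等对待每个人。"], ["不公平地偏袒你的朋友。"]),
}

dirs = {lang: concept_direction(p, n) for lang, (p, n) in stimuli.items()}
sim = torch.nn.functional.cosine_similarity(dirs["en"], dirs["zh"], dim=0)
# A low similarity between language-specific directions would indicate
# cross-lingual inconsistency of the value concept.
print(f"en-zh concept similarity: {sim.item():.3f}")
```

In practice one would use many stimulus pairs per value and per language; the two-prompt example above only illustrates the shape of the computation.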
Paper Type: long
Research Area: Multilinguality and Language Diversity
Contribution Types: Model analysis & interpretability
Languages Studied: English, Chinese, Catalan, French, Indonesian, Portuguese, Spanish, Chichewa, Swahili, Tamil, Telugu, Finnish, Hungarian, Japanese, Korean, Vietnamese