Benchmarking Multi-National Value Alignment for Large Language Models

ACL ARR 2024 December Submission868 Authors

15 Dec 2024 (modified: 16 Feb 2025) · License: CC BY 4.0
Abstract: Do Large Language Models (LLMs) hold positions that conflict with your country's values? In this paper, we introduce NaVAB, a comprehensive benchmark designed to evaluate the alignment of LLMs with the values of five major nations: China, the United States, the United Kingdom, France, and Germany. Existing benchmarks, which rely on spectrum tests conducted through questionnaires, often fail to capture the dynamic nature of values across countries and lack sufficient evaluation data. To address these limitations, NaVAB implements a value data extraction pipeline to efficiently construct value assessment datasets. This process includes a Conflict Reduction mechanism that filters out non-conflicting values to yield a high-quality benchmark. Through extensive experiments on various LLMs (spanning Base vs. Instruct models, non-MoE vs. MoE architectures, and open- vs. closed-source systems), we demonstrate that LLMs can be effectively aligned with multi-national values using NaVAB.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, evaluation, NLP datasets
Contribution Types: Data resources
Languages Studied: English, Chinese, French, German
Submission Number: 868
