AdAEM: An Adaptive and Automated Extensible Measurement of LLMs' Value Orientation

ACL ARR 2025 February Submission 7276 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Assessing the value orientations of Large Language Models (LLMs) is essential for comprehensively revealing their potential misalignment and risks, and for fostering responsible development. However, current value-measurement datasets are often outdated or contaminated; they fail to capture the underlying value differences across models and thus yield saturated, uninformative results. To address this problem, we introduce AdAEM, a novel, self-extensible assessment framework for revealing LLMs' value inclinations. Unlike previous static benchmarks, AdAEM automatically and adaptively generates and extends its test questions. It does so by probing the internal value boundaries of a diverse set of recently developed LLMs through in-context optimization, extracting the latest or culturally provocative controversial social topics. These topics more effectively elicit the underlying value differences between models, providing a more distinguishable and informative value evaluation. In this way, AdAEM co-evolves with the development of LLMs and consistently tracks their value dynamics. Using AdAEM, we generate 12,310 test questions grounded in Schwartz's Theory of Basic Values, benchmark the value orientations of 16 popular LLMs, and conduct an extensive analysis demonstrating our method's effectiveness, laying the groundwork for better value evaluation.
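The abstract describes the adaptive question-generation loop only at a high level. The sketch below is a hypothetical, simplified illustration of such a loop, not the authors' implementation: the stub models, the prompts, the `divergence` proxy, and the `generate_probe` helper are all assumptions introduced here, and the refinement step merely stands in for the paper's in-context optimization over controversial social topics.

```python
# Hypothetical sketch of an AdAEM-style adaptive probe-generation loop (illustrative only).
# All prompts, scoring, and stub models below are assumptions, not the authors' code.

import random
from typing import Callable, List

SCHWARTZ_VALUES = [
    "Self-Direction", "Stimulation", "Hedonism", "Achievement", "Power",
    "Security", "Conformity", "Tradition", "Benevolence", "Universalism",
]


def divergence(answers: List[str]) -> float:
    """Toy proxy for value divergence: fraction of answer pairs that differ."""
    pairs = [(a, b) for i, a in enumerate(answers) for b in answers[i + 1:]]
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def generate_probe(value: str, llms: List[Callable[[str], str]], rounds: int = 3) -> str:
    """Iteratively refine a test question so it better separates the models."""
    question = f"Is it acceptable to prioritize {value.lower()} over societal expectations?"
    best_q, best_score = question, -1.0
    for _ in range(rounds):
        answers = [llm(question) for llm in llms]
        score = divergence(answers)
        if score > best_score:
            best_q, best_score = question, score
        # In-context refinement: ask one model for a more value-provocative variant.
        question = llms[0](
            f"Rewrite this question about '{value}' so that models with different "
            f"value orientations would answer it differently:\n{question}"
        )
    return best_q


if __name__ == "__main__":
    # Stub "LLMs" standing in for real model APIs.
    def stub_llm(seed: int) -> Callable[[str], str]:
        rng = random.Random(seed)
        return lambda prompt: rng.choice(["Yes, it is acceptable.", "No, it is not.", prompt[:40]])

    models = [stub_llm(i) for i in range(3)]
    for v in SCHWARTZ_VALUES[:2]:
        print(v, "->", generate_probe(v, models))
```

In practice, the stub models would be replaced by real LLM API calls and the toy divergence score by a principled measure of disagreement across models' value-laden responses.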
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: ethical considerations in NLP applications; data ethics
Contribution Types: NLP engineering experiment, Data resources, Data analysis
Languages Studied: English
Submission Number: 7276