Abstract: To assess the ethical risks associated with Large Language Models (LLMs), researchers have proposed various datasets to analyze the models' inclinations towards values. These datasets typically involve surveys and psychometric tests that require short-form responses from the LLMs. In this paper, we investigate the extent to which the value preferences estimated from these benchmarks align with downstream applications involving long-form generations. Since the goal of alignment is to instill a consistent set of values and principles in the models, we analyze its impact in this experiment on 5 LLMs: llama3-8b, gemma2-9b, mistral-7b, qwen2-7b and olmo-7b. Our analysis reveals that while alignment can improve the consistency between value preferences estimated from benchmarks and those expressed in long-form responses, the correlation remains weak, indicating a discrepancy between the preferences exhibited in different applications. Furthermore, the value preferences expressed in long-form generations can vary significantly across generations obtained by temperature sampling. Finally, we explore the connection between the models' proficiency in generating specific and diverse value-laden arguments and their value preferences. Empirical results demonstrate that for highly preferred values, most models generate less specific but more diverse arguments.
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: Value Alignment, Ethical Dilemma, Value Preferences, Consistency
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 8001
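To make the consistency analysis described in the abstract more concrete, the sketch below illustrates one plausible way to compare value preferences estimated from a short-form benchmark against those estimated from long-form generations (via Spearman rank correlation), and to quantify how much a preference score varies across temperature-sampled generations. This is not the authors' code; the value names and all scores are hypothetical placeholders.

```python
# Illustrative sketch only: the value labels and scores below are invented
# for demonstration and do not come from the paper.
import statistics
from scipy.stats import spearmanr

values = ["self-direction", "benevolence", "security", "power", "achievement"]
benchmark_scores = [0.82, 0.75, 0.60, 0.31, 0.55]  # from short-form survey items
longform_scores  = [0.70, 0.78, 0.52, 0.45, 0.40]  # from long-form generations

# Rank correlation between the two preference estimates.
rho, p_value = spearmanr(benchmark_scores, longform_scores)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")

# Variation of one value's preference score across temperature-sampled
# long-form generations of the same prompt (hypothetical samples).
sampled_scores = [0.70, 0.41, 0.66, 0.35, 0.58]
print("std across temperature samples:", round(statistics.stdev(sampled_scores), 3))
```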