VAPO-ValueCoT: ValueCoT-Enhanced Search-Based Prompt Optimization for Human Value Alignment

ACL ARR 2025 February Submission 8335 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Ensuring that large language models (LLMs) align with human values is critical for their safe and ethical deployment. While recent work has advanced search-based prompt optimization for LLMs, these methods lack explicit mechanisms for addressing human value alignment across diverse languages and cultural contexts. In this work, we propose ValueCoT, a novel prompting strategy designed to steer search-based prompt optimization toward human value alignment. ValueCoT identifies the critical factors that lead to misalignment and provides positive guidance for addressing them. Grounded in the principle "Correct faults if found; guard against them if none", ValueCoT simulates human reasoning to optimize the system prompt and elicit better-aligned responses. We integrate ValueCoT into an existing search-based prompt optimization framework. The combined framework, VAPO-ValueCoT, is readily applicable to both open-source and closed-source LLMs, retaining the flexibility of the base framework while enhancing its ability to address human value alignment. Experiments on English and Chinese datasets, covering multiple-choice and free-form question-answering tasks, demonstrate that VAPO-ValueCoT improves human value alignment over baseline methods, offering a scalable and flexible solution for multilingual and multicultural settings.
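To make the abstract's description concrete, below is a minimal Python sketch of how a ValueCoT-style critique step could sit inside a search-based prompt optimization loop. Everything here (the llm helper, the prompt templates, and the greedy revision loop) is an illustrative assumption, not the authors' implementation, which the paper does not show at this level of detail.

```python
# Minimal sketch: a ValueCoT-style critique-and-revise step inside a
# search-based prompt optimization loop. All names and templates below
# (llm, VALUECOT_CRITIQUE, REVISE_PROMPT) are hypothetical.

def llm(prompt: str) -> str:
    """Placeholder for any open- or closed-source chat LLM call."""
    raise NotImplementedError("Wire this to your model API of choice.")

# Critique template following the stated principle:
# "Correct faults if found; guard against them if none".
VALUECOT_CRITIQUE = (
    "Review the response below for human value misalignment.\n"
    "1. Identify critical factors that cause misalignment, if any.\n"
    "2. If faults are found, explain how to correct them; if none, "
    "state safeguards that guard against them.\n\n"
    "Question: {question}\nResponse: {response}"
)

REVISE_PROMPT = (
    "Given this critique of a response produced under the current system "
    "prompt, rewrite the system prompt so future responses are better "
    "aligned with human values.\n\n"
    "Current system prompt: {system_prompt}\nCritique: {critique}"
)

def optimize_system_prompt(seed_prompt: str, questions: list[str],
                           steps: int = 5) -> str:
    """Greedy loop: answer, critique with ValueCoT, revise the system prompt."""
    system_prompt = seed_prompt
    for step in range(steps):
        question = questions[step % len(questions)]
        response = llm(f"{system_prompt}\n\nUser: {question}")
        critique = llm(VALUECOT_CRITIQUE.format(question=question,
                                                response=response))
        system_prompt = llm(REVISE_PROMPT.format(system_prompt=system_prompt,
                                                 critique=critique))
    return system_prompt
```

In a full search-based framework, each revised prompt would presumably be scored against an alignment metric and only the best candidate kept; the sketch omits scoring and keeps a single greedy trajectory for brevity.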
Paper Type: Long
Research Area: Human-Centered NLP
Research Area Keywords: value-centered design, values and culture
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings (efficiency)
Languages Studied: English, Chinese
Submission Number: 8335