VAPO-ValueCoT: ValueCoT-Enhanced Search-Based Prompt Optimization for Human Value Alignment

ACL ARR 2025 February Submission 8335 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Ensuring that large language models (LLMs) align with human values is critical for their safe and ethical deployment. While recent work has advanced search-based prompt optimization for LLMs, these methods lack explicit mechanisms for addressing human value alignment across diverse languages and cultural contexts. In this work, we propose ValueCoT, a novel prompting strategy designed to steer search-based prompt optimization toward human value alignment. ValueCoT identifies the critical factors that lead to misalignment and provides positive guidance for addressing them. Grounded in the principle "Correct faults if found; guard against them if none", ValueCoT simulates human reasoning to optimize the system prompt and elicit better-aligned responses. We integrate ValueCoT into an existing search-based prompt optimization framework. The combined framework, VAPO-ValueCoT, is readily applicable to both open-source and closed-source LLMs, retaining the flexibility of the base framework while enhancing its ability to address human value alignment. Experiments on English and Chinese datasets, covering multiple-choice and free-form question-answering tasks, demonstrate that VAPO-ValueCoT improves human value alignment over baseline methods, offering a scalable and flexible solution for multilingual and multicultural settings.
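To make the abstract's description concrete, below is a minimal Python sketch of how a ValueCoT-style critique step could sit inside a search-based prompt optimization loop. Everything here (the llm helper, the prompt templates, and the greedy revision loop) is an illustrative assumption, not the authors' implementation, which the paper does not show at this level of detail.

```python
# Minimal sketch: a ValueCoT-style critique-and-revise step inside a
# search-based prompt optimization loop. All names and templates below
# (llm, VALUECOT_CRITIQUE, REVISE_PROMPT) are hypothetical.

def llm(prompt: str) -> str:
    """Placeholder for any open- or closed-source chat LLM call."""
    raise NotImplementedError("Wire this to your model API of choice.")

# Critique template following the stated principle:
# "Correct faults if found; guard against them if none".
VALUECOT_CRITIQUE = (
    "Review the response below for human value misalignment.\n"
    "1. Identify critical factors that cause misalignment, if any.\n"
    "2. If faults are found, explain how to correct them; if none, "
    "state safeguards that guard against them.\n\n"
    "Question: {question}\nResponse: {response}"
)

REVISE_PROMPT = (
    "Given this critique of a response produced under the current system "
    "prompt, rewrite the system prompt so future responses are better "
    "aligned with human values.\n\n"
    "Current system prompt: {system_prompt}\nCritique: {critique}"
)

def optimize_system_prompt(seed_prompt: str, questions: list[str],
                           steps: int = 5) -> str:
    """Greedy loop: answer, critique with ValueCoT, revise the system prompt."""
    system_prompt = seed_prompt
    for step in range(steps):
        question = questions[step % len(questions)]
        response = llm(f"{system_prompt}\n\nUser: {question}")
        critique = llm(VALUECOT_CRITIQUE.format(question=question,
                                                response=response))
        system_prompt = llm(REVISE_PROMPT.format(system_prompt=system_prompt,
                                                 critique=critique))
    return system_prompt
```

In a full search-based framework, each revised prompt would presumably be scored against an alignment metric and only the best candidate kept; the sketch omits scoring and keeps a single greedy trajectory for brevity.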
Paper Type: Long
Research Area: Human-Centered NLP
Research Area Keywords: value-centered design, values and culture
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings (efficiency)
Languages Studied: English, Chinese
Submission Number: 8335