Controllable and Interpretable Multi-Value Alignment For Large Language Model

18 Sept 2025 (modified: 03 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Values, Value alignment, Value Evaluation, Large language model
TL;DR: We propose VCAI, a framework that builds the ML-Values dataset to enable controllable, interpretable, and multi-value alignment of large language models, improving both local response fidelity and global moral reasoning structures.
Abstract: Large Language Models (LLMs) are increasingly expected to embody human values in socially consequential contexts, but current alignment methods often lack interpretability, controllability, and value diversity. We propose **V**alue-aligned **C**onstitutional **AI** (VCAI), a novel framework for fine-grained value alignment based on Schwartz's theory of basic values. With VCAI we construct **ML-Values**, a multi-level dataset generated through role-playing, value decomposition, and iterative rewriting, allowing precise control over alignment intensity. ML-Values captures rich, context-aware expressions of values and supports multi-value alignment. In addition, by reformulating traditional value questionnaires into generative formats, we obtain more accurate value assessments. Experimental results demonstrate that models trained with ML-Values exhibit enhanced controllability and generalization across moral, psychological, and cultural dimensions. Moreover, alignment influences not only local response fidelity but also the global value structures of LLMs, promoting coherent moral reasoning and structured preference expression. Our work offers a robust and interpretable foundation for building trustworthy, human-centered AI systems.
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 10871