Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Values

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission · Readers: Everyone
Abstract: Value alignment is crucial for the responsible development of Large Language Models (LLMs). However, how to define values in this context remains largely unexplored. Existing work mainly specifies values as risk criteria formulated in the AI community, e.g., fairness and privacy protection, which suffer from poor clarity, adaptability, and transparency. Leveraging basic values from the humanities and social sciences, this paper introduces a novel value space spanned by multiple basic value dimensions, compatible with human values across cultures, and then proposes BaseAlign, a corresponding value alignment paradigm. Applying Schwartz's Theory of Basic Values as a representative instantiation, we construct FULCRA, a dataset of 10k (LLM output, value vector) pairs. By identifying the underlying priorities LLMs place on these value dimensions, their behaviors can be mapped into a $K$-dimensional value space rather than reduced to simple binary labels. Extensive analysis and experiments on FULCRA (1) reveal the intrinsic relation between basic values and LLMs' behaviors, (2) demonstrate that our paradigm not only covers existing risks but also anticipates unidentified ones, and (3) show BaseAlign's superior alignment performance with less data, paving the way toward addressing the three challenges above.
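The abstract describes each FULCRA entry as an (LLM output, value vector) pair over the basic value dimensions. As a minimal illustrative sketch only (the paper's actual schema, dimension count, and scoring scale are not specified here; the field names and scores below are hypothetical), one way to represent such a pair in Python using Schwartz's ten basic values is:

```python
from dataclasses import dataclass
from typing import List

# The ten basic value dimensions of Schwartz's Theory of Basic Values,
# which the abstract names as the instantiation of the value space.
SCHWARTZ_VALUES = [
    "Self-Direction", "Stimulation", "Hedonism", "Achievement", "Power",
    "Security", "Conformity", "Tradition", "Benevolence", "Universalism",
]

@dataclass
class FulcraPair:
    """One (LLM output, value vector) pair, per the abstract's description."""
    llm_output: str            # the model's response text
    value_vector: List[float]  # one score per basic value dimension

    def __post_init__(self) -> None:
        # Keep the vector aligned with the chosen value dimensions.
        if len(self.value_vector) != len(SCHWARTZ_VALUES):
            raise ValueError("value vector must have one entry per value dimension")

# Hypothetical example: a refusal that prioritizes Security and Conformity;
# the scores are illustrative only, not taken from the dataset.
pair = FulcraPair(
    llm_output="I can't help with that request because it could endanger others.",
    value_vector=[0.1, -0.6, 0.0, 0.0, -0.2, 0.9, 0.7, 0.1, 0.5, 0.4],
)
print({name: score for name, score in zip(SCHWARTZ_VALUES, pair.value_vector)})
```

Under this reading, a behavior is characterized by its position in the value space rather than by a single safe/unsafe label.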
Paper Type: long
Research Area: Resources and Evaluation
Contribution Types: Data resources
Languages Studied: English