Keywords: Large language model, Preference alignment, Pluralistic alignment, System message, Instruction tuning
TL;DR: Proposes individualized and scalable alignment of LLMs by verbalizing values in the system message, flexibly steering the model toward personalized responses
Abstract: Current large language model (LLM) alignment methods often assume that aligning LLMs with general public preferences is optimal, overlooking individual value diversity. A major challenge in adopting a more individualized approach to LLM alignment is its lack of scalability, as it requires re-training a new model for each new value or user. We propose a new paradigm in which users specify their values within the system message, steering LLM behavior to align with individual intentions. However, LLMs are typically trained on a generic system message (e.g., "You are a helpful assistant"). To improve generalization to diverse system messages, we create a system message dataset with 197k value combinations across 66k user instructions. We train a 7B LLM, Janus, and test it on five benchmarks, adding various unseen system messages that reflect user preferences. Janus achieves high tie+win rates against leading models, including GPT-4. Janus also outperforms LLaMA 3 8B Instruct on general helpfulness benchmarks, suggesting that training with diverse system messages enhances alignment with both individual and general preferences. Code, dataset, benchmark, and models are available at https://github.com/kaistAI/Janus.
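To illustrate the proposed paradigm, here is a minimal sketch of querying a model like Janus with user values verbalized in the system message rather than a generic assistant prompt. The model identifier "kaist-ai/janus-7b", the chat-template usage, and the example value description are assumptions for illustration, not details confirmed by the abstract; see the GitHub repository for the released checkpoints.

```python
# Minimal sketch: steering responses via a value-verbalizing system message.
# Assumptions: the released 7B checkpoint loads with Hugging Face transformers
# and ships a standard chat template; the model id below is hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kaist-ai/janus-7b"  # hypothetical identifier; check the repo for the actual one
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The user's values go directly into the system message instead of
# a generic "You are a helpful assistant" prompt.
messages = [
    {
        "role": "system",
        "content": (
            "You are an assistant for a busy parent who values concise, "
            "practical answers, avoids jargon, and prefers budget-friendly suggestions."
        ),
    },
    {"role": "user", "content": "Plan a weeknight dinner for a family of four."},
]

# Build the prompt with the model's chat template and generate a personalized response.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Changing only the system message (e.g., to favor detailed, technical explanations) would, under this paradigm, steer the same model toward a different user's preferences without any re-training.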
Submission Number: 73