Keywords: Personalized Alignment, Decoding-time Alignment, Efficient Preference Modeling, Few-shot Personalization
TL;DR: We propose the Drift algorithm: 1) efficient preference modeling by decomposing implicit preferences into simpler attributes, and 2) decoding-time alignment via the composition of those attributes.
Abstract: Personalized alignment to individual users has been a long-standing goal for large language models (LLMs). We introduce Drift, a novel framework that personalizes LLMs at decoding time using implicit user preferences. Unlike traditional Reinforcement Learning from Human Feedback (RLHF), which relies on vast annotated datasets and expensive gradient updates, Drift operates in a training-free manner by steering a frozen LLM through few-shot preference modeling. Our approach represents user preferences as a composition of interpretable, predefined attributes and employs a zero-shot rewarding mechanism based on contrastive system prompts. Experiments on both a synthetic persona dataset (Perspective) and a real human-annotated dataset (PRISM) demonstrate that Drift achieves performance comparable to standard RLHF methods while using only 50–100 examples. Finally, our analysis offers practical considerations for applying Drift to numerous edge cases encountered in real-world personalized services.
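To make the abstract's mechanism concrete, here is a minimal, hypothetical sketch of how contrastive-system-prompt rewarding and decoding-time attribute composition could look for a frozen causal LM. The attribute names, prompt wording, model choice, and weights below are illustrative assumptions, not the paper's actual setup.

```python
# Sketch: decoding-time steering via contrastive system prompts (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder frozen LM; any causal LM would do
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

# Interpretable, predefined attributes, each scored with a pair of
# contrasting system prompts (positive vs. negative direction).
ATTRIBUTES = {
    "concise": ("Respond as concisely as possible.",
                "Respond as verbosely as possible."),
    "formal":  ("Respond in a formal tone.",
                "Respond in a casual tone."),
}

@torch.no_grad()
def next_token_logprobs(system_prompt: str, user_prompt: str, partial: str) -> torch.Tensor:
    """Log-probabilities over the next token, conditioned on a system prompt."""
    text = f"{system_prompt}\n\nUser: {user_prompt}\nAssistant: {partial}"
    ids = tokenizer(text, return_tensors="pt").input_ids
    logits = model(ids).logits[0, -1]
    return torch.log_softmax(logits, dim=-1)

@torch.no_grad()
def drift_step(user_prompt: str, partial: str, weights: dict) -> int:
    """Pick the next token by composing base log-probs with attribute rewards."""
    base = next_token_logprobs("You are a helpful assistant.", user_prompt, partial)
    score = base.clone()
    for attr, w in weights.items():
        pos_prompt, neg_prompt = ATTRIBUTES[attr]
        pos = next_token_logprobs(pos_prompt, user_prompt, partial)
        neg = next_token_logprobs(neg_prompt, user_prompt, partial)
        # Zero-shot "reward": contrast between the two prompted distributions.
        score += w * (pos - neg)
    return int(score.argmax())

# Example: attribute weights one might infer from a user's few-shot examples
# (values are made up for illustration).
weights = {"concise": 0.8, "formal": 0.3}
out = ""
for _ in range(30):
    tok = drift_step("Explain what an LLM is.", out, weights)
    out += tokenizer.decode([tok])
print(out)
```

In this sketch, the per-attribute weights stand in for the few-shot preference modeling step, and the greedy argmax stands in for whatever decoding strategy is actually used; both are simplifications.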
Submission Type: Long Paper (9 Pages)
Archival Option: This is a non-archival submission
Presentation Venue Preference: ICLR 2025
Submission Number: 49