Keywords: Personalization, few-shot adaptation, test-time adaptation, efficient ensembles
TL;DR: On-the-fly alignment of LLMs by reweighting a lightweight ensemble
Abstract: Chat assistants must handle diverse and often conflicting user preferences, which requires adapting to individual users at deployment. We propose a lightweight framework for the general challenge of aligning models to user intent at inference time. Our approach trains an efficient ensemble, i.e., a single neural network with multiple prediction heads, each representing a different function consistent with the training data. Our main contribution is HyRe, a simple adaptation technique that dynamically reweights ensemble members at test time using a small set of labeled examples from the target distribution; these examples can be labeled in advance or actively queried from a larger unlabeled pool. The computational cost of our training procedure is comparable to fine-tuning a single model, so it scales to large pretrained backbones. We empirically validate HyRe on several target evaluation distributions. With as few as five preference pairs from each target distribution, adaptation via HyRe surpasses state-of-the-art reward models on RewardBench at both the 2B and 8B parameter scales.
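The test-time reweighting idea described in the abstract can be sketched as follows. This is a hypothetical illustration, not the authors' code: it assumes each ensemble head scores preference pairs with a reward margin (chosen minus rejected), weights each head by its Bradley-Terry log-likelihood on the few labeled target pairs, and normalizes the weights with a softmax.

```python
import numpy as np

def reweight_heads(head_margins, temperature=1.0):
    """Weight ensemble heads by fit to a few labeled preference pairs.

    head_margins: array of shape (K, N) where entry (k, i) is the reward
    margin (chosen minus rejected) that head k assigns to labeled pair i.
    Returns a weight vector over the K heads summing to 1.
    """
    # Bradley-Terry log-likelihood that the chosen response wins,
    # summed over the few-shot pairs, per head.
    log_lik = np.sum(-np.log1p(np.exp(-head_margins)), axis=1)
    # Softmax over heads: heads that fit the target pairs better
    # receive larger weights.
    z = log_lik / temperature
    z -= z.max()  # numerical stability
    w = np.exp(z)
    return w / w.sum()

# Example: 3 heads scored on 5 labeled preference pairs from the target
# distribution (random margins here, for illustration only).
rng = np.random.default_rng(0)
margins = rng.normal(size=(3, 5))
weights = reweight_heads(margins)
# A final prediction would combine the heads' scores using these weights.
```

The function name `reweight_heads` and the temperature parameter are assumptions for this sketch; the paper itself specifies the exact adaptation rule.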
Submission Number: 34