Inference-Time Alignment via Hypothesis Reweighting

ICLR 2026 Conference Submission13723 Authors

18 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Personalization, few-shot adaptation, test-time adaptation, efficient ensembles
TL;DR: On-the-fly alignment of LLMs by reweighting a lightweight ensemble.
Abstract: Chat assistants must handle diverse and often conflicting user preferences. We propose Hypothesis Reweighting (HyRe), a method that enables real-time personalization by reweighting ensemble members based on just 1-5 labeled examples from the target user or domain. Our key insight is that uniform ensemble averaging, while effective on the training distribution, often underperforms individual ensemble members under distribution shift. HyRe trains a single network with multiple prediction heads that capture different valid interpretations of the preference data, then performs a simple Bayesian update to upweight the heads that best match the target user's preferences. This requires only a single forward pass with negligible (<1%) computational overhead, making it practical for inference-time alignment. We empirically validate HyRe on several target evaluation distributions. With as few as five preference pairs from each target distribution, adaptation via HyRe surpasses state-of-the-art reward models on RewardBench at both the 2B and 8B parameter scales, and improves reward model accuracy by 20% across 32 diverse personalization tasks.
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 13723
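
The reweighting step described in the abstract can be illustrated with a minimal sketch. The abstract only says "a simple Bayesian update", so everything specific here is an assumption and not necessarily the authors' formulation: a uniform prior over heads, a Bradley-Terry likelihood (the sigmoid of each head's reward margin between the preferred and rejected response), and the hypothetical names `reweight_heads`, `ensemble_margin`, and `margins`.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def reweight_heads(margins, prior=None):
    """Return posterior weights over K heads given N labeled preference pairs.

    margins[n][k] is head k's reward for the preferred response minus its
    reward for the rejected response on pair n (hypothetical input format).
    """
    num_heads = len(margins[0])
    if prior is None:
        # Assumption: start from uniform ensemble averaging.
        prior = [1.0 / num_heads] * num_heads
    # Log posterior = log prior + sum over pairs of the log-likelihood
    # that the head ranks the preferred response higher (Bradley-Terry).
    log_post = [math.log(p) for p in prior]
    for pair in margins:
        for k, margin in enumerate(pair):
            log_post[k] += log_sigmoid(margin)
    # Normalize with the log-sum-exp trick to avoid underflow.
    top = max(log_post)
    unnorm = [math.exp(lp - top) for lp in log_post]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def ensemble_margin(weights, head_margins):
    """Weighted-average reward margin for one new response pair."""
    return sum(w * m for w, m in zip(weights, head_margins))


# Toy data: three heads, two labeled pairs from the target user.
# Head 0 agrees with the user, head 1 is indifferent, head 2 disagrees.
margins = [[2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
weights = reweight_heads(margins)
```

In this toy setup the uniform ensemble is indifferent on a new pair with head margins `[2.0, 0.0, -2.0]`, while the reweighted ensemble sides with the head that matched the user's labels. Because the update only reads per-head outputs that a single forward pass already produces, the extra cost is a few scalar operations per labeled example.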