Inference-Time Alignment via Hypothesis Reweighting

TMLR Paper6612 Authors

23 Nov 2025 (modified: 06 Dec 2025), Under review for TMLR, CC BY 4.0
Abstract: Chat assistants must handle diverse and often conflicting user preferences. We propose Hypothesis Reweighting (HyRe), a method that enables real-time personalization by reweighting ensemble members based on just 1-5 labeled examples from the target user or domain. Our method builds on the key empirical observation that optimally weighting ensemble members substantially outperforms uniform averaging under distribution shift, providing a powerful inductive bias for personalization. HyRe trains a single network with multiple prediction heads that capture different valid interpretations of preference data, then performs a simple Bayesian update to upweight the heads that best match the target user's preferences. This requires only a single forward pass with negligible (<1%) computational overhead, making it practical for inference-time alignment. We empirically validate HyRe on several target evaluation distributions. With as few as five preference pairs from each target distribution, adaptation via HyRe surpasses state-of-the-art reward models on RewardBench at both the 2B and 8B parameter scales, and improves reward model accuracy by 20% across 32 diverse personalization tasks.
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Atsushi_Nitanda1
Submission Number: 6612
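
The reweighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `hyre_weights` and `reweighted_reward`, the uniform prior, the Bradley-Terry (logistic) likelihood for each preference pair, and the weighted-average combination of head outputs are all assumptions made here for illustration. The abstract only states that a simple Bayesian update upweights the heads that best match the target user's labeled examples.

```python
import numpy as np


def hyre_weights(chosen, rejected, prior=None):
    """Posterior weights over K heads from N labeled preference pairs.

    chosen, rejected: arrays of shape (N, K) holding each head's reward for
    the preferred and the dispreferred response of every pair.
    Assumption: each head scores a pair with a Bradley-Terry likelihood,
    p(chosen > rejected) = sigmoid(r_chosen - r_rejected).
    """
    chosen = np.asarray(chosen, dtype=float)
    rejected = np.asarray(rejected, dtype=float)
    k = chosen.shape[1]
    if prior is None:
        prior = np.full(k, 1.0 / k)  # assumed uniform prior over heads

    # log sigmoid(m) = -log(1 + exp(-m)), computed stably with logaddexp.
    margin = chosen - rejected
    log_lik = -np.logaddexp(0.0, -margin).sum(axis=0)  # shape (K,)

    # Bayes' rule in log space, then normalize.
    log_post = np.log(prior) + log_lik
    log_post -= log_post.max()
    weights = np.exp(log_post)
    return weights / weights.sum()


def reweighted_reward(head_rewards, weights):
    """Combine per-head rewards (shape (..., K)) into one weighted reward."""
    return np.asarray(head_rewards, dtype=float) @ weights


if __name__ == "__main__":
    # Five hypothetical pairs: head 0 agrees with the user, head 2 disagrees.
    chosen = np.array([[2.0, 0.0, -2.0]] * 5)
    rejected = np.zeros((5, 3))
    w = hyre_weights(chosen, rejected)
    print("head weights:", w)
    print("reward for a new response:", reweighted_reward([1.0, 0.5, -1.0], w))
```

With no labeled pairs the weights stay at the prior, which under these assumptions reduces to uniform ensemble averaging; each additional pair shifts mass toward the heads whose rankings agree with the user.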