REAR: Scalable Test-time Preference Realignment through Reward Decomposition

ICLR 2026 Conference Submission 4160 Authors

11 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: large language model, test-time scaling, preference alignment
TL;DR: We propose a realignment reward (REAR) that steers model responses toward user preferences at test time.
Abstract: Aligning large language models (LLMs) with diverse user preferences is a critical yet challenging task. While post-training methods can adapt models to specific needs, they often require costly data curation and additional training. Test-time scaling (TTS) offers an efficient, training-free alternative, but its application has been largely limited to verifiable domains such as mathematics and coding, where response correctness is easily judged. To extend TTS to preference alignment, we introduce a novel framework that models the task as a realignment problem, since the base model often fails to align sufficiently with the given preference. Our key insight is to decompose the underlying reward function into two components: one related to the question and the other to the user preference. This decomposition allows us to derive a REAlignment Reward (REAR) that selectively rescales the preference-related reward while preserving the question-related reward. We show that REAR can be formulated as a linear combination of policy probabilities, making it computationally efficient and easy to integrate with existing TTS algorithms such as best-of-N sampling and tree search. Experiments on various preference-alignment and role-playing benchmarks demonstrate that TTS with REAR enables scalable and effective test-time realignment with superior performance.
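Since the abstract describes REAR as a linear combination of policy probabilities that rescales only the preference-related reward and plugs into best-of-N sampling, the following minimal sketch illustrates one way such a score could be wired up. The split into question-only and preference-conditioned log-probabilities, the rescaling weight `alpha`, and the helper `seq_logprob` are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of a REAR-style score used for best-of-N selection.
# Assumption: the realignment reward is approximated as a linear combination
# of the policy's sequence log-probabilities with and without the preference
# prepended to the question; `alpha` rescales only the preference-related part.

def seq_logprob(model, prompt: str, response: str) -> float:
    """Placeholder: return log p_model(response | prompt) under the policy."""
    raise NotImplementedError  # wire up to your LLM's scoring API

def rear_score(model, question: str, preference: str, response: str,
               alpha: float = 1.0) -> float:
    # Question-only conditioning stands in for the question-related reward.
    lp_question = seq_logprob(model, question, response)
    # Preference-conditioned scoring adds the preference-related component.
    lp_preference = seq_logprob(model, f"{preference}\n{question}", response)
    # Keep the question-related part intact and rescale only the
    # preference-related difference (illustrative form of the decomposition).
    return lp_question + alpha * (lp_preference - lp_question)

def best_of_n(model, question: str, preference: str, candidates: list[str],
              alpha: float = 1.0) -> str:
    # Standard best-of-N: keep the candidate with the highest REAR-style score.
    return max(candidates,
               key=lambda y: rear_score(model, question, preference, y, alpha))
```

In this sketch, setting `alpha = 0` reduces selection to question-only scoring, while larger values weight candidates more heavily toward the stated preference; the actual weighting in the paper may differ.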
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 4160