Inference-time Alignment with Rewards in Anisotropic Besov Spaces: Superiority of Neural Networks over Linear Estimators
Keywords: Inference-time Alignment, Besov Space, Nonparametric Regression, Reinforcement Learning
Abstract: Inference-time alignment, the approach of adapting pre-trained models to rewards through reinforcement learning, has proven highly effective in enhancing the performance of language models. Despite its practical success, its theoretical analysis remains underdeveloped; in particular, only a limited number of studies address the practical setting where neural networks are employed as reward models. In this paper, we investigate the advantages of neural networks in inference-time alignment. Assuming that the true reward function lies in an anisotropic Besov space, we derive upper bounds on the regret as a function of the number of oracle queries when a neural network is used as the reward estimator. We further investigate the limitations of linear reward estimators and show that neural networks are superior owing to their ability to adapt to the smoothness of the target function. Finally, we demonstrate that an algorithm that iteratively and actively learns the reward model from the responses of the trained model achieves smaller regret, as neural networks adapt to local structures.
Primary Area: learning theory
Submission Number: 17623