Inference-time Alignment with Rewards in Anisotropic Besov Spaces: Superiority of Neural Networks over Linear Estimators
Keywords: Inference-time Alignment, Besov Space, Nonparametric Regression, Reinforcement Learning
Abstract: Inference-time alignment, the approach of adapting pre-trained models to rewards through reinforcement learning, has proven highly effective in enhancing the performance of language models. Despite its practical success, its theoretical analysis remains underdeveloped; in particular, only a limited number of studies address the practical setting where neural networks are employed as reward models. In this paper, we investigate the advantages of neural networks in inference-time alignment. Assuming that the true reward function lies in an anisotropic Besov space, we derive upper bounds on the regret as a function of the number of oracle queries when a neural network is used as the reward estimator. We further investigate the limitations of linear reward estimators and show that neural networks are superior owing to their ability to adapt to the smoothness of the target function. Finally, we demonstrate that an algorithm that iteratively and actively learns the reward model from the responses of the trained model achieves smaller regret, as neural networks adapt to local structures.
Primary Area: learning theory
Submission Number: 17623