## Revisiting Bellman Errors for Offline Model Selection

08 Oct 2022, 17:47 (modified: 09 Dec 2022, 14:31)Deep RL Workshop 2022Readers: Everyone
Keywords: reinforcement learning, offline reinforcement learning, deep reinforcement learning, off-policy evaluation, model selection, Atari
TL;DR: We demonstrate that Bellman errors can be very useful for hyperparameter tuning in offline reinforcement learning, contrary to popular belief and previous literature.
Abstract: It is well-known that the empirical Bellman errors are poor predictors of value function estimation accuracy and policy performance. This has led researchers to abandon offline model selection procedures based on Bellman errors and instead focus on directly estimating the expected return under different policies of interest. The problem with this approach is that it can be very difficult to use an offline dataset generated by one policy to estimate the expected returns of a different policy. In contrast, we argue that Bellman errors can be useful for offline model selection, and that the discouraging results in past literature has been due to estimating and utilizing them incorrectly. We propose a new algorithm, $\textit{Supervised Bellman Validation}$, that estimates the expected squared Bellman error better than the empirical Bellman errors. We demonstrate the relative merits of our method over competing methods through both theoretical results and empirical results on offline datasets from the Atari benchmark. We hope that our results will challenge current attitudes and spur future research into Bellman errors and their utility in offline model selection.
0 Replies