Non-adaptive Online Finetuning for Offline Reinforcement Learning

Published: 28 Oct 2023, Last Modified: 04 Dec 2023, GenPlan'23
Abstract: Offline reinforcement learning (RL) has emerged as an important framework for applying RL to real-life applications. However, the complete lack of online interactions causes technical difficulties. The _online finetuning_ setting addresses these challenges by incorporating a limited form of online interaction, which is often available in practice. Unfortunately, current theoretical frameworks for online finetuning assume high online sample complexity, require deploying fully adaptive algorithms (i.e., unlimited policy changes), or both, which restricts their application to real-world settings where online interactions and policy updates are expensive and limited. In this paper, we develop a new framework for online finetuning. Instead of competing with the optimal policy (which inherits the high sample complexity and adaptivity requirements of online RL), we aim to learn a new policy that improves as much as possible over the existing policy using a _pre-specified_ number of online samples and with a _non-adaptive_ data-collection policy. Our formulation reveals surprising nuances and suggests novel principles that distinguish the finetuning problem from purely online and offline RL.
Submission Number: 69