Exploration for Deployment-Efficient Reinforcement Learning Agents

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Real-World Exploration, Deployment Efficient RL, Real-World RL
TL;DR: We present a novel exploration framework for offline reinforcement learning agents deployed in the real world, which reduces uncertainty through a carefully constructed data-collection policy.
Abstract: Reinforcement learning (RL) provides a rich toolbox with which to learn sequential decision-making policies. Notably, the ability to learn solely from offline interaction data has been a highly successful modality for training real-world policies. However, a gap exists in this paradigm when the offline dataset does not cover all the behaviors necessary to extract optimal policies. Naively, one could pre-train a policy with offline RL and then fine-tune it with online RL, but this can be catastrophic in settings like healthcare and autonomous driving, where deploying an unverified policy is irresponsible. Deployment-efficient learning is a potential solution, in which the number of distinct data-collection policies is small relative to the number of updates to the policy. We argue that safely improving a dataset requires a deployment-efficient algorithm with a carefully constructed data-collection policy. We introduce a framework with a stationary exploration policy that aims to reduce out-of-distribution uncertainty while maintaining strong returns. We establish theoretical guarantees for this exploration framework without fine-tuning, and demonstrate our method on a large-scale supply-chain environment with real-world data.
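To make the abstract's core idea concrete, below is a minimal sketch of one plausible instantiation of a deployment-efficient loop with a stationary, uncertainty-aware data-collection policy. The paper's actual construction is not specified on this page; everything here is an assumption for illustration. In particular, the ensemble-disagreement proxy for out-of-distribution uncertainty, the `beta` trade-off weight, and the names `q_ensemble` and `exploration_policy` are hypothetical, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a small discrete MDP with N_STATES states and
# N_ACTIONS actions, and an ensemble of N_MEMBERS Q-functions.
N_STATES, N_ACTIONS, N_MEMBERS = 10, 4, 5

# Stand-in for an ensemble of Q-functions fit with offline RL on the
# current dataset (random values here in place of learned estimates).
q_ensemble = rng.normal(size=(N_MEMBERS, N_STATES, N_ACTIONS))

def exploration_policy(state: int, beta: float = 1.0) -> int:
    """Pick an action trading off estimated return against epistemic
    (out-of-distribution) uncertainty.

    Uncertainty is proxied by disagreement (std) across ensemble
    members; beta weights how strongly the policy seeks out actions
    whose value is uncertain, so the next dataset covers them.
    """
    q_mean = q_ensemble[:, state, :].mean(axis=0)  # return estimate
    q_std = q_ensemble[:, state, :].std(axis=0)    # OOD-uncertainty proxy
    return int(np.argmax(q_mean + beta * q_std))

# Deployment-efficient loop: one fixed (stationary) collection policy
# per deployment, with many offline updates between deployments.
for deployment in range(3):
    batch = [(s, exploration_policy(s)) for s in range(N_STATES)]
    # ... roll out `batch` actions, append transitions to the dataset,
    # and re-run offline RL to refresh q_ensemble before redeploying ...
    print(f"deployment {deployment}: collected {len(batch)} transitions")
```

The key design point this sketch tries to capture is that the collection policy is fixed within each deployment and only revised after an offline retraining step, matching the deployment-efficiency constraint described in the abstract.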
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 24176