Instance-Dependent Fixed-Budget Pure Exploration in Reinforcement Learning

Yeongjong Kim; Yeoneung Kim; Kwang-Sung Jun

Published: 12 Jun 2025, Last Modified: 04 Jul 2025

Venue: EXAIT@ICML 2025 Poster

License: CC BY 4.0

Track: Theory

Keywords: MDP, Reinforcement Learning, Pure Exploration, Fixed Budget

TL;DR: We propose an algorithm for fixed-budget pure exploration in reinforcement learning and provide an instance-dependent theoretical guarantee.

Abstract: We study the problem of fixed-budget pure exploration in reinforcement learning. The goal is to identify a near-optimal policy, given a fixed budget on the number of interactions with the environment. Unlike the standard PAC setting, we do not require the target error level $\epsilon$ and failure rate $\delta$ as input. We propose novel algorithms and provide, to the best of our knowledge, the first instance-dependent theoretical guarantee for this setting. Our analysis yields an $\epsilon$-correctness guarantee with instance-dependent probability, characterizing the budget requirements in terms of the problem-specific hardness of exploration. As a core component of our analysis, we derive an $\epsilon$-good guarantee for the multiple bandit problem (solving multiple multi-armed bandit instances simultaneously), which may be of independent interest. To enable our analysis, we also develop tools for reward-free exploration under the fixed-budget setting, which we believe will be useful for future work in this area.

Serve As Reviewer: ~Yeongjong_Kim1

Submission Number: 55
