Keywords: exploration-free, linear function approximation, no-regret
Abstract: In the context of Markov Decision Processes (MDPs) with linear Bellman completeness, a generalization of linear MDPs, we reconsider the learning capabilities of a *greedy* algorithm.
The motivation is that, when exploration is costly or dangerous, an exploration-free approach may be preferable to optimistic or randomized solutions. We show that, under a condition of sufficient diversity in the feature distribution, Least-Squares Value Iteration (LSVI) can achieve sublinear regret.
Specifically, we show that the expected cumulative regret is at most $O(H^3\sqrt{dK/\lambda_0})$, where $K$ is the number of episodes, $H$ is the task horizon, $d$ is the dimension of the feature map, and $\lambda_0$ is a measure of feature diversity. We empirically validate our theoretical findings on synthetic linear MDPs. Our analysis is a first step towards exploration-free reinforcement learning in MDPs with large state spaces.
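For concreteness, the sketch below illustrates the kind of greedy, exploration-free LSVI loop the abstract refers to, run on a small synthetic linear MDP. The construction and all names here (`phi`, `theta`, `num_states`, the ridge parameter `lam`, etc.) are illustrative assumptions, not the paper's actual algorithmic details or experimental setup.

```python
# Minimal sketch (assumed setup): greedy LSVI with linear features, no bonus term.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear MDP: d = |S|, phi(s, a) is a distribution over next states,
# so P(s'|s, a) = phi(s, a)[s'] and r(s, a) = <phi(s, a), theta> lies in [0, 1].
num_states, num_actions, H, K = 5, 3, 4, 500
d = num_states
phi = rng.dirichlet(np.ones(d), size=(num_states, num_actions))  # shape (S, A, d)
theta = rng.uniform(0.0, 1.0, size=d)                            # reward parameter


def reward(s, a):
    return phi[s, a] @ theta


def step(s, a):
    return rng.choice(num_states, p=phi[s, a])


lam = 1.0                      # ridge regularization
data = [[] for _ in range(H)]  # per-step transitions (s, a, r, s')

for k in range(K):
    # Backward least-squares value iteration on all data collected so far,
    # with no optimism bonus: each w[h] solves a plain ridge regression.
    w = np.zeros((H + 1, d))   # w[H] = 0 encodes V_{H+1} = 0
    for h in range(H - 1, -1, -1):
        A = lam * np.eye(d)
        b = np.zeros(d)
        for (s, a, r, s_next) in data[h]:
            target = r + np.max(phi[s_next] @ w[h + 1])  # r + max_a' Q_{h+1}(s', a')
            A += np.outer(phi[s, a], phi[s, a])
            b += phi[s, a] * target
        w[h] = np.linalg.solve(A, b)

    # Roll out one episode acting greedily w.r.t. the estimated Q-functions.
    s = rng.integers(num_states)
    for h in range(H):
        a = int(np.argmax(phi[s] @ w[h]))   # greedy action, no exploration
        r = reward(s, a)
        s_next = step(s, a)
        data[h].append((s, a, r, s_next))
        s = s_next
```

In this sketch the feature-diversity quantity corresponding to $\lambda_0$ would be governed by how well-spread the sampled `phi(s, a)` vectors are; with diverse features, the ridge-regression estimates improve without any explicit exploration bonus.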
Submission Number: 194