On Constant Regret for Low-Rank MDPs

Alexander Sturm; Sebastian Tschiatschek

On Constant Regret for Low-Rank MDPs

Alexander Sturm, Sebastian Tschiatschek

Published: 07 May 2025, Last Modified: 28 Jul 2025UAI 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Reinforcement Learning, Representation Learning, Regret Minimization, Problem-Dependent Analysis

Abstract: Although there exist instance-dependent regret bounds for linear Markov decision processes (MDPs) and low-rank bandits, extensions to low-rank MDPs remain unexplored. In this work, we close this gap and provide regret bounds for low-rank MDPs in an instance-dependent setting. Specifically, we introduce an algorithm, called UniSREP-UCB, which utilizes a constrained optimization objective to learn features with good spectral properties. Furthermore, we demonstrate that our algorithm enjoys constant regret if the minimal sub-optimality gap and the occupancy distribution of the optimal policy are well-defined and known. To the best of our knowledge, these are the first instance-dependent regret results for low-rank MDPs.

Latex Source Code: zip

Readers: auai.org/UAI/2025/Conference, auai.org/UAI/2025/Conference/Area_Chairs, auai.org/UAI/2025/Conference/Reviewers, auai.org/UAI/2025/Conference/Submission415/Authors, auai.org/UAI/2025/Conference/Submission415/Reproducibility_Reviewers

Submission Number: 415

Loading