Keywords: Theory of Reinforcement Learning, Convergence Analysis, Optimization
TL;DR: A novel geometric interpretation of MDPs yielding new algorithms with fast convergence.
Abstract: Despite recent progress in theoretical reinforcement learning—motivated by the success of practical algorithms—there have been few fundamentally new ideas for solving Markov Decision Processes (MDPs), and state value estimation remains central to most existing approaches. In this paper, we present a new geometric interpretation of classic MDPs, introducing a natural normalization procedure that adjusts the value function at each state without altering the advantage of any action with respect to any policy. This advantage-preserving transformation motivates a class of algorithms we call Reward Balancing, which solve MDPs by iterating through such transformations until an approximately optimal policy can be trivially identified. We provide a convergence analysis of several algorithms in this class, and in particular show that for MDPs with unknown transition probabilities, our approach improves upon state-of-the-art sample complexity results.
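A minimal illustrative sketch of the idea described in the abstract, under strong assumptions: a tabular MDP with known transition matrix P (shape S x A x S) and reward matrix R (shape S x A). Here the advantage-preserving transformation is realized as a standard potential-based reward shift, and the loop repeats such shifts until the best shifted reward at every state is near zero, at which point a greedy policy can be read off directly. The function names, the greedy one-step choice of potential, and the stopping rule are all assumptions for illustration; this is not the published Reward Balancing algorithm, and it does not cover the unknown-transition setting analyzed in the paper.

```python
import numpy as np

def advantage_preserving_shift(R, P, gamma, f):
    """Shift rewards by a state potential f: r'(s,a) = r(s,a) - f(s) + gamma * E[f(s')].

    A potential-based shift of this form leaves the advantage of every
    action under every policy unchanged (illustrative; the paper's
    normalization may differ in its exact form).
    """
    # R: (S, A), P: (S, A, S), f: (S,)
    return R - f[:, None] + gamma * (P @ f)

def reward_balancing_sketch(R, P, gamma, n_iters=1000, tol=1e-6):
    """Illustrative reward-balancing loop (a sketch, not the authors' method).

    Repeatedly applies advantage-preserving shifts with a one-step greedy
    potential until the shifted rewards are balanced, i.e. the best shifted
    reward at each state is approximately zero; the greedy policy is then
    trivially identified from the balanced rewards.
    """
    S, A = R.shape
    R_bal = R.copy()
    total_potential = np.zeros(S)          # accumulated value-function adjustment
    for _ in range(n_iters):
        f = R_bal.max(axis=1)              # one-step greedy potential (an assumption)
        if np.max(np.abs(f)) < tol:        # balanced: best shifted reward ~ 0 everywhere
            break
        R_bal = advantage_preserving_shift(R_bal, P, gamma, f)
        total_potential += f
    greedy_policy = R_bal.argmax(axis=1)   # approximately optimal once balanced
    return greedy_policy, R_bal, total_potential
```

With this particular choice of potential the accumulated shifts coincide with value iteration, so the sketch only illustrates how iterating advantage-preserving transformations can expose a near-optimal policy; the algorithms and convergence rates claimed in the paper are not reproduced here.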
Confirmation: I understand that authors of each paper submitted to EWRL may be asked to review 2-3 other submissions to EWRL.
Serve As Reviewer: ~Arsenii_Mustafin1
Track: Fast Track: published work
Publication Link: https://proceedings.mlr.press/v258/mustafin25a.html
Submission Number: 71