Keywords: Theory of Reinforcement Learning, Convergence Analysis, Optimization
TL;DR: A novel geometric interpretation of MDPs yielding new algorithms with fast convergence.
Abstract: Despite recent progress in theoretical reinforcement learning—motivated by the success of practical algorithms—there have been few fundamentally new ideas for solving Markov Decision Processes (MDPs), and state value estimation remains central to most existing approaches. In this paper, we present a new geometric interpretation of classic MDPs, introducing a natural normalization procedure that adjusts the value function at each state without altering the advantage of any action with respect to any policy. This advantage-preserving transformation motivates a class of algorithms we call Reward Balancing, which solve MDPs by iterating through such transformations until an approximately optimal policy can be trivially identified. We provide a convergence analysis of several algorithms in this class, and in particular show that for MDPs with unknown transition probabilities, our approach improves upon state-of-the-art sample complexity results.
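A minimal illustrative sketch of the idea described in the abstract, under strong assumptions: a tabular MDP with known transition matrix P (shape S x A x S) and reward matrix R (shape S x A). Here the advantage-preserving transformation is realized as a standard potential-based reward shift, and the loop repeats such shifts until the best shifted reward at every state is near zero, at which point a greedy policy can be read off directly. The function names, the greedy one-step choice of potential, and the stopping rule are all assumptions for illustration; this is not the published Reward Balancing algorithm, and it does not cover the unknown-transition setting analyzed in the paper.

```python
import numpy as np

def advantage_preserving_shift(R, P, gamma, f):
    """Shift rewards by a state potential f: r'(s,a) = r(s,a) - f(s) + gamma * E[f(s')].

    A potential-based shift of this form leaves the advantage of every
    action under every policy unchanged (illustrative; the paper's
    normalization may differ in its exact form).
    """
    # R: (S, A), P: (S, A, S), f: (S,)
    return R - f[:, None] + gamma * (P @ f)

def reward_balancing_sketch(R, P, gamma, n_iters=1000, tol=1e-6):
    """Illustrative reward-balancing loop (a sketch, not the authors' method).

    Repeatedly applies advantage-preserving shifts with a one-step greedy
    potential until the shifted rewards are balanced, i.e. the best shifted
    reward at each state is approximately zero; the greedy policy is then
    trivially identified from the balanced rewards.
    """
    S, A = R.shape
    R_bal = R.copy()
    total_potential = np.zeros(S)          # accumulated value-function adjustment
    for _ in range(n_iters):
        f = R_bal.max(axis=1)              # one-step greedy potential (an assumption)
        if np.max(np.abs(f)) < tol:        # balanced: best shifted reward ~ 0 everywhere
            break
        R_bal = advantage_preserving_shift(R_bal, P, gamma, f)
        total_potential += f
    greedy_policy = R_bal.argmax(axis=1)   # approximately optimal once balanced
    return greedy_policy, R_bal, total_potential
```

With this particular choice of potential the accumulated shifts coincide with value iteration, so the sketch only illustrates how iterating advantage-preserving transformations can expose a near-optimal policy; the algorithms and convergence rates claimed in the paper are not reproduced here.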
Confirmation: I understand that authors of each paper submitted to EWRL may be asked to review 2-3 other submissions to EWRL.
Serve As Reviewer: ~Arsenii_Mustafin1
Track: Fast Track: published work
Publication Link: https://proceedings.mlr.press/v258/mustafin25a.html
Submission Number: 71