MDP Geometry, Normalization and Reward Balancing Solvers

Arsenii Mustafin; Aleksei Pakharev; Alex Olshevsky; Ioannis Paschalidis

Published: 22 Jan 2025, Last Modified: 03 Oct 2025 | AISTATS 2025 Poster | CC BY 4.0

TL;DR: A novel geometric interpretation of MDPs that yields new algorithms with fast convergence.

Abstract: We present a new geometric interpretation of Markov Decision Processes (MDPs) with a natural normalization procedure that allows us to adjust the value function at each state without altering the advantage of any action with respect to any policy. This advantage-preserving transformation of the MDP motivates a class of algorithms, which we call *Reward Balancing*, that solve MDPs by iterating through these transformations until an approximately optimal policy can be trivially found. We provide a convergence analysis of several algorithms in this class, in particular showing that for MDPs with unknown transition probabilities we can improve upon state-of-the-art sample complexity results.
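To make the idea of an advantage-preserving transformation concrete, here is a minimal sketch for a small tabular discounted MDP. It is an illustration under stated assumptions, not the authors' implementation. The function names (`shift_rewards`, `policy_value`, `advantages`, `reward_balance`) and the toy MDP are hypothetical. The shift rule r'(s,a) = r(s,a) - δ(s) + γ Σ_s' P(s'|s,a) δ(s') is assumed here as one transformation that lowers every policy's value at state s by δ(s) while leaving advantages unchanged. Choosing δ(s) as the current maximum reward at s is likewise an assumption made for illustration.

```python
def shift_rewards(P, r, delta, gamma):
    """Apply r'(s,a) = r(s,a) - delta[s] + gamma * sum_t P[s][a][t] * delta[t].

    Assumed form of the transformation: every policy's value at state s
    drops by delta[s], so advantages stay the same.
    """
    n = len(r)
    return [
        [
            r[s][a] - delta[s] + gamma * sum(P[s][a][t] * delta[t] for t in range(n))
            for a in range(len(r[s]))
        ]
        for s in range(n)
    ]


def policy_value(P, r, policy, gamma, sweeps=2000):
    """Iterative policy evaluation for a deterministic policy."""
    n = len(r)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [
            r[s][policy[s]] + gamma * sum(P[s][policy[s]][t] * v[t] for t in range(n))
            for s in range(n)
        ]
    return v


def advantages(P, r, policy, gamma):
    """Advantage A(s,a) = Q(s,a) - V(s) of every action under `policy`."""
    n = len(r)
    v = policy_value(P, r, policy, gamma)
    return [
        [
            r[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n)) - v[s]
            for a in range(len(r[s]))
        ]
        for s in range(n)
    ]


def reward_balance(P, r, gamma, tol=1e-8, max_iters=10_000):
    """Repeatedly shift each state by its current maximum reward.

    Stops once every state's maximum reward is within `tol` of zero;
    the policy is then read off as the per-state argmax of the rewards.
    """
    for _ in range(max_iters):
        delta = [max(row) for row in r]
        if max(abs(d) for d in delta) < tol:
            break
        r = shift_rewards(P, r, delta, gamma)
    policy = [max(range(len(row)), key=row.__getitem__) for row in r]
    return r, policy


# Hypothetical toy MDP: P[s][a] is a distribution over next states.
P = [
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.3, 0.7]],
]
r = [[1.0, 0.0], [0.5, 2.0]]
gamma = 0.9

balanced_r, greedy_policy = reward_balance(P, r, gamma)
```

In this sketch, once balancing stops, every reward is non-positive up to the tolerance and each state has an action with reward approximately zero, so picking those actions is the "trivially found" policy. Comparing `advantages(P, r, pi, gamma)` with `advantages(P, shift_rewards(P, r, delta, gamma), pi, gamma)` for any policy `pi` and any `delta` is a quick way to check the advantage-preserving property numerically.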

Full Paper: https://proceedings.mlr.press/v258/mustafin25a.html

Submission Number: 843
