On the Occupancy Measure of Non-Markovian Policies in Continuous MDPs

Romain Laroche; Remi Tachet des Combes

On the Occupancy Measure of Non-Markovian Policies in Continuous MDPs

Romain Laroche, Remi Tachet des Combes

Published: 24 Apr 2023, Last Modified: 15 Jun 2023ICML 2023 PosterEveryoneRevisions

Abstract: The state-action occupancy measure of a policy is the expected (discounted or undiscounted) number of times a state-action couple is visited in a trajectory. For decades, RL books have been reporting the occupancy equivalence between Markovian and non-Markovian policies in countable state-action spaces under mild conditions. This equivalence states that the occupancy of any non-Markovian policy can be equivalently obtained by a Markovian policy, i.e. a memoryless probability distribution, conditioned only on its current state. While expected, for technical reasons, the translation of this result to continuous state space has resisted until now. Our main contribution is to fill this gap and to provide a general measure-theoretic treatment of the problem, permitting, in particular, its extension to continuous MDPs. Furthermore, we show that when the occupancy is infinite, we may encounter some non-trivial cases where the result does not hold anymore.

Submission Number: 1762

Loading