Regret guarantees for model-based reinforcement learning with long-term average constraints

Published: 01 Jan 2022, Last Modified: 30 Apr 2023, UAI 2022, Readers: Everyone
Abstract: We consider the problem of constrained Markov Decision Process (CMDP) where an agent interacts with an ergodic Markov Decision Process. At every interaction, the agent obtains a reward and incurs $...