Learning with Real-time Improving Predictions in Online MDPs

27 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Online learning, Markov decision process, regret analysis, predictions
TL;DR: An online learning algorithm designed for episodic Markov Decision Processes with real-time improving predictions.
Abstract: In this paper, we introduce the Decoupling Optimistic Online Mirror Descent (DOOMD) algorithm, a novel online learning approach for episodic Markov Decision Processes with real-time improving predictions. Unlike conventional methods that fix a policy for the duration of each episode, our approach continuously updates both predictions and policies within an episode. To achieve this, the DOOMD algorithm decomposes decision-making across states, enabling each state to run an individual sub-algorithm that accounts for both the immediate effect of a decision and its long-term influence on subsequent states. We establish a sub-linear regret bound for the algorithm, providing a worst-case performance guarantee.
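The submission itself is not reproduced on this page, but the abstract's description of per-state sub-algorithms built on optimistic online mirror descent can be illustrated with a minimal sketch. Everything below is an assumption for illustration only, not the authors' DOOMD algorithm: the function name `optimistic_omd_step`, the entropic (KL) regularizer on the probability simplex, and the per-state usage are hypothetical choices.

```python
import numpy as np

def optimistic_omd_step(x, prediction, loss, eta):
    """One round of optimistic online mirror descent on the probability
    simplex with the entropic regularizer (hypothetical sketch, not the
    paper's DOOMD algorithm).

    x          : current base iterate (distribution over actions)
    prediction : predicted loss vector for the upcoming round
    loss       : loss vector actually observed after playing
    eta        : step size

    Returns (played, x_next): the distribution actually played
    (hedged toward the prediction) and the next base iterate.
    """
    # Optimistic step: bias the base iterate toward the predicted loss.
    played = x * np.exp(-eta * prediction)
    played /= played.sum()
    # Standard mirror-descent update with the observed loss.
    x_next = x * np.exp(-eta * loss)
    x_next /= x_next.sum()
    return played, x_next

# Hypothetical usage: one such sub-algorithm instance per state, invoked
# again whenever the within-episode predictions improve.
rng = np.random.default_rng(0)
n_actions = 4
x = np.full(n_actions, 1.0 / n_actions)                    # uniform start
prediction = rng.random(n_actions)                         # predicted losses
loss = prediction + 0.1 * rng.standard_normal(n_actions)   # realized losses
played, x = optimistic_omd_step(x, prediction, loss, eta=0.1)
```

The appeal of the optimistic step is that its per-round regret scales with how far the observed losses deviate from the predictions, so improving predictions directly tighten the bound; per the abstract, the paper's contribution is coupling such per-state updates so that long-term effects across states are also accounted for.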
Primary Area: learning theory
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12074