Predictive Control and Regret Analysis of Non-Stationary MDP with Look-ahead Information

Ziyi Zhang; Yorie Nakahira; Guannan Qu

Published: 02 Jul 2025, Last Modified: 03 Dec 2025. Accepted by TMLR. License: CC BY 4.0

Abstract: Policy design in non-stationary Markov Decision Processes (MDPs) is inherently challenging because time-varying system transitions and rewards make it difficult for learners to determine the optimal actions for maximizing cumulative future rewards. Fortunately, in many practical applications, such as energy systems, look-ahead predictions are available, including forecasts of renewable energy generation and demand. In this paper, we leverage these look-ahead predictions and propose an algorithm designed to achieve low regret in non-stationary MDPs. Our theoretical analysis demonstrates that, under certain assumptions, the regret decreases exponentially as the look-ahead window expands. When the system prediction is subject to error, the regret does not explode even if the prediction error grows sub-exponentially as a function of the prediction horizon. We validate our approach through simulations and confirm its efficacy in non-stationary environments.
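To make the idea of planning with a look-ahead window concrete, here is a minimal sketch of receding-horizon control on a small tabular non-stationary MDP. It is not the algorithm from the paper: it assumes exact predictions, a zero terminal value at the end of the window, and a hypothetical data layout in which `P[t][s][a][s2]` is the predicted transition probability and `R[t][s][a]` is the predicted reward at time `t`. The function name `lookahead_action` is likewise made up for illustration.

```python
def lookahead_action(P, R, t, state, k):
    """Return the first action of a plan computed by backward induction
    over the predicted window [t, t + k), assuming zero terminal value.

    P[tau][s][a][s2] and R[tau][s][a] are hypothetical containers holding
    the predicted transition probabilities and rewards at time tau.
    """
    horizon_end = min(t + k, len(R))  # do not plan past the available predictions
    n_states = len(R[0])
    n_actions = len(R[0][0])
    value = [0.0] * n_states  # assumed terminal value at the end of the window
    best_first = 0            # default action if the window is empty
    for tau in range(horizon_end - 1, t - 1, -1):
        new_value = [0.0] * n_states
        for s in range(n_states):
            q_best, a_best = float("-inf"), 0
            for a in range(n_actions):
                q = R[tau][s][a] + sum(
                    P[tau][s][a][s2] * value[s2] for s2 in range(n_states)
                )
                if q > q_best:
                    q_best, a_best = q, a
            new_value[s] = q_best
            if tau == t and s == state:
                best_first = a_best  # greedy action at the current time and state
        value = new_value
    return best_first


# Toy example: action 0 stays in the current state, action 1 switches state.
stay_switch = [
    [[1.0, 0.0], [0.0, 1.0]],  # from state 0: stay -> 0, switch -> 1
    [[0.0, 1.0], [1.0, 0.0]],  # from state 1: stay -> 1, switch -> 0
]
P = [stay_switch, stay_switch]
R = [
    [[1.0, 0.0], [0.0, 0.0]],      # t = 0: small reward for staying in state 0
    [[0.0, 0.0], [10.0, 10.0]],    # t = 1: large reward for being in state 1
]

myopic = lookahead_action(P, R, t=0, state=0, k=1)
farsighted = lookahead_action(P, R, t=0, state=0, k=2)
```

In this toy instance the one-step planner stays in state 0 to collect the immediate reward, while the two-step planner switches so that it is in state 1 when the larger reward arrives. This only illustrates why a longer window can help; it says nothing about the regret bounds claimed in the abstract.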

Submission Length: Regular submission (no more than 12 pages of main content)

Supplementary Material: zip

Changes Since Last Submission: Modified the definition of dynamic regret. Added a modified setting that includes RL-CD, which detects a change in the MDP and resets the policy.

Assigned Action Editor: ~Alec_Koppel1

Submission Number: 4223
