Keywords: Reinforcement Learning, Markov Decision Processes, Stability
TL;DR: In this paper, we introduce a notion of stability, inspired by control theory, into the realm of Reinforcement Learning, starting from the standard setting of known, finite Markov Decision Processes.
Abstract: Reinforcement Learning (RL) focuses on learning policies that maximize the expected reward. This simple objective has enabled the success of RL in a wide range of scenarios. However, as emphasized by control-theoretic methods, stability is also a desirable property when dealing with real-world systems. In this paper, we take a first step toward incorporating the notion of stability into RL. We focus on planning in *ergodic* Markov Decision Processes (MDPs), i.e., those that converge to a unique stationary distribution under any policy. In this context, we define stability as the speed at which the induced Markov Chain (MC) converges to its stationary distribution. Noting that this property is connected to the spectral characteristics of the induced MC, we study the challenges of including a stability-related term in the RL objective function. First, we highlight how naive approaches to trading off reward maximization against stability lead to bilinear optimization programs, which are computationally demanding. Second, we propose an approach that bypasses this issue through a novel formulation and a surrogate objective function.
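A minimal sketch of the stability notion described in the abstract, under assumed toy numbers: the transition kernel `P` and policy `pi` below are hypothetical and do not come from the paper. It illustrates the standard spectral connection the abstract alludes to, namely that the Markov chain induced by a policy mixes at a rate governed by the second-largest eigenvalue modulus (SLEM) of its transition matrix, so a larger spectral gap means faster convergence to the stationary distribution.

```python
import numpy as np

# Toy ergodic MDP (hypothetical numbers): P[a, s, s'] is the transition
# kernel for action a, and pi[s, a] is a stochastic policy.
P = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]],  # action 0
    [[0.3, 0.4, 0.3], [0.5, 0.4, 0.1], [0.1, 0.1, 0.8]],  # action 1
])
pi = np.array([[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]])

# Markov chain induced by pi: P_pi[s, s'] = sum_a pi(a|s) * P(s'|s, a).
P_pi = np.einsum("sa,ast->st", pi, P)

# Spectral view of "stability": the largest eigenvalue of P_pi is 1; the
# second-largest eigenvalue modulus (SLEM) controls how quickly the chain
# approaches its stationary distribution. Smaller SLEM -> faster mixing.
eigvals = np.linalg.eigvals(P_pi)
slem = np.sort(np.abs(eigvals))[-2]
print(f"SLEM = {slem:.3f}, spectral gap = {1 - slem:.3f}")
```

One could imagine penalizing a quantity like the SLEM of `P_pi` alongside the expected reward; the paper's actual trade-off formulation and surrogate objective are not reproduced here.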
Confirmation: I understand that authors of each paper submitted to EWRL may be asked to review 2-3 other submissions to EWRL.
Serve As Reviewer: ~Alberto_Maria_Metelli2
Track: Regular Track: unpublished work
Submission Number: 165