FB-MOAC: A Reinforcement Learning Algorithm for Forward-Backward Markov Decision Processes

TMLR Paper4496 Authors

16 Mar 2025 (modified: 05 Jun 2025) · Decision pending for TMLR · CC BY 4.0
Abstract: Reinforcement learning (RL) algorithms are effective in solving problems that can be modeled as Markov decision processes (MDPs). They primarily target forward MDPs, whose dynamics evolve over time from an initial state. However, several important problems in stochastic control and network systems, among others, exhibit both forward and backward dynamics. As a consequence, they cannot be expressed as a standard MDP, calling for a novel theory of RL in this context. Accordingly, this work introduces the concept of Forward-Backward Markov Decision Processes (FB-MDPs) for multi-objective problems, develops a novel theoretical framework to characterize their optimal solutions, and proposes a general forward-backward step-wise template based on which RL algorithms can be adapted to address FB-MDP problems. It then introduces the Forward-Backward Multi-Objective Actor-Critic (FB-MOAC) algorithm, which obtains optimal policies with guaranteed convergence and a rate competitive with standard RL approaches. FB-MOAC is finally evaluated on three use cases in the contexts of mathematical finance, mobile resource management, and edge computing. The obtained results show that FB-MOAC outperforms the state of the art across different metrics, highlighting its ability to learn and maximize rewards.
Submission Length: Long submission (more than 12 pages of main content)
Assigned Action Editor: ~Alberto_Maria_Metelli2
Submission Number: 4496