Overcoming the Curse of Dimensionality in Reinforcement Learning Through Approximate Factorization

Published: 01 May 2025, Last Modified: 18 Jun 2025. ICML 2025 poster. License: CC BY 4.0.
TL;DR: Improve factored MDPs in terms of 1) factorizability, 2) applicability to model-free algorithms, and 3) sample complexity guarantees
Abstract: Factored Markov Decision Processes (FMDPs) offer a promising framework for overcoming the curse of dimensionality in reinforcement learning (RL) by decomposing high-dimensional MDPs into smaller, independently evolving components. Despite their potential, existing studies on FMDPs face three key limitations: reliance on perfectly factorizable models, suboptimal sample complexity guarantees for model-based algorithms, and the absence of model-free algorithms. To address these challenges, we introduce approximate factorization, which extends FMDPs to handle imperfectly factorizable models. Moreover, we develop a model-based algorithm and a model-free algorithm (in the form of variance-reduced Q-learning), both achieving the first near-minimax sample complexity guarantees for FMDPs. A key novelty in the design of both algorithms is a graph-coloring-based optimal synchronous sampling strategy. Numerical simulations based on a wind farm storage control problem corroborate our theoretical findings.
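For readers less familiar with FMDPs, here is a minimal sketch of the factorization idea behind the abstract (the component scopes $Z_i$, $W_i$ and the error level $\epsilon$ are illustrative notation, not taken from the paper). In an exactly factored MDP the transition kernel splits into low-dimensional components,

    $P(s' \mid s, a) = \prod_{i=1}^{m} P_i\big(s'_i \mid s[Z_i], a[W_i]\big)$,

whereas approximate factorization only requires such a product to be close to the true kernel, e.g.

    $\Big\| P(\cdot \mid s, a) - \prod_{i=1}^{m} P_i\big(\cdot \mid s[Z_i], a[W_i]\big) \Big\|_{1} \le \epsilon$ for all $(s, a)$,

where $s[Z_i]$ and $a[W_i]$ denote the state and action coordinates that component $i$ depends on, and $\epsilon$ measures how far the model is from being perfectly factorizable.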
Lay Summary: Solving large-scale Markov Decision Processes (MDPs) is computationally expensive because of their high-dimensional state-action spaces and the large amount of data required to model transition dynamics accurately. Traditional reinforcement learning methods often suffer from high sample complexity and slow convergence in such settings. We introduce an approximate factorization framework that decomposes the MDP's transition kernel into independent or weakly dependent components. This decomposition enables more efficient learning by reducing the dimensionality of the problem and leveraging structural properties of the system. Our approach integrates this factorization with variance-reduced Q-learning, ensuring both computational efficiency and robust convergence. By exploiting the structure of MDPs, our method significantly reduces sample complexity and accelerates convergence. In a wind farm storage control problem, our approach achieved a 19.3% reduction in penalty costs compared to baseline methods, using just one year of operational data. This framework is broadly applicable across domains where MDPs are used, offering a principled way to balance computational efficiency and solution quality in large-scale decision-making problems.
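To make the variance-reduction ingredient concrete, the sketch below shows generic variance-reduced synchronous Q-learning on a small random tabular MDP: noisy Bellman backups are recentred around an accurately estimated backup of a frozen reference Q-table. This is only an illustration of that general idea with placeholder sizes and constants; it is not the paper's algorithm and omits both the approximate factorization and the graph-coloring-based sampling strategy.

    # Generic sketch of variance-reduced synchronous Q-learning on a random tabular MDP.
    # Illustrates only the variance-reduction idea (recentring noisy Bellman backups around
    # an accurately estimated backup of a frozen reference Q); NOT the paper's algorithm,
    # and without the factorization or graph-coloring sampling. All constants are placeholders.
    import numpy as np

    rng = np.random.default_rng(0)
    nS, nA, gamma = 5, 2, 0.9                      # placeholder problem sizes
    P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a, :] = next-state distribution
    R = rng.random((nS, nA))                       # placeholder reward table

    def backup_pair(Q, Q_ref, n):
        """Monte-Carlo Bellman backups of Q and Q_ref using the SAME n next-state samples."""
        TQ, TQref = np.zeros((nS, nA)), np.zeros((nS, nA))
        for s in range(nS):
            for a in range(nA):
                ns = rng.choice(nS, size=n, p=P[s, a])
                TQ[s, a] = R[s, a] + gamma * Q[ns].max(axis=1).mean()
                TQref[s, a] = R[s, a] + gamma * Q_ref[ns].max(axis=1).mean()
        return TQ, TQref

    Q = np.zeros((nS, nA))
    for epoch in range(10):
        Q_ref = Q.copy()
        T_ref, _ = backup_pair(Q_ref, Q_ref, n=500)   # accurate backup of the frozen reference
        for t in range(50):
            alpha = 1.0 / (t + 2)
            TQ, TQref = backup_pair(Q, Q_ref, n=1)    # cheap one-sample backups
            # Recentred update: the noise in TQ and TQref largely cancels once Q is near Q_ref.
            Q = (1 - alpha) * Q + alpha * (TQ - TQref + T_ref)

    print("Greedy policy:", Q.argmax(axis=1))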
Primary Area: Theory->Reinforcement Learning and Planning
Keywords: reinforcement learning, sample complexity, q-learning, approximate factorization
Submission Number: 10721