Progressive State Space Disaggregation for Infinite Horizon Dynamic Programming

Published: 12 Feb 2024, Last Modified: 06 Mar 2024, ICAPS 2024, CC BY 4.0
Keywords: Markov Decision Process, Hierarchical, Aggregated Markov Decision Process, Approximate Dynamic Programming, Planning
TL;DR: We provide an algorithm that progressively disaggregates the state space of an MDP to approximate the optimal value function, with guarantees and without assumptions on the MDP.
Abstract: The high dimensionality of model-based Reinforcement Learning and Markov Decision Processes can be reduced using abstractions of the state and action spaces. Although hierarchical learning and state abstraction methods have been explored over the past decades, explicit methods to build useful abstractions of models are rarely provided. In this work, we provide a new state abstraction method for solving infinite horizon problems in the discounted and total settings. Our approach is to progressively disaggregate abstract regions by iteratively slicing aggregations of states relative to a value function. The distinguishing feature of our method, in contrast to previous approximations of the Bellman operator, is the disaggregation of regions during value function iterations (or policy evaluation steps). The objective is to find a more efficient aggregation that reduces the error on each piece of the partition. We provide a proof of convergence for this algorithm without making any assumptions about the structure of the problem. We also show that this process decreases the computational complexity of the Bellman operator iteration and provides useful abstractions. We then plug this state space disaggregation process into classical Dynamic Programming algorithms, namely Approximate Value Iteration, Q-Value Iteration, and Policy Iteration. Finally, we conduct a numerical comparison on randomly generated MDPs as well as classical MDPs. These experiments show that our policy-based algorithm is faster than both traditional dynamic programming approaches and recent aggregative methods that use a fixed number of adaptive partitions.
Primary Keywords: Theory, Temporal Planning
Category: Long
Student: Graduate
Submission Number: 287
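
Illustrative sketch (not the authors' implementation): the abstract describes disaggregating regions during value function iterations by slicing them relative to a value function. The Python code below is one possible reading of that idea for a small tabular, discounted MDP. Everything in it is an assumption made for illustration: the function names (`bellman`, `refine`, `disaggregated_value_iteration`), the array layout `P[a, s, s']` and `R[s, a]`, the rule that slices a region into bands of width `eps` of the Bellman update, and the choice of the region mean as the aggregated value. It also applies the full Bellman operator at every step, so it does not reproduce the computational savings reported in the paper.

```python
import numpy as np


def bellman(P, R, gamma, v):
    """Bellman optimality operator: (Tv)(s) = max_a [R[s,a] + gamma * sum_t P[a,s,t] v[t]]."""
    q = R + gamma * np.einsum("ast,t->sa", P, v)
    return q.max(axis=1)


def refine(labels, tv, eps):
    """Slice every region into bands of width eps of tv (hypothetical splitting rule).

    A region whose tv values already lie within a band of width eps is left whole.
    Regions are only ever split, never merged.
    """
    new_labels = np.empty_like(labels)
    next_id = 0
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        bands = np.floor((tv[idx] - tv[idx].min()) / eps).astype(int)
        for b in np.unique(bands):
            new_labels[idx[bands == b]] = next_id
            next_id += 1
    return new_labels


def disaggregated_value_iteration(P, R, gamma, eps=0.05, tol=1e-8, max_iter=10_000):
    """Value iteration on a piecewise-constant value function with progressive splitting."""
    n_states = R.shape[0]
    labels = np.zeros(n_states, dtype=int)  # start with a single region
    v = np.zeros(n_states)                  # piecewise-constant value, stored per state
    for _ in range(max_iter):
        tv = bellman(P, R, gamma, v)
        labels = refine(labels, tv, eps)    # disaggregate where tv varies too much
        n_regions = labels.max() + 1
        sums = np.bincount(labels, weights=tv, minlength=n_regions)
        counts = np.bincount(labels, minlength=n_regions)
        v_new = (sums / counts)[labels]     # project tv onto the current partition
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    return v, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_actions, n_states, gamma = 3, 30, 0.9
    P = rng.random((n_actions, n_states, n_states))
    P /= P.sum(axis=2, keepdims=True)       # make each row a probability distribution
    R = rng.random((n_states, n_actions))
    v, labels = disaggregated_value_iteration(P, R, gamma)
    print("regions used:", labels.max() + 1, "of", n_states, "states")
```

Under these assumptions, when the loop stops the Bellman residual of the returned value is below roughly `eps + tol`, so by the standard contraction argument its distance to the optimal value function is on the order of `(eps + tol) / (1 - gamma)`. This bound belongs to the sketch only and is not a restatement of the paper's guarantees.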