Improved Monte Carlo Planning via Causal Disentanglement for Structurally-Decomposed Markov Decision Processes
Keywords: Stochastic Control, MCTS, MDP
Abstract: Markov Decision Processes (MDPs) are a general-purpose framework, but standard formulations often neglect the causal structure of the transition and reward dynamics. For a subclass of resource allocation problems, we introduce the \textit{Structurally Decomposed} MDP (\texttt{SD-MDP}), which leverages causal disentanglement to partition an MDP's temporal causal graph into independent components. By exploiting this disentanglement, \texttt{SD-MDP} enables dimensionality reduction and computational efficiency gains in optimal value function estimation. We reduce the sequential optimization problem to a fractional knapsack problem with log-linear complexity $\mathcal{O}(T \log T)$, outperforming traditional stochastic programming methods, which exhibit polynomial complexity in the time horizon $T$. Additionally, the computational advantages of \texttt{SD-MDP} are independent of the state-action space size, making it viable for high-dimensional spaces. Furthermore, our approach integrates seamlessly with Monte Carlo Tree Search (MCTS), achieving higher expected rewards under constrained simulation budgets while providing a vanishing simple regret bound. Empirical results demonstrate superior policy performance over benchmarks across various logistics and finance domains.
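The abstract does not spell out the reduction itself, but a minimal sketch of the greedy routine for the fractional knapsack problem it invokes illustrates where the stated $\mathcal{O}(T \log T)$ bound comes from: a single sort by value density dominates the cost, followed by one linear pass. The `Item` structure and the example numbers below are illustrative assumptions, not the paper's actual formulation.

```python
# Hedged sketch: greedy fractional knapsack, one item per decision epoch.
# The Item fields (value, weight) are hypothetical stand-ins for the
# per-step marginal reward and resource consumption in SD-MDP's reduction.
from dataclasses import dataclass


@dataclass
class Item:
    value: float   # marginal reward of allocating one unit at this epoch
    weight: float  # amount of the shared resource the allocation consumes


def fractional_knapsack(items: list[Item], capacity: float) -> float:
    """Greedy fractional knapsack.

    Sorting by value density is the only super-linear step, so the total
    running time is O(T log T) for T items (one per decision epoch).
    """
    # Sort once by value-per-unit-weight, descending: O(T log T).
    ranked = sorted(items, key=lambda it: it.value / it.weight, reverse=True)
    total = 0.0
    for it in ranked:  # single linear pass: O(T)
        if capacity <= 0:
            break
        take = min(it.weight, capacity)          # take as much as fits
        total += it.value * (take / it.weight)   # fractional credit
        capacity -= take
    return total


# Example: three decision epochs sharing a resource budget of 4 units.
print(fractional_knapsack(
    [Item(value=6.0, weight=2.0),
     Item(value=5.0, weight=3.0),
     Item(value=2.0, weight=1.0)],
    capacity=4.0,
))  # -> 6.0 + 2.0 + 5.0 * (1/3) ≈ 9.67
```

Because the sort is performed once over the horizon, the complexity is log-linear in $T$, which is the source of the claimed advantage over stochastic programming methods whose cost grows polynomially in $T$.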
Submission Number: 3