Improved Monte Carlo Planning via Causal Disentanglement for Structurally-Decomposed Markov Decision Processes
Keywords: Stochastic Control, MCTS, MDP
Abstract: Markov Decision Processes (MDPs) are a general-purpose framework, but standard formulations often neglect the causal structure of the transition and reward dynamics. For a subclass of resource allocation problems, we introduce the \textit{Structurally Decomposed} MDP (\texttt{SD-MDP}), which leverages causal disentanglement to partition an MDP's temporal causal graph into independent components. By exploiting this disentanglement, \texttt{SD-MDP} enables dimensionality reduction and computational efficiency gains in optimal value function estimation. We reduce the sequential optimization problem to a fractional knapsack problem with log-linear complexity $\mathcal{O}(T \log T)$, outperforming traditional stochastic programming methods, which exhibit polynomial complexity in the time horizon $T$. Additionally, the computational advantages of \texttt{SD-MDP} are independent of the state-action space size, making it viable for high-dimensional spaces. Furthermore, our approach integrates seamlessly with Monte Carlo Tree Search (MCTS), achieving higher expected rewards under constrained simulation budgets while providing a vanishing simple regret bound. Empirical results demonstrate superior policy performance over benchmarks across various logistics and finance domains.
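The abstract does not spell out the reduction itself, but a minimal sketch of the greedy routine for the fractional knapsack problem it invokes illustrates where the stated $\mathcal{O}(T \log T)$ bound comes from: a single sort by value density dominates the cost, followed by one linear pass. The `Item` structure and the example numbers below are illustrative assumptions, not the paper's actual formulation.

```python
# Hedged sketch: greedy fractional knapsack, one item per decision epoch.
# The Item fields (value, weight) are hypothetical stand-ins for the
# per-step marginal reward and resource consumption in SD-MDP's reduction.
from dataclasses import dataclass


@dataclass
class Item:
    value: float   # marginal reward of allocating one unit at this epoch
    weight: float  # amount of the shared resource the allocation consumes


def fractional_knapsack(items: list[Item], capacity: float) -> float:
    """Greedy fractional knapsack.

    Sorting by value density is the only super-linear step, so the total
    running time is O(T log T) for T items (one per decision epoch).
    """
    # Sort once by value-per-unit-weight, descending: O(T log T).
    ranked = sorted(items, key=lambda it: it.value / it.weight, reverse=True)
    total = 0.0
    for it in ranked:  # single linear pass: O(T)
        if capacity <= 0:
            break
        take = min(it.weight, capacity)          # take as much as fits
        total += it.value * (take / it.weight)   # fractional credit
        capacity -= take
    return total


# Example: three decision epochs sharing a resource budget of 4 units.
print(fractional_knapsack(
    [Item(value=6.0, weight=2.0),
     Item(value=5.0, weight=3.0),
     Item(value=2.0, weight=1.0)],
    capacity=4.0,
))  # -> 6.0 + 2.0 + 5.0 * (1/3) ≈ 9.67
```

Because the sort is performed once over the horizon, the complexity is log-linear in $T$, which is the source of the claimed advantage over stochastic programming methods whose cost grows polynomially in $T$.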
Submission Number: 3