Monte Carlo Tree Search (MCTS) has gained prominence as a decision-time planning algorithm, demonstrating its ability to solve complex sequential decision-making problems \citep{silver2016mastering,silver2017mastering}. Its core principle is to build a search tree and run randomized simulations to evaluate actions, thereby steering selection towards more promising choices over time. By incorporating a learned latent dynamics model into the tree search, it achieves remarkable performance even from pixels, without a known environment model \citep{schrittwieser2020mastering}.

However, MCTS often makes sub-optimal decisions when confronted with a vast combinatorial action space. The branching factor of the search tree grows with the number of available actions, making it challenging to explore and exploit efficiently during tree search \citep{continuous_coutoux_2011, pinto2017, sokota2021monte, elastic_xu_2022, veeriah2022grasp, adaptive_hoerger_2023}.

\input{figure/fig_teaser}

This issue becomes more pronounced in environments where an action is composed of multiple sub-actions, since the cardinality of the action space grows \textit{exponentially} with the number of sub-actions, as illustrated in \Cref{fig:teaser}. Unfortunately, such a factored action structure is prevalent in many real-world applications. For instance, in recommender systems, an action consists of multiple recommendations on a single page. In healthcare, a configuration of various medications and treatments constitutes an action. Many classical domains also involve factored action spaces, e.g., arcade games where players manipulate multiple controllers such as joysticks and buttons simultaneously.
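
To make this growth concrete, suppose (using notation introduced here only for illustration) that an action $a = (a^1, \dots, a^n)$ consists of $n$ sub-actions with $a^i \in \mathcal{A}^i$. The number of joint actions, and hence the number of children a fully expanded node may have, is then
\[
    |\mathcal{A}| = \prod_{i=1}^{n} |\mathcal{A}^i|,
\]
which equals $k^n$ when every sub-action has $k$ choices. For example, $n = 10$ binary sub-actions already yield $2^{10} = 1024$ joint actions.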


Existing approaches that extend MCTS to environments with factored action spaces often leverage domain knowledge such as transition structure \citep{balaji2020factoredrl}, hierarchies of sub-actions \citep{geisser2020trial}, or a known environment model \citep{chitnis2021camps}. However, such prior information is often unavailable, and the true environment model is inaccessible in many domains (e.g., healthcare). Furthermore, it is unclear whether these approaches can be extended to high-dimensional observations such as pixels.

Our motivation stems from the observation that, in many cases, only some of the sub-actions determine the transition from the current state, making the others irrelevant. For example, certain treatments often disable the influence of other medications for some patients. In the context of MCTS, exploring sub-actions that are irrelevant to the transition is redundant. Notably, the relevance of each sub-action may vary across states, e.g., due to the differing physiological mechanisms among patients.

In this work, we propose an action abstraction, based on the compositional structure between the state and sub-actions, that improves the efficiency of MCTS in factored action spaces. Our method, which we call state-conditioned action abstraction, identifies this structure by learning a masked latent dynamics model that uses only the sub-actions necessary for prediction. Importantly, it does not rely on the true environment model and learns from raw observations. Furthermore, the compositional structure is learned solely with a reconstruction loss, which keeps the method practical in sparse-reward environments. During tree traversal, our method infers the relevant sub-actions at each node \textit{on-the-fly}, which guides the subsequent action abstraction.
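
As an informal sketch of this idea (the notation below is introduced only for illustration and is not a formal definition of our method), let an action $a = (a^1, \dots, a^n)$ consist of $n$ sub-actions, and let a mask $m(s) \in \{0, 1\}^n$ indicate which sub-actions are relevant to the transition from state $s$. Two actions $a$ and $\tilde{a}$ can then be treated as equivalent at $s$ whenever they agree on every relevant sub-action:
\[
    a \sim_s \tilde{a} \quad \Longleftrightarrow \quad a^i = \tilde{a}^i \ \mbox{ for all } i \mbox{ with } m_i(s) = 1.
\]
Under this assumption, if each sub-action has $k$ choices and only $r \le n$ of them are relevant at $s$, the corresponding node needs to distinguish at most $k^r$ abstract actions rather than $k^n$.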

We augment MuZero \citep{schrittwieser2020mastering} with our method and demonstrate improved sample efficiency on environments with expansive combinatorial action spaces. A detailed analysis shows the effectiveness of state-conditioned action abstraction and illustrates that it successfully captures the compositional relationships between the state and sub-actions.

Our contributions are summarized as follows:
\begin{itemize}
    \item We devise a simple and effective method that learns the compositional structure between the state and sub-actions from pixels, without a known environment model.
    \item We propose a state-conditioned action abstraction that improves the efficiency of MCTS in factored action spaces by considering only the sub-actions relevant to the transition from the current state.
    \item We demonstrate the superior sample efficiency of our method compared to vanilla MuZero, which suffers from vast combinatorial action spaces.
\end{itemize}