
\input{figure/fig_recon}

\paragrapht{Improving the efficiency of MCTS under expansive action space.} Numerous works apply MCTS with a known environment model \citep{hostetler2014state,elastic_xu_2022,silver2017mastering,silver2016mastering,couetoux2011continuous,monte_kim_2020}. MuZero \citep{schrittwieser2020mastering} incorporates dynamics learning into MCTS, demonstrating its capability to solve complex sequential decision-making tasks even when the true model is unavailable. Several variants of MuZero also exist \citep{grill2020monte,ozair2021vector}, some of which are further extended to handle stochastic transitions \citep{sokota2021monte}. As the number of actions or potential outcomes for each action increases, the branching factor grows, which poses challenges in vast combinatorial action spaces. Consequently, several studies \citep{adaptive_hoerger_2023, chitnis2021camps} have focused on reducing the branching factor in MCTS by employing state or action abstraction techniques. Similar to our work, \citet{chitnis2021camps} select the most useful CSI relationship for a given task and construct a state abstraction from it. However, they do not modify the internal mechanism of MCTS, which limits the applicability of their approach since the abstraction is fixed throughout the whole episode. Moreover, they require a known environment model to identify such relationships, which is impractical in many scenarios involving an unknown model and high-dimensional observations. In contrast, our method constructs an action abstraction for each node \textit{on-the-fly}, leading to more efficient tree traversal through flexible abstraction (\Cref{fig:method_mcts}). Furthermore, the auxiliary network identifies such relationships from pixels without a known environment model, highlighting its practicality.

\paragrapht{MCTS under factored action space.} More broadly, several studies \citep{tang2022leveraging, rebello2023leveraging, Tkachuk2023EfficientPI, Mahajan2021ReinforcementLI} have demonstrated the benefits of exploiting factorized structure in MDPs. In the context of MCTS, \citet{geisser2020trial} leveraged a factored action space by building subtrees for each action variable. However, the hierarchical order of these subtrees significantly influences the algorithm, so prior information on the relationships among the action variables is crucial. \citet{balaji2020factoredrl} proposed learning the dynamics model using factored graphs that represent the conditional independences between state and action variables; these graphs are assumed to be known. In contrast, our method does not rely on such domain knowledge and effectively extracts the compositional structure of the state and action variables jointly with the training of the latent dynamics model.