Macro Action Ensemble Searching Methodology for Deep Reinforcement Learning

Yu-Ming Chen; Chien Liu; Tsu-Ching Hsiao; Kuan-Yu Chang; Chun-Yee Lee

25 Sept 2019 (modified: 05 May 2023); ICLR 2020 Conference Withdrawn Submission; Readers: Everyone

Abstract: In this paper, we propose to improve the performance of deep reinforcement learning (DRL) methods by searching for a feasible macro action ensemble to augment the action space of an agent. A macro action ensemble is composed of multiple macro actions, which are typically defined as sequences of primitive actions. A well-defined macro action ensemble enables a DRL agent to achieve higher performance than conventional DRL methods on a variety of tasks. However, macro actions generated by previous approaches are either not necessarily promising or limited to specific forms. Therefore, in this study, we investigate a searching method that learns the macro action ensemble from the environment of interest. The proposed method is inspired by neural architecture search techniques, which are capable of developing network architectures for different tasks. These search techniques, such as NASNet and MetaQNN, have been shown to generate high-performance neural network architectures in large search spaces. To search large macro action ensemble spaces, we use Deep Q-Learning to explore the space for a good ensemble. Our approach iteratively discovers new ensembles of macro actions with better performance on the learning task. The proposed method is able to search finite macro action ensemble spaces directly, which other contemporary methods have yet to achieve. Our experimental results show that the scores attained by policies trained with the discovered macro action ensemble outperform those trained without it. Moreover, the policies using our macro action ensemble explore more efficiently and converge faster. We further perform a comprehensive set of ablative analyses to validate the proposed methodology.
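
To make the notion of augmenting an action space with a macro action ensemble concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the names `Macro`, `augment_action_space`, and `execute`, the integer encoding of primitive actions, and the simplified `step` callback returning `(reward, done)` are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a macro action is a tuple of primitive action ids.
Macro = Tuple[int, ...]


def augment_action_space(n_primitive: int, ensemble: Sequence[Macro]) -> List[Macro]:
    """Return the augmented action list: primitives first, then the ensemble.

    Each primitive action is wrapped as a length-1 macro so that the agent
    can treat every entry of the augmented space uniformly.
    """
    primitives = [(a,) for a in range(n_primitive)]
    return primitives + [tuple(m) for m in ensemble]


def execute(step: Callable[[int], Tuple[float, bool]], action: Macro) -> Tuple[float, bool]:
    """Run the primitive actions of `action` in order.

    `step` is an assumed environment callback mapping a primitive action to
    (reward, done). Rewards are summed, and execution stops early if the
    episode ends in the middle of the macro.
    """
    total_reward = 0.0
    done = False
    for primitive in action:
        reward, done = step(primitive)
        total_reward += reward
        if done:
            break
    return total_reward, done
```

Under these assumptions, an agent with three primitive actions and the ensemble `[(0, 1), (2, 2, 1)]` would choose among five actions per decision step. The search described in the abstract would then operate over which ensemble to append, which this sketch does not cover.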
