Learning the Dynamic Environment of an Original Game Using Hierarchical Reinforcement Learning Methods
Abstract: This paper compares the performance of two reinforcement learning algorithms, Q-Learning and MAXQ-0, in learning to play an original game. An extension of the MAXQ-0 algorithm, MAXQ-P, is introduced, which enriches the task hierarchy with simple, ordered, and repetitive node types. The hierarchical approach provided by MAXQ-P finds the optimal solution faster than the flat Q-Learning approach but converges more slowly. Furthermore, the performance of the MAXQ-P algorithm degrades after a certain number of episodes due to representation error in the model's weights. To address this issue, the model is periodically evaluated with the exploration rate set to 0, and if it successfully finds the solution, its weights are stored for future use. This study provides insights into the benefits and drawbacks of hierarchical reinforcement learning algorithms for complex tasks and highlights the importance of carefully designing and training such algorithms for optimal performance.
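The periodic greedy-evaluation checkpointing mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the TabularAgent class, run_episode helper, the env interface, and the episodes / eval_every values are hypothetical placeholders.

```python
"""Sketch of periodic greedy evaluation with checkpointing: every few
episodes the agent is run with the exploration rate forced to 0, and a
copy of its weights is stored whenever that greedy run solves the task."""

import copy
import random


class TabularAgent:
    """Hypothetical stand-in for the learner (a flat Q-Learning agent here)."""

    def __init__(self, n_states: int, n_actions: int, epsilon: float = 0.1):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.epsilon = epsilon

    def act(self, state: int, epsilon: float) -> int:
        if random.random() < epsilon:                      # explore
            return random.randrange(len(self.q[state]))
        row = self.q[state]
        return max(range(len(row)), key=row.__getitem__)   # exploit greedily


def run_episode(agent: TabularAgent, env, epsilon: float) -> bool:
    """Run one episode; return True if the environment reports success.
    `env` is assumed to expose reset() -> state and step(a) -> (s, r, done, solved)."""
    state, done, solved = env.reset(), False, False
    while not done:
        action = agent.act(state, epsilon)
        state, reward, done, solved = env.step(action)
        # (learning update omitted for brevity)
    return solved


def train(agent: TabularAgent, env, episodes: int = 10_000, eval_every: int = 100):
    best_weights = None
    for episode in range(episodes):
        run_episode(agent, env, epsilon=agent.epsilon)     # normal training with exploration
        if episode % eval_every == 0:
            if run_episode(agent, env, epsilon=0.0):       # greedy test, exploration set to 0
                best_weights = copy.deepcopy(agent.q)      # store the working policy
    return best_weights
```

Storing a copy of the weights at every successful greedy evaluation ensures that a solution found early in training is not lost if later updates degrade the policy, which is the failure mode the abstract attributes to representation error.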