TL;DR: We theoretically show an exponential improvement of Deep HRL over the standard RL framework.
Abstract: Modern complex sequential decision-making problems often require both low-level policy execution and high-level planning. Deep hierarchical reinforcement learning (Deep HRL) admits multi-layer abstractions that naturally model the policy hierarchically, and it is believed that Deep HRL can reduce the sample complexity compared to standard RL frameworks. We initiate the study of rigorously characterizing the complexity of Deep HRL. We present a model-based optimistic algorithm which demonstrates that the complexity of learning a near-optimal policy for Deep HRL scales with the sum of the numbers of states at each abstraction layer, whereas standard RL scales with their product. Our algorithm achieves this by exploiting the fact that distinct high-level states share similar low-level structure, which allows efficient information exploitation: experience from different high-level state-action pairs can be generalized to unseen state-actions. Overall, our result shows an exponential improvement of Deep HRL over the standard RL framework.
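A toy back-of-the-envelope illustration (not from the paper; the layer sizes are hypothetical) of the sum-versus-product scaling claimed in the abstract: a flat RL agent faces a state space whose size is the product of the per-layer state counts, while the hierarchical view only pays for their sum.

```python
# Toy illustration (hypothetical numbers, not from the paper): compare the
# effective number of states a flat (standard) RL agent must handle versus
# a hierarchical one, when the problem factors into abstraction layers.
from math import prod

layer_sizes = [10, 10, 10]  # hypothetical: 3 abstraction layers, 10 states each

flat_states = prod(layer_sizes)          # standard RL: product over layers
hierarchical_states = sum(layer_sizes)   # Deep HRL: sum over layers

print(f"flat RL state space:       {flat_states}")         # 1000
print(f"deep HRL effective states: {hierarchical_states}")  # 30
```

With more layers the gap grows exponentially in the number of layers, which is the sense in which the abstract's "exponential improvement" should be read.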
Keywords: hierarchical model, reinforcement learning, low regret, online learning, tabular reinforcement learning