Abstract: An agent that understands its environment well should be able to apply its skills to any given goal, which leads to the important problem of learning a Universal Value Function Approximator (UVFA). A UVFA should accurately predict the cumulative reward between every state-goal pair. For long-range goals, however, the value function is hard to estimate, and the resulting errors can produce a failed policy; this challenges both the learning process and the capacity of neural networks. We propose a method that addresses this issue, built on the Hindsight Experience Replay (HER) framework. Our method explicitly models the environment hierarchically, with a high-level graph-based map that captures the global topology of the space and a low-level value network that derives precise local decisions. Our algorithm is sample efficient because knowledge learned in local neighborhoods is propagated to the whole space through the map. Experiments demonstrate that, under the sparse reward setting, the agent can learn to reach the hardest goals at an early stage of training in both tabular and MuJoCo control environments.
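To make the hierarchical idea concrete, here is a minimal Python sketch of planning over a graph-based map with a local distance estimate. It is an illustration under stated assumptions, not the authors' implementation: the names `local_distance`, `build_map`, `shortest_path`, and the `max_edge` threshold are hypothetical, and Euclidean distance stands in for a learned low-level value network.

```python
import heapq
import math

def local_distance(a, b):
    """Hypothetical stand-in for a learned local value network (assumption):
    here it is simply the Euclidean distance between two states."""
    return math.dist(a, b)

def build_map(nodes, max_edge):
    """Build a graph over map nodes, keeping only edges whose estimated
    distance is at most max_edge, i.e. where the local estimate is
    assumed to be reliable. max_edge is an illustrative parameter."""
    graph = {i: [] for i in range(len(nodes))}
    for i, a in enumerate(nodes):
        for j, b in enumerate(nodes):
            if i != j:
                d = local_distance(a, b)
                if d <= max_edge:
                    graph[i].append((j, d))
    return graph

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm: chains short local estimates into a
    long-range path. Returns (path, total_distance), or (None, inf)
    if the goal is unreachable."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return None, math.inf
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[goal]

# Four states on a line; only neighbors within 1.5 units are connected.
nodes = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
graph = build_map(nodes, max_edge=1.5)
path, total = shortest_path(graph, 0, 3)
next_subgoal = nodes[path[1]]  # the nearby target a local policy would pursue
```

In this sketch the distant goal is never evaluated directly: the planner only sums short-range estimates along the map and hands the first intermediate node to the local controller as a subgoal.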
Code Link: https://github.com/FangchenLiu/map_planner
CMT Num: 1122