Hierarchical reinforcement learning from imperfect demonstrations through reachable coverage-based subgoal filtering
Highlights:
• Hierarchical reinforcement learning from demonstrations (HRLfD) vastly improves RL performance on large, complex tasks.
• Our approach greatly alleviates the problem of imperfect demonstrations, which is unavoidable in learning from demonstrations (LfD).
• We propose a novel measure to discriminate noisy demonstrations that negatively affect learning in HRLfD.
• Our method outperforms various state-of-the-art baselines on Maze tasks and robotic arm tasks.