Keywords: reinforcement learning, kinodynamic planning, policy search
TL;DR: We employ a sample-based kinodynamic planning method for more directed exploration and more efficient policy learning.
Abstract: Most deep reinforcement learning (D-RL) methods perform local search and are therefore prone to getting stuck in non-optimal solutions. Furthermore, in simulation-based training, such as domain-randomized simulation training, the availability of a simulation model is not exploited, which potentially decreases efficiency. To overcome the issues of local search and to exploit access to simulation models, we propose using kinodynamic planning methods as part of a model-based reinforcement learning method and learning in an off-policy fashion from solved planning instances. We show that, even on a simple toy domain, D-RL methods (DDPG, PPO, SAC) are not immune to local optima and require additional exploration mechanisms. We show that our planning method exhibits better state-space coverage, collects data that allows for better policies than D-RL methods without additional exploration mechanisms, and that starting from the planner data and performing additional training yields policies as good as or better than those of vanilla D-RL methods, while also producing data that is better suited for re-use in modified tasks.
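The general pipeline described in the abstract can be illustrated with a minimal sketch: a sample-based kinodynamic planner generates transitions on a simulated toy domain, those transitions fill a replay buffer, and a policy or value function is then trained off-policy from the planner data. The 1-D point-mass domain, the random-shooting-style planner, and the linear TD regression used as the off-policy learner below are all illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: off-policy learning from kinodynamic-planner data.
# The toy domain, planner, and learner are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
DT, GOAL = 0.1, np.array([1.0, 0.0])          # timestep, goal = (position, velocity)

def step(state, action):
    """Point-mass dynamics: state = (position, velocity), action = acceleration."""
    pos, vel = state
    vel = vel + action * DT
    pos = pos + vel * DT
    nxt = np.array([pos, vel])
    reward = -np.linalg.norm(nxt - GOAL)       # dense negative-distance reward
    return nxt, reward

def plan_episode(horizon=50, n_candidates=16):
    """Crude sample-based kinodynamic planner: at each step, sample candidate
    accelerations, simulate each with the model, and keep the best outcome."""
    state, transitions = np.zeros(2), []
    for _ in range(horizon):
        candidates = rng.uniform(-1.0, 1.0, n_candidates)
        outcomes = [step(state, a) for a in candidates]
        best = int(np.argmax([r for _, r in outcomes]))
        nxt, reward = outcomes[best]
        transitions.append((state, candidates[best], reward, nxt))
        state = nxt
    return transitions

# Fill a replay buffer with solved planning instances (off-policy data).
buffer = [t for _ in range(20) for t in plan_episode()]

# Minimal off-policy step: fit a linear Q(s, a) by one-step TD regression.
GAMMA = 0.99
def features(s, a):
    return np.array([1.0, s[0], s[1], a, a * a])

theta = np.zeros(5)
for _ in range(200):
    s, a, r, s2 = buffer[rng.integers(len(buffer))]
    # Greedy bootstrap target over a few sampled next actions.
    target = r + GAMMA * max(features(s2, a2) @ theta for a2 in np.linspace(-1, 1, 9))
    phi = features(s, a)
    theta += 0.01 * (target - phi @ theta) * phi

print("Learned Q-weights:", theta)
```

In the paper's setting, the learner would instead be a full off-policy D-RL method (e.g., DDPG or SAC) warm-started from the planner-generated buffer before additional on-task training.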