Keywords: Deep Reinforcement Learning, Evolution Strategies, Redundancy Resolution, Action Bias, Velocity Control
Abstract: We propose a novel approach that biases actions during policy search by lifting the concept of redundancy resolution from multi-DoF robot kinematics to the level of the reward in deep reinforcement learning and evolution strategies. The key idea is to bias the distribution of executed actions in the sense that the immediate reward remains unchanged. The resulting biased actions favor secondary objectives yielding policies that are safer to apply on the real robot. We demonstrate the feasibility of our method, considered as policy search with redundant action bias (PSRAB), in a reaching and a pick-and-lift task with a 7-DoF Franka robot arm trained in RLBench - a recently introduced benchmark for robotic manipulation - using state-of-the-art TD3 deep reinforcement learning and OpenAI's evolutionary strategy. We show that it is a flexible approach without the need of significant fine-tuning and interference with the main objective even across different policy search methods and tasks of different complexity. We evaluate our approach in simulation and on the real robot. Our project website with videos and further results can be found at: https://sites.google.com/view/redundant-action-bias
Supplementary Material: zip
Poster: png
14 Replies
Loading