Abstract: Robots must know how to be gentle when they need to interact with fragile objects,
or when the robot itself is prone to wear-and-tear. We propose an approach that
enables deep reinforcement learning to train policies that are gentle, both during
exploration and task execution. Our approach involves augmenting the (task)
reward with a penalty for non-gentleness. However, augmenting with only this
penalty impairs learning: policies get stuck in a local optimum of avoiding all
contact with the environment. Introducing surprise-based intrinsic rewards solves
this problem, as long as the right kind of surprise is chosen—penalty-based surprise
is more effective than the typical dynamics-based surprise. Videos are available at
http://sites.google.com/view/gentlemanipulation.
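
As an illustration of the reward structure described above, the following is a minimal sketch, not the authors' implementation. It assumes that non-gentleness is measured from a scalar impact force, that the penalty is linear above a threshold, and that penalty-based surprise is the squared error of a learned penalty predictor. All function names, weights, and the threshold are hypothetical.

```python
def gentleness_penalty(impact_force, threshold=1.0):
    """Hypothetical penalty: zero up to the threshold, linear above it."""
    return max(0.0, impact_force - threshold)


def penalty_surprise(predicted_penalty, actual_penalty):
    """Assumed form of penalty-based surprise: squared prediction error
    of a model that tries to predict the penalty."""
    return (predicted_penalty - actual_penalty) ** 2


def shaped_reward(task_reward, impact_force, predicted_penalty,
                  penalty_weight=1.0, surprise_weight=0.5, threshold=1.0):
    """Task reward, minus the weighted penalty, plus the weighted surprise.

    The weights and threshold are illustrative assumptions only.
    """
    penalty = gentleness_penalty(impact_force, threshold)
    surprise = penalty_surprise(predicted_penalty, penalty)
    return task_reward - penalty_weight * penalty + surprise_weight * surprise


if __name__ == "__main__":
    # Gentle contact, correctly predicted: the reward is the task reward alone.
    print(shaped_reward(task_reward=1.0, impact_force=0.5, predicted_penalty=0.0))
    # Hard contact that the predictor did not anticipate: the penalty lowers
    # the reward, while the surprise bonus still rewards making contact.
    print(shaped_reward(task_reward=1.0, impact_force=3.0, predicted_penalty=0.0))
```

In this sketch the surprise term gives the agent a reason to make contact even when contact is penalized, which is how an intrinsic reward could counteract the contact-avoiding local optimum mentioned in the abstract.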