Simultaneous on-line discovery and improvement of robotic skill options

Freek Stulp, Laura Herlant, Antoine Hoarau, Gennaro Raiola

Published: 14 Sept 2014, Last Modified: 28 Apr 2025, 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, Everyone, CC BY 4.0

Abstract: The regularity of everyday tasks enables us to reuse existing solutions for task variations. For instance, most door handles require the same basic skill (reach, grasp, turn, pull), but the skill must be adapted slightly to the variations that exist (e.g. levers vs. knobs). We introduce the algorithm “Simultaneous On-line Discovery and Improvement of Robotic Skills” (SODIRS), which autonomously discovers and optimizes skill options for such task variations. We formalize the problem in a reinforcement learning context, and use the PI^BB algorithm [2] to continually optimize skills with respect to a cost function. SODIRS discovers new subskills, or “skill options”, by clustering the costs of trials and determining whether perceptual features can predict which cluster a trial will belong to. This enables SODIRS to build a decision tree in which the leaves contain skill options for task variations. We demonstrate SODIRS’ performance in simulation, as well as on a Meka humanoid robot performing the ball-in-cup task.
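
The discovery step described in the abstract (cluster trial costs, then test whether a perceptual feature predicts the cluster) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which clustering method, classifier, or acceptance criterion SODIRS uses, so the 1-D two-means clustering, the single-threshold test, the `min_accuracy` parameter, and all function names here are hypothetical.

```python
from statistics import mean


def two_means_1d(costs, iters=20):
    """Cluster scalar trial costs into two groups (plain 1-D k-means, k=2).

    Assumption: the abstract only says costs are clustered; k-means is a stand-in.
    """
    lo, hi = min(costs), max(costs)  # initial cluster centres
    labels = [0] * len(costs)
    for _ in range(iters):
        labels = [0 if abs(c - lo) <= abs(c - hi) else 1 for c in costs]
        group0 = [c for c, l in zip(costs, labels) if l == 0]
        group1 = [c for c, l in zip(costs, labels) if l == 1]
        if not group0 or not group1:
            break  # only one cluster found, nothing more to refine
        lo, hi = mean(group0), mean(group1)
    return labels


def best_threshold(feature, labels):
    """Return (accuracy, threshold) of the best single-feature split."""
    values = sorted(set(feature))
    best = (0.0, None)
    for a, b in zip(values, values[1:]):
        t = (a + b) / 2  # candidate threshold between neighbouring values
        pred = [0 if f < t else 1 for f in feature]
        acc = mean(1.0 if p == l else 0.0 for p, l in zip(pred, labels))
        acc = max(acc, 1 - acc)  # either side may map to either cluster
        if acc > best[0]:
            best = (acc, t)
    return best


def propose_split(feature, costs, min_accuracy=0.9):
    """Return a feature threshold for a new tree node, or None.

    A split is proposed only if the feature predicts the cost cluster well
    enough; min_accuracy is a hypothetical acceptance criterion.
    """
    labels = two_means_1d(costs)
    if len(set(labels)) < 2:
        return None
    acc, threshold = best_threshold(feature, labels)
    return threshold if acc >= min_accuracy else None
```

For example, with a hypothetical handle-width feature `[0.1, 0.2, 0.3, 0.7, 0.8, 0.9]` and trial costs `[1.0, 1.1, 0.9, 5.0, 5.2, 4.9]`, `propose_split` returns a threshold between 0.3 and 0.7, which would become a decision-tree node whose two leaves each hold a separately optimized skill option. When the feature does not predict the cost cluster, the function returns `None` and the skill is left unsplit.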