Abstract: The most prominent criterion for learning manipulation skills is the optimization of task success, modeled as expected reward or probability of success. This is sensible if we only want to optimize a single controller. However, if learned manipulation primitives are used as modules in a larger system, it is also important that the sensor traces they generate facilitate recognition of action outcomes. Optimizing solely for the expected success of a primitive does not guarantee this. We demonstrate a simple example of optimizing actions for observability in combination with optimizing for expected success. Our experiment is a manipulation task with a soft manipulator, in which an action primitive is learned such that its generated sensor trace helps a classifier distinguish task success from task failure. The experimental results indicate that adding auxiliary forces to the original manipulation primitive can indeed facilitate outcome recognition for manipulation tasks.