Learning Deep Robot Controllers by Exploiting Successful and Failed Executions

Domingo Esteban, Leonel Dario Rozo, Darwin G. Caldwell

2018 (modified: 09 Jun 2022), Humanoids 2018, Readers: Everyone

Abstract: The prohibitive amount of data required when learning complex nonlinear policies, such as deep neural networks, has been significantly reduced with guided policy search (GPS) algorithms. However, while learning the control policy, the robot might fail and therefore generate unacceptable guiding samples. Failures may arise, for example, as a consequence of modeling or environmental uncertainties, and thus unsuccessful interactions should be explicitly considered while learning a complex policy. Currently, GPS methods update the robot policy by discarding unsuccessful trials or assigning them low probability. In other words, these methods overlook poorly performing executions, and therefore tend to underestimate the information carried by these interactions in subsequent iterations. In this paper we propose to learn deep neural network controllers with an extension of GPS that considers trajectories optimized with dualist constraints. These constraints are aimed at assisting the policy learning so that the trajectory distributions updated at each iteration are similar to good trajectory distributions (e.g., successful executions) while differing from bad trajectory distributions (e.g., failures). We show that neural network policies guided by trajectories optimized with our method reduce the failures during the policy exploration phase, and therefore encourage safer interactions. This may be particularly relevant in tasks that involve physical contact with the environment or human partners.
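
The idea of keeping an updated trajectory distribution close to good distributions and away from bad ones can be illustrated with a small sketch. The abstract does not give the exact form of the dualist constraints, so everything below is an assumption made for illustration: trajectory distributions are reduced to univariate Gaussians, similarity is measured with the KL divergence, and the function names and the thresholds `eps_good` and `eps_bad` are hypothetical rather than taken from the paper.

```python
import math


def kl_gauss(mean_p, std_p, mean_q, std_q):
    """Closed-form KL(p || q) between two univariate Gaussians."""
    return (math.log(std_q / std_p)
            + (std_p ** 2 + (mean_p - mean_q) ** 2) / (2.0 * std_q ** 2)
            - 0.5)


def satisfies_dualist_constraints(new, good, bad, eps_good, eps_bad):
    """Hypothetical check, not the paper's formulation.

    Each distribution is a (mean, std) tuple. The candidate must stay
    within eps_good of the good distribution and at least eps_bad away
    from the bad distribution, both measured with the KL divergence.
    """
    close_to_good = kl_gauss(*new, *good) <= eps_good
    far_from_bad = kl_gauss(*new, *bad) >= eps_bad
    return close_to_good and far_from_bad


# Illustrative numbers only: a "successful" and a "failed" distribution.
good = (1.0, 0.5)
bad = (-1.0, 0.5)

candidate_a = (0.8, 0.5)    # near the good distribution
candidate_b = (-0.6, 0.5)   # near the bad distribution

result_a = satisfies_dualist_constraints(candidate_a, good, bad, 0.5, 1.0)
result_b = satisfies_dualist_constraints(candidate_b, good, bad, 0.5, 1.0)
print(result_a, result_b)
```

In this toy setting the candidate near the good distribution passes both checks and the candidate near the bad distribution fails them. An actual GPS-style update would instead enforce such bounds inside a trajectory optimization over time-varying distributions, which this sketch does not attempt.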
