Abstract: Computer simulation provides an automatic and safe way to train robotic control
policies for complex tasks such as locomotion. However, a policy
trained in simulation usually does not transfer directly to the real hardware due
to the differences between the two environments. Transfer learning using domain
randomization is a promising approach, but it usually assumes that the target environment
is close to the distribution of the training environments, thus relying
heavily on accurate system identification. In this paper, we present a different
approach that leverages domain randomization for transferring control policies to
unknown environments. The key idea is that, instead of learning a single policy in
simulation, we simultaneously learn a family of policies that exhibit different
behaviors. When tested in the target environment, we directly search for the best
policy in the family based on the task performance, without the need to identify
the dynamic parameters. We evaluate our method on five simulated robotic control
problems with different discrepancies between the training and testing environments
and demonstrate that our method can overcome larger modeling errors compared
to training a robust policy or an adaptive policy.
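
As a rough illustration of the strategy-optimization step described above, the sketch below searches over the latent strategy vector of a pre-trained policy family directly in the target environment, using only episode returns. This is not the authors' code: the policy family `policy(obs, mu)`, the environment interface (classic Gym-style `reset`/`step`), and the dimensions and budgets are hypothetical placeholders, and a simple cross-entropy-style update stands in for a derivative-free optimizer such as CMA-ES.

```python
import numpy as np

# Hypothetical sizes and budgets for illustration only.
MU_DIM = 4      # dimensionality of the latent strategy vector
POP_SIZE = 8    # candidate strategies evaluated per iteration
N_ITERS = 20    # search iterations in the target environment


def rollout_return(env, policy, mu, horizon=1000):
    """Run one episode with the policy conditioned on mu; return the total reward."""
    obs, total = env.reset(), 0.0
    for _ in range(horizon):
        action = policy(obs, mu)              # family of policies indexed by mu
        obs, reward, done, _ = env.step(action)
        total += reward
        if done:
            break
    return total


def strategy_search(env, policy):
    """Derivative-free search over mu based only on task performance."""
    mean, std = np.zeros(MU_DIM), np.ones(MU_DIM)
    for _ in range(N_ITERS):
        # Sample candidate strategy vectors around the current search distribution.
        candidates = mean + std * np.random.randn(POP_SIZE, MU_DIM)
        returns = np.array([rollout_return(env, policy, mu) for mu in candidates])
        # Keep the top half of candidates and refit the search distribution.
        elites = candidates[np.argsort(returns)[-POP_SIZE // 2:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mean
```

Note that no dynamic parameters of the target environment are identified anywhere in this loop; the search is driven purely by the returns observed when executing each member of the policy family.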
Keywords: transfer learning, reinforcement learning, modeling error, strategy optimization
TL;DR: We propose a policy transfer algorithm that can overcome large and challenging discrepancies in the system dynamics such as latency, actuator modeling error, etc.
Data: [MuJoCo](https://paperswithcode.com/dataset/mujoco), [OpenAI Gym](https://paperswithcode.com/dataset/openai-gym)
Community Implementations: [4 code implementations](https://www.catalyzex.com/paper/policy-transfer-with-strategy-optimization/code)