Adaptability Preserving Domain Decomposition for Stabilizing Sim2Real Reinforcement Learning

2020 (modified: 04 Nov 2022), IROS 2020
Abstract: In sim-to-real transfer of Reinforcement Learning (RL) policies for robot tasks, Domain Randomization (DR) is a widely used technique for improving adaptability. However, DR involves a conflict between adaptability and training stability: heavy DR tends to destabilize training or even cause it to fail. To relieve this conflict, we propose a new algorithm named Domain Decomposition (DD), which decomposes the randomized domain according to environments and trains a separate RL policy for each part. This decomposition stabilizes the training of each RL policy, and, as we prove theoretically, the adaptability of the overall policy can be preserved. Our simulation results verify that DD improves training stability while preserving adaptability. Further, we complete a complex real-world vision-based patrolling task using DD, which demonstrates DD’s practicality. A video is attached as supplementary material.
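The decomposition idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single randomized scalar parameter (for example, a friction coefficient), an equal-width split into sub-domains, and selection of a policy at deployment by looking up which sub-domain the parameter falls in. The names `SubDomain`, `decompose`, `train_policy`, and `select_index` are hypothetical, and `train_policy` is a stand-in for a real RL training loop.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SubDomain:
    """One part of the randomized domain (assumed here to be a 1-D interval)."""
    low: float
    high: float

    def sample(self, rng: random.Random) -> float:
        # Domain randomization restricted to this part only.
        return rng.uniform(self.low, self.high)

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def decompose(low: float, high: float, parts: int) -> List[SubDomain]:
    """Split [low, high] into equal-width sub-domains (an assumed scheme)."""
    width = (high - low) / parts
    return [SubDomain(low + i * width, low + (i + 1) * width) for i in range(parts)]


def train_policy(sub: SubDomain, rng: random.Random,
                 episodes: int = 100) -> Callable[[float], float]:
    """Placeholder for RL training on a single sub-domain.

    A real implementation would run an RL algorithm on environments whose
    parameter is drawn from `sub`. Here we only average the sampled
    parameters and return a dummy policy that outputs that constant.
    """
    mean_param = sum(sub.sample(rng) for _ in range(episodes)) / episodes
    return lambda observation: mean_param


def select_index(subs: List[SubDomain], env_param: float) -> int:
    """Return the index of the first sub-domain containing env_param."""
    for i, sub in enumerate(subs):
        if sub.contains(env_param):
            return i
    raise ValueError("parameter lies outside the randomized domain")


if __name__ == "__main__":
    rng = random.Random(0)
    subs = decompose(0.0, 1.0, parts=4)
    policies = [train_policy(s, rng) for s in subs]  # one policy per part
    idx = select_index(subs, 0.6)
    print("sub-domain index:", idx, "action:", policies[idx](0.0))
```

Each policy in this sketch only sees a narrow slice of the randomized range, which is the intuition behind the stability claim. How the paper actually partitions the domain and chooses among policies at deployment is not stated in the abstract.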