Asymmetric self-play for automatic goal discovery in robotic manipulation

OpenAI OpenAI; Matthias Plappert; Raul Sampedro; Tao Xu; Ilge Akkaya; Vineet Kosaraju; Peter Welinder; Ruben D'Sa; Arthur Petron; Henrique Ponde de Oliveira Pinto; Alex Paino; Hyeonwoo Noh; Lilian Weng; Qiming Yuan; Casey Chu; Wojciech Zaremba

28 Sept 2020 (modified: 22 Jun 2025) · ICLR 2021 Conference Blind Submission · Readers: Everyone

Keywords: self-play, asymmetric self-play, automatic curriculum, automatic goal generation, robotic learning, robotic manipulation, reinforcement learning

Abstract: We train a single, goal-conditioned policy that can solve many robotic manipulation tasks, including tasks with previously unseen goals and objects. To do so, we rely on asymmetric self-play for goal discovery, where two agents, Alice and Bob, play a game. Alice is asked to propose challenging goals and Bob aims to solve them. We show that this method is able to discover highly diverse and complex goals without any human priors. We further show that Bob can be trained with only sparse rewards, because the interaction between Alice and Bob results in a natural curriculum and Bob can learn from Alice's trajectory when it is relabeled as a goal-conditioned demonstration. Finally, we show that our method scales, resulting in a single policy that can transfer to many unseen hold-out tasks such as setting a table, stacking blocks, and solving simple puzzles. Videos of a learned policy are available at https://robotics-self-play.github.io.
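
The game described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a 1-D line instead of a robot, an untrained Alice that acts randomly (the paper's Alice is trained to propose challenging goals), and a lookup table standing in for Bob's goal-conditioned policy. All function names are hypothetical. The sketch shows only the two ideas the abstract names: Bob's reward is sparse, and when Bob fails, Alice's trajectory is relabeled as a goal-conditioned demonstration for him.

```python
import random

def rollout_alice(start, steps, rng):
    """Alice acts randomly on a 1-D line; her final position becomes the goal."""
    state, trajectory = start, []
    for _ in range(steps):
        action = rng.choice([-1, 1])
        trajectory.append((state, action))
        state += action
    return trajectory, state  # (state, action) pairs and the proposed goal

def relabel_as_demonstration(trajectory, goal):
    """Turn Alice's trajectory into goal-conditioned (state, goal, action) samples."""
    return [(state, goal, action) for state, action in trajectory]

def rollout_bob(policy, start, goal, max_steps):
    """Bob follows his tabular policy; the reward is sparse (1 only on success)."""
    state = start
    for _ in range(max_steps):
        if state == goal:
            return 1
        # Assumption: Bob stays put (action 0) when he has no entry for this pair.
        state += policy.get((state, goal), 0)
    return 1 if state == goal else 0

def self_play(episodes, steps=4, seed=0):
    """Run the toy game; Bob imitates Alice only on goals he failed to reach."""
    rng = random.Random(seed)
    bob_policy, rewards = {}, []
    for _ in range(episodes):
        trajectory, goal = rollout_alice(0, steps, rng)
        reward = rollout_bob(bob_policy, 0, goal, steps)
        rewards.append(reward)
        if reward == 0:
            # Table overwrite stands in for a behavioral-cloning update.
            for state, g, action in relabel_as_demonstration(trajectory, goal):
                bob_policy[(state, g)] = action
    return bob_policy, rewards
```

In this toy setting Bob fails each distinct goal at most once, because one relabeled trajectory is enough to fill in the table entries for that goal. The real method replaces the table with a learned policy and the random Alice with a learned one.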

One-sentence Summary: We use asymmetric self-play to train a goal-conditioned policy for complex object manipulation tasks, and the learned policy can zero-shot generalize to many manually designed holdout tasks.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Supplementary Material: zip

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/asymmetric-self-play-for-automatic-goal/code)

Reviewed Version (pdf): https://openreview.net/references/pdf?id=g9eNhiUYO
