Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning

Published: 05 Sept 2024, Last Modified: 08 Nov 2024 · CoRL 2024 · CC BY 4.0
Keywords: Visual Generalization, Sim2real, Reinforcement Learning
TL;DR: A generalizable framework for visual RL that facilitates sim2real transfer.
Abstract: Can we endow visuomotor robots with generalization capabilities to operate in diverse open-world scenarios? In this paper, we propose Maniwhere, a generalizable framework tailored for visual reinforcement learning, enabling the trained robot policies to generalize across a combination of multiple visual disturbance types. Specifically, we introduce a multi-view representation learning approach fused with a Spatial Transformer Network (STN) module to capture shared semantic information and correspondences among different viewpoints. In addition, we employ a curriculum-based randomization and augmentation approach to stabilize the RL training process and strengthen visual generalization ability. To demonstrate the effectiveness of Maniwhere, we meticulously design **8** tasks encompassing articulated objects, bimanual manipulation, and dexterous hand manipulation, showing Maniwhere's strong visual generalization and sim2real transfer abilities across **3** hardware platforms. Our experiments show that Maniwhere significantly outperforms existing state-of-the-art methods. Videos are provided at https://maniwhere.github.io.
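To make the two ingredients named in the abstract concrete, below is a minimal sketch, in PyTorch (the abstract does not specify a framework), of how an STN module can be fused in front of a visual encoder and how a simple multi-view alignment objective can pull together embeddings of the same simulator state rendered from two camera viewpoints. All names (`STN`, `MultiViewEncoder`, `multiview_alignment_loss`), architectures, and hyperparameters here are illustrative assumptions, not Maniwhere's actual implementation.

```python
# Hypothetical sketch of an STN-fused multi-view encoder; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class STN(nn.Module):
    """Spatial Transformer: predicts a 2D affine warp of the input image,
    letting the encoder canonicalize observations across viewpoints."""
    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.localization = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc_theta = nn.Linear(32, 6)
        # Initialize to the identity transform so early RL training is stable.
        nn.init.zeros_(self.fc_theta.weight)
        self.fc_theta.bias.data.copy_(
            torch.tensor([1.0, 0.0, 0.0, 0.0, 1.0, 0.0]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        theta = self.fc_theta(self.localization(x).flatten(1)).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

class MultiViewEncoder(nn.Module):
    """STN front-end followed by a small CNN producing a state embedding."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.stn = STN()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.cnn(self.stn(x))

def multiview_alignment_loss(z1, z2, temperature: float = 0.1):
    """InfoNCE-style objective: embeddings of the same state seen from two
    cameras should match each other and repel other states in the batch."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature           # (B, B) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

# Usage: two renderings of the same state from different (randomized) cameras.
enc = MultiViewEncoder()
view_a, view_b = torch.rand(8, 3, 84, 84), torch.rand(8, 3, 84, 84)
loss = multiview_alignment_loss(enc(view_a), enc(view_b))
loss.backward()
```

Initializing the STN's affine parameters to the identity is a common stabilizing choice: the warp starts as a no-op and is only learned to the extent it helps the cross-view objective, which fits the abstract's emphasis on stabilizing RL training under randomization.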
Supplementary Material: zip
Spotlight Video: mp4
Website: https://gemcollector.github.io/maniwhere/
Publication Agreement: pdf
Student Paper: yes
Submission Number: 8