Track: full paper
Keywords: Self-supervised learning, Representation learning, Vision-based robot learning
TL;DR: We propose a self-supervised state representation learning approach on dynamic scenes that explicitly guides backbone models to form visual state representations for robots
Abstract: In robot policy learning, deriving informative state representations that encompass both visual and proprioceptive information is critical. While proprioceptive signals are acquired from internal sensors, visual state representations rely primarily on vision backbones. Therefore, leveraging a strong backbone that generalizes across diverse tasks and environments is essential for effective robotic perception. Self-supervised learning (SSL) has been a promising approach for pre-training such backbones. However, conventional SSL approaches for visual representation learning have predominantly focused on comprehensive understanding of a whole image or video, which falls short of robotics requirements such as seamless interaction. Bearing this in mind, we introduce a novel and intuitive self-supervised visual state representation learning pipeline designed to facilitate the acquisition of state representations through masked autoencoding. Our method implicitly folds the formation of state representations into the encoding process without any additional layers. Extensive experiments in diverse simulated environments demonstrate the superiority of our method over previous baselines in robot manipulation and locomotion tasks. Moreover, deploying our pre-trained model on physical robots confirms its robustness and effectiveness in real-world settings.
Presenter: ~Taekyung_Kim4
Format: Yes, the presenting author will definitely attend in person because they are attending ICLR for other complementary reasons.
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 64