Building Generalist Robot Policy from Pre-trained Visual Representations

27 Sept 2024 (modified: 26 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: robot learning, pre-trained vision models, generalizability
Abstract: In this paper, we investigate the use of vision pre-trained models (PTMs) for developing generalist robot manipulation policies. We study whether embodied policies trained with representations from vision and language PTMs are capable of multi-tasking and overcoming domain gaps. Evaluating a set of off-the-shelf vision PTMs, our first finding is that the commonly used global features are generally inadequate for building multi-task robot manipulation policies, whereas keeping local features significantly improves both in-domain performance and out-of-domain generalizability. Experimental results show that DINOv2, a model trained on conventional vision datasets, outperforms models explicitly designed for robot learning. To bridge the domain gaps, we further examine the effect of augmentation methods on embodied robot policies and study few-shot adaptation. For the latter, we propose a novel objective that introduces self-distillation into the few-shot adaptation objectives. Experimental results show that our approach is compatible with multiple PTMs, improving performance on novel domains when the number of available demonstrations is limited.
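To make the abstract's global-vs-local distinction concrete, below is a minimal sketch, not the authors' code. It shows a policy head that consumes DINOv2 patch (local) tokens rather than the single global CLS vector, plus one hedged reading of a behavior-cloning loss with a self-distillation term. The torch.hub entry point and the forward_features output keys follow the public facebookresearch/dinov2 repository; the LocalFeaturePolicy design, the attention pooling, action_dim=7, the adaptation_loss helper, and the weight lam are illustrative assumptions.

import torch
import torch.nn as nn

# Assumed from the public DINOv2 repo: hub entry point and feature-dict keys.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
backbone.eval()

class LocalFeaturePolicy(nn.Module):
    # Illustrative head: attention-pools DINOv2 patch (local) tokens into an
    # action vector, instead of relying on the single global CLS feature.
    def __init__(self, embed_dim=384, action_dim=7, num_heads=6):
        super().__init__()
        self.query = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.head = nn.Linear(embed_dim, action_dim)

    def forward(self, images):
        with torch.no_grad():                        # frozen pre-trained backbone
            feats = backbone.forward_features(images)
        # The global alternative would be feats["x_norm_clstoken"] of shape (B, D).
        patches = feats["x_norm_patchtokens"]        # (B, N, D) local features
        q = self.query.expand(images.size(0), -1, -1)
        pooled, _ = self.attn(q, patches, patches)   # learned pooling over patches
        return self.head(pooled.squeeze(1))

def adaptation_loss(policy, teacher, batch, lam=0.5):
    # Hedged sketch of "self-distillation for few-shot adaptation": behavior
    # cloning on the few novel-domain demos, plus a term keeping the adapted
    # policy close to a frozen pre-adaptation copy (the teacher).
    bc = nn.functional.mse_loss(policy(batch["images"]), batch["actions"])
    with torch.no_grad():
        target = teacher(batch["images"])
    distill = nn.functional.mse_loss(policy(batch["images"]), target)
    return bc + lam * distill

policy = LocalFeaturePolicy()
print(policy(torch.randn(2, 3, 224, 224)).shape)     # torch.Size([2, 7])

The attention-pooling query is one simple way to aggregate local tokens; any design that preserves per-patch information (rather than averaging it away) matches the abstract's point about local features.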
Primary Area: applications to robotics, autonomy, planning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11910