LIV: Language-Image Representations and Rewards for Robotic ControlDownload PDF

Published: 03 Mar 2023, Last Modified: 17 Nov 2024RRL 2023 PosterReaders: Everyone
Keywords: pre-training, representation learning, reward learning, robot learning, vision-language, reinforcement learning, imitation learning
TL;DR: A unified algorithm for pre-training and fine-tuning vision-language representations and rewards for language-conditioned robotic control
Abstract: Motivated by the growing research in natural language-based task interfaces for robotic tasks, we seek good vision-language representations specialized for control. We posit that such representations should: (1) align the two modalities to permit grounding language-based task specifications in visual state-based task rewards, (2) capture sequentiality and task-directed progress in conjunction with cross-modality alignment, and (3) permit extensive pre-training from large generic datasets as well as fine-tuning on small in-domain datasets. We achieve these desiderata through Language-Image Value learning (LIV), a unified objective for vision-language representation and reward learning from action-free videos with text annotations. We use LIV to pre-train the first control-centric vision-language representation from large human video datasets such as EpicKitchen with no action information. Then, with access to target domain data, the very same objective consistently improves this pre-trained LIV model as well as other pre-existing vision-language representations for language-conditioned control. On two simulated robot domains that evaluate vision-language representations and rewards, LIV pre-trained and fine-tuned models consistently outperform the best prior approaches, establishing the advantages of joint vision-language representation and reward learning within its unified, compact framework.
Track: Technical Paper
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 4 code implementations](https://www.catalyzex.com/paper/liv-language-image-representations-and/code)
2 Replies

Loading