Keywords: Embodied Foundation Models, Interactive Machine Learning, Data Generation
TL;DR: We show that carefully curated assistive data enables embodied models to provide open-set corrective assistance for unseen behaviors and tasks, and that effective generalization depends on decomposed, sub-skill-focused dataset design.
Abstract: Embodied foundation models are increasingly performant in real-world domains such as robotics or autonomous driving. These models are often deployed in interactive or assistive settings, where it is important that they generalize to new users and new tasks. Diverse interactive data generation offers a promising avenue for data-efficient gains in interactive embodied foundation models.
In this paper, we investigate the generalization capabilities of a multimodal foundation model fine-tuned on diverse interactive assistance data in a synthetic domain. We explore generalization along two axes: (a) assisting with unseen categories of user behavior, and (b) providing guidance in new task configurations not encountered during training.
We study a broad capability called Open-Set Corrective Assistance, in which the model needs to inspect lengthy user behavior and provide assistance through either corrective actions or language-based feedback.
This task remains unsolved in prior work, which typically assumes closed corrective categories or relies on external planners, making it a challenging testbed for evaluating the limits of assistive data.
To support this task, we generate synthetic assistive datasets in Overcooked and fine-tune a LLaMA-based model to evaluate generalization to novel tasks and defective behaviors. Our results provide key insights into the kind of diverse assistive data required to enable open-set assistive intelligence: performant models benefit from assistive data that explicitly targets the individual subskills required for the downstream tasks, such as compositionality, inspection, and actuation.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 96