Open-loop VLM Robot Planning: An Investigation of Fine-tuning and Prompt Engineering Strategies

Published: 05 Apr 2024 · Last Modified: 19 Apr 2024 · VLMNM 2024 · CC BY 4.0
Keywords: task planning, VLM, datasets, prompting
TL;DR: Investigates the impact of fine-tuning and prompting techniques on the planning ability of the open-source VideoLLaMA VLM on the EgoPlan-Bench benchmark
Abstract: Recent work has suggested that language-based foundation models contain commonsense knowledge and are capable of basic reasoning, which holds significant promise for task-level planning in robotics. As an example, the recent EgoPlan-Bench benchmark studies egocentric, embodied planning, measured through multiple-choice questions on captioned videos. In this work, we thoroughly examine the benchmark using open-source 7B- and 13B-parameter models and investigate the impact of different sources of training data, as well as prompting strategies that are widely used outside of the robotics domain. Our experiments show that (1) in-domain and out-of-domain performance is, unsurprisingly, tied to the overlap between training and evaluation datasets, and (2) surprisingly, prompting strategies that have been effective in other domains fail to significantly improve performance here.
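To make the evaluation setup concrete, the sketch below illustrates one common way such multiple-choice benchmarks are scored: each candidate action is scored by the model under a given prompt template, and accuracy is taken over the argmax choice. This is a minimal illustration, not the paper's code; the `score_option` function and the dataclass fields are hypothetical stand-ins for whatever interface the evaluated VLM (e.g., VideoLLaMA) actually exposes.

```python
# Hedged sketch of EgoPlan-Bench-style multiple-choice evaluation.
# All names below (MultipleChoiceQuestion, score_option, templates) are
# illustrative assumptions, not the benchmark's or the paper's actual API.
from dataclasses import dataclass
from typing import List

@dataclass
class MultipleChoiceQuestion:
    video_caption: str   # caption describing the egocentric video clip
    task_goal: str       # high-level goal the agent is pursuing
    options: List[str]   # candidate next actions (one is correct)
    answer_index: int    # index of the ground-truth option

def score_option(prompt: str, option: str) -> float:
    """Hypothetical scorer: in a real setup this would return the model's
    log-likelihood (or another preference score) for `option` given `prompt`."""
    return -float(len(option))  # placeholder so the sketch runs end to end

def evaluate(questions: List[MultipleChoiceQuestion], prompt_template: str) -> float:
    """Accuracy of argmax-scored options under a given prompting strategy."""
    correct = 0
    for q in questions:
        prompt = prompt_template.format(caption=q.video_caption, goal=q.task_goal)
        scores = [score_option(prompt, opt) for opt in q.options]
        if scores.index(max(scores)) == q.answer_index:
            correct += 1
    return correct / len(questions)

# Prompting strategies can then be compared by swapping templates, e.g.:
ZERO_SHOT = "Video: {caption}\nGoal: {goal}\nWhat should the agent do next?"
CHAIN_OF_THOUGHT = ("Video: {caption}\nGoal: {goal}\n"
                    "Think step by step, then choose the next action.")
```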
Submission Number: 28