Learning Feasibility from Failure Data in Vision–Language–Action Models

Published: 22 Nov 2025, Last Modified: 22 Nov 2025
Venue: SAFE-ROL Poster
Readers: Everyone
License: CC BY 4.0
Keywords: Vision Language Action Model, Reasoning, Failure Data
Abstract: We study how failure data can improve the robustness of Vision–Language–Action (VLA) models. Existing VLAs are often trained on successful demonstrations and make limited use of failures, so they can produce trajectories that appear plausible yet execute unreliably under variations. We introduce VINE, a dual-system framework in which a Tree-of-Thoughts planner (System 2) is fine-tuned on both success and failure trajectories to estimate reasoning-level feasibility, while a visuomotor controller (System 1) executes subgoal actions. Experiments on plug insertion show that incorporating failure-aware value learning improves success rates, especially in unseen settings, surpassing unified VLA baselines and few-shot VLM planners. Our results highlight failure data as an essential yet underutilized resource for enhancing safety and robustness in embodied reasoning.
Supplementary Zip: zip
Submission Number: 17
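
Illustrative sketch (not the authors' implementation): the abstract describes a planner that scores candidate subgoal sequences by a feasibility value learned from both successful and failed trajectories. The toy Python below shows that idea in its simplest tabular form, assuming trajectories are lists of subgoal names with a binary outcome label. The function names (`fit_feasibility`, `plan`), the subgoal names, the Laplace smoothing, and the beam search are all assumptions made for illustration. In VINE the value comes from a fine-tuned Tree-of-Thoughts planner rather than a count table, and the System 1 controller is omitted here.

```python
from collections import defaultdict


def fit_feasibility(trajectories):
    """Estimate feasibility of each subgoal prefix from labeled trajectories.

    Each trajectory is (list_of_subgoals, success_flag). The value of a prefix
    is the Laplace-smoothed fraction of trajectories containing that prefix
    that ended in success, so failures actively lower the score.
    """
    counts = defaultdict(lambda: [0, 0])  # prefix -> [successes, total]
    for subgoals, success in trajectories:
        for i in range(1, len(subgoals) + 1):
            prefix = tuple(subgoals[:i])
            counts[prefix][0] += int(success)
            counts[prefix][1] += 1
    return {p: (s + 1) / (n + 2) for p, (s, n) in counts.items()}


def plan(value, candidates, depth, beam_width=2, default=0.5):
    """Beam search over subgoal sequences, ranked by estimated feasibility.

    Prefixes never seen in the data receive the uninformed prior `default`.
    """
    beam = [()]
    for _ in range(depth):
        expanded = [p + (c,) for p in beam for c in candidates]
        expanded.sort(key=lambda p: value.get(p, default), reverse=True)
        beam = expanded[:beam_width]
    return beam[0]


# Toy data with hypothetical subgoal names; False marks a failed execution.
trajectories = [
    (["approach", "align", "insert"], True),
    (["approach", "align", "insert"], True),
    (["approach", "insert"], False),
    (["approach", "insert"], False),
    (["align", "insert"], False),
]

value = fit_feasibility(trajectories)
best = plan(value, ["approach", "align", "insert"], depth=3)
print(best)
```

In this toy data, the two failed attempts that skip alignment push the value of the prefix ("approach", "insert") below the uninformed prior, which is the kind of signal a success-only dataset cannot supply.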