Keywords: Chain-of-Thought reasoning, Embodied reasoning, Verifiers, Imitation learning, Vision-Language-Action models
TL;DR: ReVer improves embodied decision-making by verifying sampled reasoning-action candidates before execution, reducing error propagation and achieving significant gains over standard VLAs on long-horizon manipulation tasks.
Abstract: Recent Vision–Language–Action (VLA) models for embodied manipulation typically evaluate action quality only through execution, relying on environment feedback or reinforcement signals for refinement. This trial-and-error paradigm introduces substantial computational and operational overhead, particularly in long-horizon tasks where early errors propagate. We propose Reasoning-Guided Verification (ReVer), a framework that leverages intermediate Chain-of-Thought (CoT) reasoning to assess action reliability prior to execution. Instead of committing to a single reasoning trajectory, ReVer samples diverse reasoning–action candidates and introduces a learned verifier that evaluates their validity. The verifier is trained on a curated dataset of both successful and failed CoT trajectories, enabling it to detect flawed reasoning patterns and anticipate downstream failures. By selecting actions conditioned on verified reasoning, ReVer reduces reliance on costly environment interaction. Experiments on the SIMPLER benchmark show that ReVer improves task success by 13.37% over OpenVLA and 11.47% over ECoT, demonstrating enhanced robustness and efficiency in embodied decision-making.
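Illustrative sketch: the sample-then-verify selection described in the abstract can be pictured roughly as below. This is a minimal sketch and not the authors' implementation. The names `Candidate`, `select_verified_action`, and `toy_verifier` are hypothetical, the keyword-based scorer only stands in for the learned verifier, and picking the highest-scoring candidate is an assumed selection rule that the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Candidate:
    """One sampled reasoning-action pair (hypothetical structure)."""
    reasoning: str              # intermediate Chain-of-Thought text
    action: Tuple[float, ...]   # e.g. a low-level control command


def select_verified_action(
    candidates: List[Candidate],
    verifier: Callable[[Candidate], float],
) -> Candidate:
    """Return the candidate with the highest verifier score.

    Assumption: selection is a simple argmax over verifier scores.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=verifier)


def toy_verifier(candidate: Candidate) -> float:
    """Toy stand-in for a learned verifier: rewards one keyword."""
    return 0.9 if "grasp the handle" in candidate.reasoning else 0.2


# Two hand-written candidates replace sampling from a VLA policy.
candidates = [
    Candidate("move left, then push the drawer", (0.1, 0.0, 0.0)),
    Candidate("grasp the handle, then pull the drawer", (0.0, 0.2, -0.1)),
]
best = select_verified_action(candidates, toy_verifier)
```

In this toy run only `best.action` would be executed, so the lower-scored candidate never reaches the environment.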
Submission Number: 16