EVE: A Generator-Verifier System for Generative Policies

Published: 13 May 2026, Last Modified: 13 May 2026
Venue: ICRA 2026: From Data to Decisions (Poster)
License: CC BY 4.0
Keywords: Vision-Language-Action Models, Test-Time Scaling, Verifiers, LLM Reasoning, Visuomotor Control
TL;DR: Boost VLA performance at inference time using VLM-based verification feedback
Abstract: Visuomotor policies based on generative architectures such as diffusion and flow matching have shown strong performance but degrade under distribution shifts, demonstrating limited recovery capabilities without costly finetuning. In the language modeling domain, test-time compute scaling has revolutionized modern LLMs by leveraging foundation models as zero-shot verification modules to refine candidate solutions. We hypothesize that generative policies can similarly benefit from zero-shot VLM-based verifiers at inference time, a direction that remains relatively underexplored. To this end, we introduce EVE, a modular generator-verifier interaction framework that boosts the performance of pretrained generative policies at test time with no additional training. EVE wraps a frozen base policy with multiple zero-shot, VLM-based verifier agents. Each verifier proposes action refinements, while an action incorporator fuses the aggregated verifier output into the base policy's denoising process to produce the final action. Across a diverse suite of tasks and embodiments, EVE consistently improves task success rates without policy or verifier finetuning. Our ablations isolate the contributions of verifier capability and action-incorporator strategy.
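The abstract's generator-verifier loop can be illustrated with a minimal sketch. Everything here is hypothetical: `base_policy_denoise` is a toy stand-in for one denoising step of the frozen policy, `verifier_refinement` substitutes a simple numeric correction for a real VLM verifier's feedback, and the incorporator is a plain mean over proposals; the actual EVE components are not specified in this page.

```python
import numpy as np

def base_policy_denoise(action, guidance, step, num_steps):
    """Toy stand-in for one denoising step of a frozen generative policy.
    Moves the noisy action toward a pretend clean target (the origin here),
    nudged by the incorporated verifier guidance."""
    target = np.zeros_like(action)
    alpha = 1.0 / (num_steps - step)  # simple schedule: full step at the end
    return action + alpha * (target - action) + guidance

def verifier_refinement(action, preferred):
    """Zero-shot verifier stand-in: proposes a small correction toward its
    preferred action (in EVE this feedback would come from a VLM agent)."""
    return 0.1 * (preferred - action)

def incorporate(refinements):
    """Action incorporator stand-in: fuse verifier proposals by averaging."""
    return np.mean(refinements, axis=0)

def eve_step(action, verifier_prefs, step, num_steps):
    """One generator-verifier interaction: gather refinements from all
    verifiers, fuse them, and feed the result into the denoising step."""
    refinements = [verifier_refinement(action, p) for p in verifier_prefs]
    guidance = incorporate(refinements)
    return base_policy_denoise(action, guidance, step, num_steps)

rng = np.random.default_rng(0)
action = rng.normal(size=3)                           # noisy initial action
verifier_prefs = [np.full(3, 0.0), np.full(3, 0.1)]   # each verifier's target
num_steps = 10
for t in range(num_steps):
    action = eve_step(action, verifier_prefs, t, num_steps)
print(np.round(action, 3))
```

The key design point mirrored here is that the base policy stays frozen: verifier output enters only as an additive guidance term inside the denoising loop, so no policy or verifier finetuning is needed.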
Submission Number: 47