EVE: A Generator-Verifier System for Generative Policies

Published: 13 May 2026, Last Modified: 13 May 2026
Venue: ICRA 2026: From Data to Decisions (Poster)
License: CC BY 4.0
Keywords: Vision-Language-Action Models, Test-Time Scaling, Verifiers, LLM Reasoning, Visuomotor Control
TL;DR: Boost VLA performance at inference time using VLM-based verification feedback
Abstract: Visuomotor policies based on generative architectures such as diffusion and flow matching have shown strong performance but degrade under distribution shifts, demonstrating limited recovery capabilities without costly finetuning. In the language modeling domain, test-time compute scaling has revolutionized modern LLMs by leveraging foundation models as zero-shot verification modules to refine candidate solutions. We hypothesize that generative policies can similarly benefit from zero-shot VLM-based verifiers at inference time, a direction that remains relatively underexplored. To this end, we introduce EVE, a modular generator-verifier interaction framework that boosts the performance of pretrained generative policies at test time with no additional training. EVE wraps a frozen base policy with multiple zero-shot, VLM-based verifier agents. Each verifier proposes action refinements, while an action incorporator fuses the aggregated verifier output into the base policy's denoising process to produce the final action. Across a diverse suite of tasks and embodiments, EVE consistently improves task success rates without policy or verifier finetuning. Our ablations isolate the contributions of verifier capability and action-incorporator strategy.
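The abstract's generator-verifier loop can be illustrated with a minimal sketch. Everything here is hypothetical: `base_policy_denoise` is a toy stand-in for one denoising step of the frozen policy, `verifier_refinement` substitutes a simple numeric correction for a real VLM verifier's feedback, and the incorporator is a plain mean over proposals; the actual EVE components are not specified in this page.

```python
import numpy as np

def base_policy_denoise(action, guidance, step, num_steps):
    """Toy stand-in for one denoising step of a frozen generative policy.
    Moves the noisy action toward a pretend clean target (the origin here),
    nudged by the incorporated verifier guidance."""
    target = np.zeros_like(action)
    alpha = 1.0 / (num_steps - step)  # simple schedule: full step at the end
    return action + alpha * (target - action) + guidance

def verifier_refinement(action, preferred):
    """Zero-shot verifier stand-in: proposes a small correction toward its
    preferred action (in EVE this feedback would come from a VLM agent)."""
    return 0.1 * (preferred - action)

def incorporate(refinements):
    """Action incorporator stand-in: fuse verifier proposals by averaging."""
    return np.mean(refinements, axis=0)

def eve_step(action, verifier_prefs, step, num_steps):
    """One generator-verifier interaction: gather refinements from all
    verifiers, fuse them, and feed the result into the denoising step."""
    refinements = [verifier_refinement(action, p) for p in verifier_prefs]
    guidance = incorporate(refinements)
    return base_policy_denoise(action, guidance, step, num_steps)

rng = np.random.default_rng(0)
action = rng.normal(size=3)                           # noisy initial action
verifier_prefs = [np.full(3, 0.0), np.full(3, 0.1)]   # each verifier's target
num_steps = 10
for t in range(num_steps):
    action = eve_step(action, verifier_prefs, t, num_steps)
print(np.round(action, 3))
```

The key design point mirrored here is that the base policy stays frozen: verifier output enters only as an additive guidance term inside the denoising loop, so no policy or verifier finetuning is needed.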
Submission Number: 47