Keywords: Vision-language-action Model, sampling, inference-time compute
TL;DR: This paper introduces training-free policy-only power sampling for VLA models. It sharpens policy probabilities to improve success, stability, and efficiency under distribution shifts.
Abstract: We study whether training-free power-distribution sampling, originally proposed for large language model reasoning, can improve vision-language-action (VLA) control under distribution shift. In closed-loop control, actions affect future observations, so the powered next-decision conditional contains a future-correction term. We study this issue in a chunked-control regime used by high-performance VLAs, where the mismatch between local temperature scaling and trajectory-level power sampling persists at the chunk level. Because exact trajectory powering also sharpens simulator and observation randomness, we introduce *policy-only power sampling*, which powers only the policy while leaving rollout stochasticity unchanged. We then adapt Power-SMC to chunked replanning, where particles represent imagined action-chunk rollouts and importance weights depend only on policy log-probabilities. On ManiSkill3 pick-and-place out-of-distribution benchmarks with a PPO-trained OpenVLA-OFT checkpoint, chunk-level Power-SMC more often reaches a successful state, more reliably remains successful through episode end, and does so with fewer executed actions until success. These results suggest that policy-only power sampling is a promising inference-time adaptation mechanism for VLA control under distribution shift.
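The mismatch between local temperature scaling and trajectory-level power sampling mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's method or code: it assumes a hypothetical two-step policy over two discrete actions, deterministic transitions (so trajectory powering and policy-only powering coincide), and an illustrative exponent `alpha = 2`. All names and numbers are made up for illustration.

```python
import numpy as np

def power(p, alpha):
    """Raise a distribution to the power alpha and renormalize over the last axis."""
    q = np.power(p, alpha)
    return q / q.sum(axis=-1, keepdims=True)

# Hypothetical two-step policy (illustrative values, not from the paper).
p_first = np.array([0.5, 0.5])        # p(a1)
p_second = np.array([[0.9, 0.1],      # p(a2 | a1 = 0): peaked continuation
                     [0.5, 0.5]])     # p(a2 | a1 = 1): flat continuation
alpha = 2.0

# Local temperature scaling: sharpen the first-step conditional on its own.
local_first = power(p_first, alpha)

# Trajectory-level powering: sharpen the joint p(a1, a2), then marginalize out a2.
joint = p_first[:, None] * p_second
joint_powered = np.power(joint, alpha)
joint_powered /= joint_powered.sum()
traj_first = joint_powered.sum(axis=1)

# The same marginal, written as p(a1)^alpha times a future-correction term
# sum_{a2} p(a2 | a1)^alpha, then renormalized.
correction = np.power(p_second, alpha).sum(axis=1)
traj_first_check = np.power(p_first, alpha) * correction
traj_first_check /= traj_first_check.sum()

print("local      :", local_first)
print("trajectory :", traj_first)
```

In this toy setting, local scaling leaves the first action at 50/50, while trajectory-level powering shifts mass toward the action whose continuation is more peaked (roughly 0.62 versus 0.38). That shift comes entirely from the future-correction term.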
Submission Number: 98