Keywords: robot learning, reinforcement learning, manipulation, vision-language model, vision-language-action model
TL;DR: Split Decisions helps vision-language-action models (VLAs) explore smarter by using vision-language models (VLMs) as high-level planners, so policies stop flailing around and start learning useful tasks faster.
Abstract: Reinforcement learning (RL) offers a general framework for adapting vision-language-action models (VLAs) to new tasks, but its effectiveness is often bottlenecked by inefficient exploration. Existing strategies waste interactions on uninformative behaviors, hindering sample efficiency. We introduce Split Decisions, an exploration framework that leverages semantic priors from vision-language models (VLMs) to guide VLAs toward more promising regions of the action space. In our approach, the VLM serves as a high-level planner that proposes subgoals, while the VLA acts as a low-level controller that samples and executes actions aligned with those subgoals. This structured guidance improves both the efficiency and quality of exploration, enabling policies to discover rewarding strategies more quickly. We evaluate Split Decisions on robotic manipulation tasks in SimplerEnv under both online and offline RL settings. In online fine-tuning, it achieves up to a 31\% gain in task success with the same interaction budget, while in offline training, policies trained on datasets collected with Split Decisions achieve a 27.5\% improvement over those trained on data from prior methods. These results establish Split Decisions as a general and effective paradigm for enhancing exploration in VLA adaptation.
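To make the hierarchy described above concrete, here is a minimal sketch of the exploration loop implied by the abstract: a VLM periodically proposes a subgoal, and a VLA samples subgoal-conditioned actions that are logged for RL. All names (`propose_subgoal`, `sample_action`, `replan_every`, and the `env`, `vlm`, and `vla` objects) are hypothetical placeholders, not the authors' actual API.

```python
# Hypothetical sketch of a Split Decisions rollout: VLM as high-level
# planner, VLA as low-level controller. Interfaces are assumed, not
# taken from the paper.

def split_decisions_rollout(env, vlm, vla, task_instruction,
                            max_steps=200, replan_every=20):
    """Collect one episode of subgoal-guided exploration."""
    obs = env.reset()
    trajectory = []
    subgoal = None
    for t in range(max_steps):
        # High-level planner: periodically query the VLM for the next
        # subgoal, conditioned on the current observation and the task.
        if t % replan_every == 0:
            subgoal = vlm.propose_subgoal(obs, task_instruction)
        # Low-level controller: the VLA samples an action aligned with
        # the current subgoal instead of exploring the action space blindly.
        action = vla.sample_action(obs, subgoal)
        next_obs, reward, done, info = env.step(action)
        trajectory.append((obs, subgoal, action, reward))
        obs = next_obs
        if done:
            break
    # Trajectories can feed online fine-tuning updates or be aggregated
    # into an offline dataset, matching the two settings evaluated above.
    return trajectory
```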
Primary Area: reinforcement learning
Submission Number: 4800