Keywords: Policy-Free Task Solving, Reinforcement Learning, Efficient Exploration
Abstract: Traditional policy learning in reinforcement learning relies on costly annotated data from extensive environment interaction. In contrast, vast amounts of unlabeled video contain rich task knowledge yet remain underutilized. Inspired by how humans acquire skills from watching videos, we propose Policy-Free Flow Search (PFFS).
Rather than depending on explicit policies, PFFS learns to understand tasks through temporal consistency within single demonstrations and structural alignment across them. It models task stage transitions autoregressively to form a coherent task flow. At deployment, PFFS performs backward planning to generate a goal-to-initial task flow, then executes a forward search that solves the task along this flow with minimal exploration.
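To make the deployment procedure concrete, the Python sketch below illustrates one way the backward-plan/forward-search loop could look. All interfaces here (predict_predecessor, stage_reached, sample_action) are hypothetical placeholders under our own assumptions, not the paper's actual API.

```python
# Hedged conceptual sketch of backward planning followed by forward search.
# Every identifier is an illustrative assumption, not the authors' implementation.

def plan_backward(model, goal_stage, initial_stage, max_steps=32):
    """Autoregressively sample predecessor stages from the goal back to the start."""
    flow = [goal_stage]
    while len(flow) < max_steps:
        prev = model.predict_predecessor(flow[-1])   # next stage, going backward
        flow.append(prev)
        if prev == initial_stage:
            break
    return list(reversed(flow))                      # initial -> goal task flow


def search_forward(env, flow, budget_per_stage=50):
    """Explore until each stage of the flow is reached in order; no learned policy."""
    obs = env.reset()
    for stage in flow[1:]:
        for _ in range(budget_per_stage):
            obs = env.step(env.sample_action())      # cheap local exploration
            if env.stage_reached(obs, stage):
                break                                # advance along the flow
        else:
            return False                             # budget exhausted mid-flow
    return True                                      # goal stage reached
```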
For further utility, we extend PFFS to PFFS-RL, a reinforcement learning (RL) framework using save-point-structured trajectories and task-flow-aligned rewards, which significantly boosts exploration efficiency. Experiments show that PFFS solves Minecraft tasks with very little exploration in a policy-free manner, while PFFS-RL outperforms other RL baselines through improved exploration under the same data volume. This work introduces a novel policy-free paradigm that leverages unlabeled videos for efficient task solving, advancing decision-making in resource-constrained scenarios.
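The following is a minimal sketch of how task-flow-aligned rewards and save-point-structured trajectories could fit together, assuming a precomputed flow of ordered stages and an environment exposing snapshot()/restore(); these names are assumptions for illustration, not the paper's formulation.

```python
# Illustrative sketch: reward advances along the task flow, and failures resume
# from the last save point rather than restarting the episode.

class FlowAlignedReward:
    """Dense reward for advancing one stage along the task flow."""

    def __init__(self, flow, stage_reached):
        self.flow = flow                    # ordered stages, initial -> goal
        self.stage_reached = stage_reached  # predicate: (obs, stage) -> bool
        self.progress = 0                   # index of last stage reached

    def __call__(self, obs):
        nxt = self.progress + 1
        if nxt < len(self.flow) and self.stage_reached(obs, self.flow[nxt]):
            self.progress = nxt
            return 1.0                      # reward aligned with flow progress
        return 0.0


def collect_with_savepoints(env, policy, reward_fn, horizon=1000):
    """Snapshot the state at each reached stage; restore it on failure so
    exploration resumes mid-task instead of from the initial state."""
    obs, save = env.reset(), env.snapshot()
    trajectory = []
    for _ in range(horizon):
        action = policy(obs)
        obs, failed = env.step(action)      # hypothetical (obs, failed) interface
        r = reward_fn(obs)
        trajectory.append((obs, action, r))
        if r > 0:
            save = env.snapshot()           # new save point at the new stage
        elif failed:
            obs = env.restore(save)         # resume from the last save point
    return trajectory
```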
Primary Area: reinforcement learning
Submission Number: 24099