Keywords: Robot manipulation, Imitation learning, Visuomotor policies
Abstract: Large-scale visuomotor policies trained by imitation learning are widely used in robot manipulation, and they typically take both visual observations and proprioceptive states as input for precise control. However, it remains unclear whether the proprioceptive state is necessary for learning robust policies, and including it can make the policy overly reliant on that input, which leads to overfitting to the training trajectories and poor spatial generalization. In this study, we investigate the State-free Policy, which removes the proprioceptive state input entirely. The State-free Policy is built in the relative end-effector action space; more importantly, we find that it works well only when given sufficient task-relevant visual observations, which we ensure with dual wide-angle wrist cameras. Empirical results show that the State-free Policy achieves significantly stronger spatial generalization than the state-based policy: across multiple real-world tasks and robot embodiments, the average success rate improves from 0% to 85% in height generalization and from 6% to 64% in horizontal generalization. The State-free Policy also shows advantages in data efficiency and cross-embodiment adaptation, suggesting a promising direction for building more scalable real-world robot learning systems.
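To make the idea of a relative end-effector action space concrete, the following is a minimal sketch and not the authors' implementation. It assumes translation-only actions expressed as per-step position deltas; the abstract does not specify how rotation, gripper commands, or reference frames are handled. The function names `to_relative_actions` and `rollout` and the toy trajectory are hypothetical. The sketch shows that a sequence of deltas replayed from a vertically shifted start reproduces the same motion at the new height, which is one intuition for why such actions need not depend on the absolute pose.

```python
def to_relative_actions(positions):
    """Convert absolute end-effector positions into per-step deltas.

    Assumption: translation-only (x, y, z) actions; rotation is ignored.
    """
    return [
        tuple(b - a for a, b in zip(prev, nxt))
        for prev, nxt in zip(positions, positions[1:])
    ]


def rollout(start, deltas):
    """Apply per-step deltas starting from the given position."""
    trajectory = [tuple(start)]
    for delta in deltas:
        last = trajectory[-1]
        trajectory.append(tuple(p + d for p, d in zip(last, delta)))
    return trajectory


# Hypothetical demonstration trajectory (absolute positions).
demo = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0), (1.0, 2.0, -1.0)]
deltas = to_relative_actions(demo)

# Replaying the same deltas from a start raised by 5 units along z
# yields the same motion, shifted by the same offset.
shifted_start = (0.0, 0.0, 5.0)
shifted = rollout(shifted_start, deltas)
```

In this toy setting the deltas carry no information about where the demonstration started, whereas absolute position targets would pull the end effector back toward the original height.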
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 17