Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop

Justin Kerr; Kush Hari; Ethan Weber; Chung Min Kim; Brent Yi; tyler bonnen; Ken Goldberg; Angjoo Kanazawa

Published: 08 Aug 2025, Last Modified: 16 Sept 2025; CoRL 2025 Poster; CC BY 4.0

Keywords: Active Vision, Reinforcement Learning, Behavior Cloning, Manipulation

TL;DR: Training a gaze policy for an active mechanical eyeball with RL so that it looks around to facilitate behavior cloning over a large workspace

Abstract: Humans do not passively observe the visual world; we actively look in order to act. Motivated by this principle, we introduce EyeRobot, a robotic system with gaze behavior that emerges from the need to complete real-world tasks. We develop a mechanical eyeball that can freely rotate to observe its surroundings and train a gaze policy to control it using reinforcement learning. We accomplish this by introducing a BC-RL loop trained using teleoperated demonstrations recorded with a 360 camera. The resulting video enables a simulation environment that supports rendering arbitrary eyeball viewpoints, allowing reinforcement learning of gaze behavior. The hand (BC) agent is trained from rendered eye observations, and the eye (RL) agent is rewarded when the hand produces correct actions. In this way, hand-eye coordination emerges as the eye looks towards regions that allow the hand to complete the task. We evaluate EyeRobot on five large-workspace manipulation tasks and compare performance to two common camera setups: wrist and external cameras. Our experiments suggest that EyeRobot exhibits hand-eye coordination behaviors, such as visual search and target switching, that enable manipulation across large workspaces.
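The reward coupling described in the abstract (the eye is rewarded when the hand produces correct actions from what the eye sees) can be illustrated with a deliberately tiny toy. This is a sketch under strong assumptions, not the authors' implementation: the scene is reduced to a single target angle on a 1-D panorama, the "hand" is a hand-written rule standing in for a trained BC policy, and a brute-force search over candidate gazes stands in for reinforcement learning. All function names, the 60-degree field of view, and the fallback action are hypothetical.

```python
def angular_diff(a, b):
    """Smallest signed difference a - b in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def render_view(target_deg, gaze_deg, fov_deg=60.0):
    """Toy 'render' of an eye viewpoint from a 360-degree scene.

    Returns the target's offset from the gaze centre if it falls inside
    the field of view, otherwise None (target not visible).
    """
    offset = angular_diff(target_deg, gaze_deg)
    return offset if abs(offset) <= fov_deg / 2 else None


def hand_policy(view, gaze_deg):
    """Stand-in for the BC hand agent (assumption, not a learned model).

    Acts toward the target when it is visible; otherwise falls back to a
    fixed default action of 0 degrees.
    """
    if view is None:
        return 0.0
    return (gaze_deg + view) % 360.0


def eye_reward(target_deg, gaze_deg, demo_action_deg):
    """Reward for the eye: negative error between the hand's action and
    the demonstrated action, given what the eye chose to look at."""
    view = render_view(target_deg, gaze_deg)
    action = hand_policy(view, gaze_deg)
    return -abs(angular_diff(action, demo_action_deg))


def best_gaze(target_deg, demo_action_deg, candidates):
    """Brute-force stand-in for RL: pick the gaze with the highest reward."""
    return max(candidates,
               key=lambda g: eye_reward(target_deg, g, demo_action_deg))


if __name__ == "__main__":
    target = 140.0           # where the object is in the panorama
    demo = 140.0             # demonstrated hand action (reach at 140 degrees)
    gazes = range(0, 360, 30)
    for g in gazes:
        print(g, eye_reward(target, g, demo))
    print("best gaze:", best_gaze(target, demo, gazes))
```

In this toy, gazes that keep the target in view let the hand match the demonstration and therefore earn the highest reward, which is the sense in which looking behavior is shaped by the needs of acting.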

Supplementary Material: zip

Spotlight: mp4

Submission Number: 909
