Keywords: representation learning, sim2real transfer
Abstract: Transferring robotic policies from simulation to the real world often faces perceptual challenges, where visual differences between the two domains degrade performance. Policies that rely on 6D pose state estimation require task-specific scaffolding, while policies that operate on raw sensor input lack robustness and efficiency. We propose using 2D keypoints (spatially consistent features in the image frame) as a state representation for effective sim-to-real transfer. Our method, ATK, automatically selects a minimal set of task-relevant keypoints that predict optimal behavior. A teacher policy trained in simulation is distilled into a student policy that operates on RGB images while tracking the selected keypoints, so the system transfers policies to the real world even under perceptual challenges such as transparent objects or fine-grained manipulation. We validate ATK across various tasks, showing that the minimal set of task-relevant keypoints improves robustness to visual and environmental variations.
Supplementary Material: zip
Submission Number: 39