PicoPose: Progressive Pixel-to-Pixel Correspondence Learning for Novel Object Pose Estimation

Lihua Liu, Jiehong Lin, ZhenXin Liu, Kui Jia

Published: 08 Aug 2025, Last Modified: 16 Sept 2025. Venue: CoRL 2025 Poster. License: CC BY 4.0

Keywords: Novel Object Pose Estimation, Robotic Manipulation

Abstract: RGB-based novel object pose estimation is critical for rapid deployment in robotic applications, yet zero-shot generalization remains a key challenge. In this paper, we introduce PicoPose, a novel framework designed to tackle this task using a three-stage pixel-to-pixel correspondence learning process. Firstly, PicoPose matches features from the RGB observation with those from rendered object templates, identifying the best-matched template and establishing coarse correspondences. Secondly, PicoPose smooths the correspondences by globally regressing a 2D affine transformation, including in-plane rotation, scale, and 2D translation, from the coarse correspondence map. Thirdly, PicoPose applies the affine transformation to the feature map of the best-matched template and learns correspondence offsets within local regions to achieve fine-grained correspondences. By progressively refining the correspondences, PicoPose significantly improves the accuracy of object poses computed via PnP/RANSAC. PicoPose achieves state-of-the-art performance on the seven core datasets of the BOP benchmark, demonstrating exceptional generalization to novel objects. Our code and models will be made publicly available.
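The three correspondence stages described in the abstract can be illustrated with a small NumPy sketch. It is a hypothetical, non-learned stand-in, not the authors' implementation: all helper names (coarse_match, fit_similarity, refine_local, pixel_grid) are invented for illustration, feature maps are assumed to be given as (H, W, C) arrays, stage 2 replaces the paper's learned global regression with a closed-form least-squares fit of a 2D similarity transform (in-plane rotation, a single isotropic scale, 2D translation), and stage 3 replaces the learned local offsets with a brute-force search in a small window.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def pixel_grid(h, w):
    """(h*w, 2) array of (x, y) pixel coordinates in row-major order."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)


def coarse_match(query, templates):
    """Stage 1 (sketch): choose the best template and coarse correspondences.

    query: (H, W, C) observed feature map; templates: (K, H, W, C).
    Returns (best template index, (H*W,) matched template pixel index per query pixel).
    """
    K, H, W, C = templates.shape
    q = l2_normalize(query.reshape(-1, C))
    t = l2_normalize(templates.reshape(K, -1, C))
    sim = np.einsum("nc,kmc->knm", q, t)       # cosine similarity, (K, N, M)
    scores = sim.max(axis=2).mean(axis=1)      # mean best-match similarity per template
    k = int(np.argmax(scores))
    return k, sim[k].argmax(axis=1)


def fit_similarity(src, dst):
    """Stage 2 stand-in: least-squares fit of dst ~ s * R(theta) @ src + t.

    Uses complex numbers, dst_c ~ a * src_c + b with a = s * exp(i * theta).
    The paper regresses the transform with a network instead.
    """
    zs = src[:, 0] + 1j * src[:, 1]
    zd = dst[:, 0] + 1j * dst[:, 1]
    zs0, zd0 = zs - zs.mean(), zd - zd.mean()
    a = np.vdot(zs0, zd0) / np.vdot(zs0, zs0)  # sum(conj(zs0) * zd0) / sum(|zs0|^2)
    b = zd.mean() - a * zs.mean()
    return abs(a), float(np.angle(a)), np.array([b.real, b.imag])


def apply_similarity(pts, scale, theta, t):
    """Map (N, 2) points by x -> scale * R(theta) @ x + t."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return scale * pts @ R.T + t


def refine_local(query, template, scale, theta, t, radius=1):
    """Stage 3 stand-in: map each query pixel into the template with the inverse
    similarity, then keep the most similar template pixel within a small window.
    If no window pixel is inside the template, the initial estimate is kept."""
    H, W, C = query.shape
    q = l2_normalize(query.reshape(-1, C))
    tf = l2_normalize(template.reshape(-1, C))
    # Inverse of x_q = s R x_t + t is x_t = R^T (x_q - t) / s.
    init = apply_similarity(pixel_grid(H, W) - t, 1.0 / scale, -theta, np.zeros(2))
    matches = np.empty_like(init)
    for i, (x0, y0) in enumerate(np.rint(init).astype(int)):
        best, best_xy = -np.inf, (x0, y0)
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                x, y = x0 + dx, y0 + dy
                if 0 <= x < W and 0 <= y < H:
                    score = q[i] @ tf[y * W + x]
                    if score > best:
                        best, best_xy = score, (x, y)
        matches[i] = best_xy
    return matches
```

Wiring the stages together would look like `k, idx = coarse_match(query, templates)`, then `grid = pixel_grid(H, W)` and `s, th, t = fit_similarity(grid[idx], grid)`, and finally `fine = refine_local(query, templates[k], s, th, t)`. Assuming each rendered template pixel also stores its 3D object coordinate, the refined 2D-3D pairs could then be passed to a PnP/RANSAC solver such as OpenCV's `solvePnPRansac`. Fitting a single global transform before the local search is what lets the final window stay small: the coarse stage only has to be right on average, not at every pixel.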

Supplementary Material: zip

Spotlight: mp4

Submission Number: 234