Physically Plausible Object Pose Refinement in Cluttered Scenes

Published: 2024, Last Modified: 24 Sept 2025, GCPR (2) 2024, CC BY-SA 4.0
Abstract: Estimating the 6-DoF pose of objects from images is a fundamental task in computer vision and a prerequisite for downstream applications such as augmented reality and robotic grasping. The task becomes particularly challenging in cluttered scenes, where many objects appear in close proximity and occlude one another. However, this proximity also provides additional cues: objects in physically plausible scenes do not intersect, so occluding objects constrain the poses of the objects they occlude. We present a novel approach that exploits this information for 6-DoF pose refinement of known objects. Our formulation extends RAFT-based pose refinement to substantially reduce penetrations between objects, leading to more physically plausible object poses. We evaluate our approach quantitatively and qualitatively on two benchmark datasets and demonstrate improvements over baselines. We will make the source code of our approach publicly available to foster future research in this area.