SilRef: Joint Visual Silhouette and Tactile Pose Optimization for Transparent Object Manipulation

Published: 2026 (last modified: 14 Feb 2026). IEEE Robotics Autom. Lett. 2026. License: CC BY-SA 4.0.
Abstract: Transparent objects are ubiquitous in laboratory automation settings, as liquids regularly need to be inspected visually. Automating laboratory processes would make the creation of small-batch medication feasible, thus making more personalized and better-targeted treatments more accessible. However, transparent objects present a major challenge for robust vision systems, in turn compromising their manipulation. Their appearance varies depending on the environment, and depth sensors fail to capture accurate measurements of them. These objects therefore break central assumptions made by depth-based as well as render-and-compare pose refinement strategies. To ensure reliable pose estimation, we propose Silhouette-based object pose Refinement (SilRef), a novel pose refinement approach leveraging object silhouette detection and geometric cues. By circumventing the need for depth maps or realistic rendering, SilRef is robust to environmental changes. Our formulation directly optimizes object poses by gradient descent on rendered 3D models and benefits from a large convergence basin. SilRef is evaluated on the Keypose dataset and the newly collected Tracebot In-Gripper dataset. Results show improvements of 2.8x and 2.7x in Average Distance of Model Points, Symmetric (ADD-S@0.01 m) when the object is standing on a surface and when it is already grasped, respectively, compared to Megapose6D and ICP (Iterative Closest Point).
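To illustrate the kind of gradient-descent pose optimization through a rendered silhouette that the abstract describes, the sketch below uses PyTorch3D's differentiable silhouette renderer. It is an assumption-laden illustration of the general technique, not SilRef itself: the ico_sphere stand-in mesh, the synthetic target silhouette, the translation-only optimization, and the plain MSE silhouette loss are all choices made for brevity here.

```python
# Minimal sketch: silhouette-based pose refinement by gradient descent
# through a differentiable renderer (PyTorch3D). NOT the SilRef method:
# the stand-in mesh, translation-only pose, and MSE loss are assumptions.
import torch
from pytorch3d.utils import ico_sphere
from pytorch3d.renderer import (
    BlendParams, FoVPerspectiveCameras, MeshRasterizer, MeshRenderer,
    RasterizationSettings, SoftSilhouetteShader, look_at_view_transform,
)

device = torch.device("cpu")
mesh = ico_sphere(level=2, device=device)  # stand-in for the object's 3D model

# Soft rasterization so the silhouette is differentiable w.r.t. the pose.
blend = BlendParams(sigma=1e-4, gamma=1e-4)
raster_settings = RasterizationSettings(
    image_size=128,
    blur_radius=float(torch.log(torch.tensor(1.0 / 1e-4 - 1.0))) * blend.sigma,
    faces_per_pixel=50,
)
renderer = MeshRenderer(
    rasterizer=MeshRasterizer(raster_settings=raster_settings),
    shader=SoftSilhouetteShader(blend_params=blend),
)

# Target silhouette; in practice this would come from a silhouette detector
# run on the real image, which is what makes the approach depth-free.
R_gt, T_gt = look_at_view_transform(dist=2.7, elev=10.0, azim=40.0, device=device)
cams_gt = FoVPerspectiveCameras(R=R_gt, T=T_gt, device=device)
target = renderer(mesh, cameras=cams_gt)[..., 3].detach()  # alpha channel = silhouette

# Coarse initial pose; refine the translation by gradient descent on the
# discrepancy between the rendered and target silhouettes.
R, T_init = look_at_view_transform(dist=3.2, elev=0.0, azim=10.0, device=device)
T = T_init.clone().requires_grad_(True)
optimizer = torch.optim.Adam([T], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    cams = FoVPerspectiveCameras(R=R, T=T, device=device)
    pred = renderer(mesh, cameras=cams)[..., 3]
    loss = torch.mean((pred - target) ** 2)  # silhouette mismatch
    loss.backward()
    optimizer.step()
    if step % 50 == 0:
        print(f"step {step:3d}  silhouette loss {loss.item():.5f}")
```

A full refinement would also optimize rotation and use a detected rather than synthetic target silhouette; the point of the sketch is only that backpropagating through a soft silhouette renderer requires neither depth maps nor photorealistic rendering.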