TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation

Mingwei Li; Hehe Fan; Yi Yang

Published: 30 Apr 2026, Last Modified: 24 Jun 2026. Venue: ICML 2026 regular. License: CC BY-NC 4.0

TL;DR: TransNormal repurposes Stable Diffusion for transparent object normal estimation by replacing sparse text conditioning with dense DINOv3 visual semantics, effectively resolving geometric ambiguities caused by refraction and reflection in glassware.
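The conditioning swap described in the TL;DR can be illustrated with a generic cross-attention layer. The sketch below is not the authors' implementation. It assumes single-head attention, random weights, no output projection, and made-up token counts and widths: a 77-token sequence stands in for text embeddings, and a hypothetical 32×32 grid of patch features stands in for DINOv3 output. It only shows that the same layer accepts either a short token sequence or a dense per-patch feature map as keys and values. With the dense map, every latent position can attend to spatially localized image features rather than to a handful of caption tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latent_tokens, cond_tokens, w_q, w_k, w_v):
    """Single-head cross-attention with a residual update (sketch).

    latent_tokens: (N_latent, C)  queries from the denoising network
    cond_tokens:   (N_cond, D)    keys/values from the conditioning encoder
    """
    q = latent_tokens @ w_q                    # (N_latent, d)
    k = cond_tokens @ w_k                      # (N_cond, d)
    v = cond_tokens @ w_v                      # (N_cond, C)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (N_latent, N_cond)
    attn = softmax(scores, axis=-1)            # each row sums to 1
    return latent_tokens + attn @ v, attn

rng = np.random.default_rng(0)
C, D, d = 64, 128, 32                          # hypothetical widths
latent = rng.standard_normal((16 * 16, C))     # flattened 16x16 latent grid

# Sparse conditioning: a short token sequence (stand-in for text embeddings).
text_tokens = rng.standard_normal((77, D))
# Dense conditioning: one feature vector per image patch (stand-in for DINOv3).
patch_tokens = rng.standard_normal((32 * 32, D))

w_q = rng.standard_normal((C, d)) / np.sqrt(C)
w_k = rng.standard_normal((D, d)) / np.sqrt(D)
w_v = rng.standard_normal((D, C)) / np.sqrt(D)

out_text, attn_text = cross_attention(latent, text_tokens, w_q, w_k, w_v)
out_dense, attn_dense = cross_attention(latent, patch_tokens, w_q, w_k, w_v)
print(attn_text.shape, attn_dense.shape)  # (256, 77) (256, 1024)
```

In practice the feature width of a vision backbone generally differs from that of a text encoder. The key/value projections, or an adapter placed in front of them, would therefore have to be sized for the new features. The sketch sidesteps this by giving both inputs the same hypothetical width `D`.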

Abstract: Monocular normal estimation for transparent objects is critical for laboratory automation, yet it remains challenging because of complex light refraction and reflection. These optical effects often cause catastrophic failures in conventional depth and normal sensors, hindering the deployment of embodied AI in scientific environments. We propose **TransNormal**, a novel framework that adapts pre-trained diffusion priors for single-step normal regression. To compensate for the lack of texture on transparent surfaces, TransNormal injects dense visual semantics from DINOv3 through a cross-attention mechanism, providing strong geometric cues. We further employ a multi-task learning objective and wavelet-based regularization to preserve fine-grained structural details. To support this task, we introduce **TransNormal-Synthetic**, a physics-based dataset of transparent labware with high-fidelity normal maps. Extensive experiments show that TransNormal significantly outperforms state-of-the-art methods: on the ClearGrasp benchmark, it reduces mean error by 25.5% and improves on the best prior method's 11.25° accuracy by 24.7%; on ClearPose, it reduces mean error by 17.7%. Code and dataset are publicly available at https://github.com/longxiang-ai/TransNormal.
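The mean-error and 11.25° accuracy figures quoted in the abstract follow common conventions for surface normal evaluation, which can be sketched as follows. This is a generic implementation, not the paper's evaluation script. It assumes per-pixel angular error in degrees, an optional validity mask, and a strict `<` threshold test, and the exact masking and averaging used for ClearGrasp and ClearPose may differ. The abstract also does not say whether its percentages are relative changes or percentage-point differences. The `relative_reduction` helper assumes relative changes, and the inputs passed to it below are made-up numbers.

```python
import numpy as np

def angular_errors_deg(pred, gt, mask=None):
    """Per-pixel angle in degrees between predicted and ground-truth normals.

    pred, gt: (H, W, 3) arrays; both are re-normalized to unit length.
    mask: optional (H, W) boolean array selecting valid pixels.
    """
    pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=-1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=-1), -1.0, 1.0)  # keep arccos in domain
    err = np.degrees(np.arccos(cos))
    return err[mask] if mask is not None else err.ravel()

def normal_metrics(pred, gt, mask=None, thresholds=(11.25, 22.5, 30.0)):
    """Mean/median angular error and % of pixels below each threshold."""
    err = angular_errors_deg(pred, gt, mask)
    metrics = {"mean": float(err.mean()), "median": float(np.median(err))}
    for t in thresholds:
        metrics[f"acc_{t}"] = float(np.mean(err < t) * 100.0)
    return metrics

def relative_reduction(old, new):
    """Relative decrease in percent (assumes a relative, not absolute, change)."""
    return 100.0 * (old - new) / old

# Toy example: ground truth faces the camera; predictions are tilted
# by 10 degrees on the top row and 20 degrees on the bottom row.
theta = np.radians(np.array([[10.0, 10.0], [20.0, 20.0]]))
pred = np.stack([np.sin(theta), np.zeros_like(theta), np.cos(theta)], axis=-1)
gt = np.broadcast_to(np.array([0.0, 0.0, 1.0]), pred.shape)

print(normal_metrics(pred, gt))
print(relative_reduction(20.0, 15.0))  # 25.0 (hypothetical error values)
```

In the toy example, half of the pixels fall below the 11.25° threshold, so `acc_11.25` is 50%, while the mean and median errors are both 15°.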

Lay Summary: Transparent objects like glass beakers and pipettes are hard for robots to understand because they reflect and bend light, often making ordinary cameras and depth sensors see the wrong shape. This is a problem for automated laboratories, where robots must know where an object’s surface is and which way it faces before they can grasp, pour, or handle liquids safely. We developed TransNormal, an AI system that estimates the surface shape of transparent labware from a single image. Instead of relying only on visible edges or texture, which are often missing on glass, the system uses broader visual context to infer the likely shape of the object. We also created a large synthetic dataset of transparent lab objects with accurate surface-shape labels, making it possible to train and test this task more reliably. In experiments, TransNormal made substantially fewer errors than previous methods on both synthetic and real-world test sets. This work can help robots perceive transparent tools more reliably, supporting safer and more capable laboratory automation.

Link To Code: https://github.com/longxiang-ai/TransNormal

Primary Area: Deep Learning->Foundation Models

Keywords: Surface Normal Estimation, Geometry Estimation, Diffusion Models, Transparent Objects, Transparent Surface


Submission Number: 1935
