3D Neural Embedding Likelihood for Robust Sim-to-Real Transfer in Inverse Graphics

Guangyao Zhou; Nishad Gothoskar; Lirui Wang; Joshua B. Tenenbaum; Dan Gutfreund; Miguel Lazaro-Gredilla; Dileep George; Vikash Mansinghka

Published: 01 Feb 2023, Last Modified: 13 Feb 2023, Submitted to ICLR 2023, Readers: Everyone

Keywords: 3D inverse graphics, probabilistic inference, likelihood, RGB-D, neural embedding, object pose estimation

TL;DR: We propose 3D Neural Embedding Likelihoods (3DNEL), a 3D likelihood that models both shape information from depth and appearance information from RGB via neural embeddings and bridges the sim-to-real gap in 3D inverse graphics.

Abstract: A central challenge in 3D scene perception via inverse graphics is robustly modeling the gap between 3D graphics and real-world data. We propose a novel 3D Neural Embedding Likelihood (3DNEL) over RGB-D images to address this gap. 3DNEL uses neural embeddings to predict 2D-3D correspondences from RGB and combines them with depth in a principled manner. 3DNEL is trained entirely on synthetic images and generalizes to real-world data. To showcase this capability, we develop a multi-stage inverse graphics pipeline that uses 3DNEL for 6D object pose estimation from real RGB-D images. Our method outperforms the previous state of the art in sim-to-real pose estimation on the YCB-Video dataset and improves robustness, with significantly fewer large-error predictions. Unlike existing bottom-up, discriminative approaches that are specialized for pose estimation, 3DNEL adopts a probabilistic generative formulation that jointly models multi-object scenes. This generative formulation enables easy extension of 3DNEL to additional tasks such as object and camera tracking from video, using principled inference in the same probabilistic model without task-specific retraining.
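
As a rough illustration of what "combining embedding-based correspondences with depth" in a likelihood could look like, here is a minimal NumPy sketch. It is not the 3DNEL formulation from the paper: the function name `toy_rgbd_log_likelihood`, the per-point mixture form, the softmax over embedding dot products, and all parameters (`sigma`, `outlier_prob`, `outlier_density`) are assumptions made for illustration only.

```python
import numpy as np


def toy_rgbd_log_likelihood(obs_xyz, obs_emb, ren_xyz, ren_emb,
                            sigma=0.01, outlier_prob=0.1, outlier_density=1.0):
    """Toy log-likelihood of an observed RGB-D point cloud given a rendered one.

    Hypothetical sketch, not the paper's model.
    obs_xyz: (N, 3) observed 3D points (from depth).
    obs_emb: (N, D) embeddings predicted from RGB at those points.
    ren_xyz: (M, 3) points rendered from the hypothesized scene.
    ren_emb: (M, D) embeddings attached to the rendered surface points.
    """
    # Shape term: isotropic 3D Gaussian density between every pair of points.
    d2 = ((obs_xyz[:, None, :] - ren_xyz[None, :, :]) ** 2).sum(axis=-1)
    gauss = np.exp(-0.5 * d2 / sigma**2) / ((2.0 * np.pi) ** 1.5 * sigma**3)

    # Appearance term: softmax over rendered points of embedding dot products,
    # read as a soft 2D-3D correspondence weight for each observed point.
    logits = obs_emb @ ren_emb.T
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Mix the inlier density with a constant outlier density so that a few
    # unexplained points cannot drive the log-likelihood to minus infinity.
    inlier = (gauss * weights).sum(axis=1)
    per_point = outlier_prob * outlier_density + (1.0 - outlier_prob) * inlier
    return float(np.log(per_point).sum())
```

Under these assumptions, a scene hypothesis whose rendered points line up with the observation in both 3D position and embedding space scores higher than a displaced one, which is the kind of signal a pose-inference procedure could search over.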

Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)
