Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

Zehan Wang; Ziang Zhang; Tianyu Pang; Chao Du; Hengshuang Zhao; Zhou Zhao

Published: 01 May 2025, Last Modified: 23 Jul 2025

Venue: ICML 2025 poster

License: CC BY 4.0

TL;DR: Practical Solution for Estimating Object Orientation in Images

Abstract: Orientation is a fundamental attribute of objects, essential for understanding their spatial pose and arrangement. However, practical solutions for estimating the orientation of open-world objects in monocular images remain underexplored. In this work, we introduce Orient Anything, the first foundation model for zero-shot object orientation estimation. A key challenge in this task is the scarcity of orientation annotations for open-world objects. To address this, we propose leveraging the vast resources of 3D models. By developing a pipeline to annotate the front face of 3D objects and render them from random viewpoints, we curate 2 million images with precise orientation annotations across a wide variety of object categories. To fully leverage the dataset, we design a robust training objective that models the 3D orientation as probability distributions over three angles and predicts the object orientation by fitting these distributions. We also propose several strategies to further improve synthetic-to-real transfer. Our model achieves state-of-the-art orientation estimation accuracy on both rendered and real images, demonstrating strong zero-shot capabilities across various scenarios. Furthermore, it shows great potential for enhancing high-level applications, such as understanding complex spatial concepts in images and adjusting 3D object pose.
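The idea of modelling an angle as a probability distribution, rather than regressing it directly, can be illustrated with a small sketch. This is not the authors' implementation. The abstract does not specify the bin count, the target shape, the loss, or the decoding rule, so the 360 one-degree bins, the circular Gaussian target with `sigma_deg`, the cross-entropy loss, and the arg-max decoding below are all assumptions chosen for illustration. The function names are hypothetical. The sketch covers a single periodic angle; one such distribution per angle would give the three-angle formulation.

```python
import math


def angle_to_distribution(angle_deg, num_bins=360, sigma_deg=20.0):
    """Build a discretised circular Gaussian target centred on angle_deg.

    num_bins and sigma_deg are illustrative assumptions, not values from the paper.
    """
    bin_width = 360.0 / num_bins
    weights = []
    for i in range(num_bins):
        center = i * bin_width
        # Shortest angular distance, so that 359 degrees and 0 degrees are neighbours.
        diff = abs(center - angle_deg) % 360.0
        diff = min(diff, 360.0 - diff)
        weights.append(math.exp(-0.5 * (diff / sigma_deg) ** 2))
    total = sum(weights)
    return [w / total for w in weights]


def distribution_to_angle(probs):
    """Decode an angle as the centre of the most probable bin (assumed decoding rule)."""
    bin_width = 360.0 / len(probs)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best * bin_width


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target and a predicted distribution (assumed loss)."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


if __name__ == "__main__":
    target = angle_to_distribution(45.0)
    print(distribution_to_angle(target))
    # A prediction far from the target should incur a larger loss than a perfect one.
    far = angle_to_distribution(200.0)
    print(cross_entropy(target, target) < cross_entropy(target, far))
```

A soft target of this kind gives nearby angles partial credit and handles wrap-around at 0/360 degrees, which a plain one-hot label or a naive regression target would not.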

Lay Summary: While object orientation is fundamental to discerning spatial relationships within images, its estimation remains an under-researched domain. Our work introduces a visual foundation model engineered to infer the orientation of arbitrary objects within a single image. This innovation is poised to bolster advanced applications, including the comprehension of sophisticated spatial concepts and the refinement of 3D object pose adjustments.

Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.

Link To Code: https://github.com/SpatialVision/Orient-Anything

Primary Area: Applications->Computer Vision

Keywords: Object pose, Orientation

Submission Number: 10758
