How to Spin an Object: First, Get the Shape Right

Published: 13 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · License: CC BY 4.0
Keywords: image-to-3D, multiview diffusion, pointmaps
TL;DR: Direct image-to-3D via view synthesis using a novel pointmap representation
Abstract: We present unPIC, a method for generating novel 3D-consistent views of an object from a single image. Given one input view, unPIC produces a full spin of the object around its vertical axis, a process that is typically a precursor to reconstructing the object in 3D. Our key idea is to predict the object's underlying 3D geometry from the input image _before_ predicting the textured appearance of the novel views. To this end, unPIC consists of two modules: a multiview geometry _prior_, followed by a multiview appearance _decoder_, both implemented as diffusion models but trained separately. During inference, the geometry serves as a blueprint to coordinate the generation of the final novel views, thus enforcing consistency across the object's 360-degree spin. We introduce a novel pointmap-based representation to capture the geometry, with one key advantage: it allows us to obtain a 3D point cloud directly as part of the view-synthesis process, rather than as a post-hoc step. Our modular, geometry-driven framework outperforms leading methods such as InstantMesh, EscherNet, CAT3D, and Direct3D on novel-view quality, geometric accuracy, and multiview-consistency metrics. Furthermore, unPIC shows strong generalization to challenging, real-world captures from datasets like Google Scanned Objects and the Digital Twin Catalog.
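To make the pointmap idea concrete: a pointmap is an image-shaped array whose pixels store 3D coordinates, so it can be predicted with the same image-generation machinery as the views themselves and flattened into a point cloud with no post-hoc reconstruction. The sketch below is illustrative only — the depth-unprojection helper and its parameter names are assumptions for exposition, not unPIC's actual pipeline, which predicts pointmaps with a diffusion model.

```python
import numpy as np

def depth_to_pointmap(depth, fx, fy, cx, cy):
    """Unproject a depth map into a pointmap: an (H, W, 3) array whose
    pixels hold 3D coordinates in the camera frame. (Illustrative helper,
    not part of unPIC, which predicts pointmaps directly via diffusion.)"""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel grid
    x = (u - cx) / fx * depth                       # pinhole unprojection
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

# Toy example: a flat depth map and assumed intrinsics.
depth = np.ones((4, 4))
pointmap = depth_to_pointmap(depth, fx=2.0, fy=2.0, cx=2.0, cy=2.0)

# Key property exploited by the paper: a pointmap reshapes directly
# into a point cloud, one 3D point per pixel.
cloud = pointmap.reshape(-1, 3)
```

Because the pointmap shares the image's pixel grid, each generated view's geometry and appearance stay in per-pixel correspondence, which is what lets the geometry prior act as a blueprint for the appearance decoder.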
Supplementary Material: zip
Primary Area: generative models
Submission Number: 4708