NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction

Published: 26 Jan 2026, Last Modified: 01 Mar 2026ICLR 2026 PosterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Feed-forward 3D Reconstruction, Non-pixel-aligned 3D Reconstruction, 3D Completion
TL;DR: Given unposed multi-view images, NOVA3R recovers complete, non-overlapping 3D geometry, reconstructing visible and occluded regions with physical plausibility.
Abstract: We present NOVA3R, an effective approach for non-pixel-aligned 3D reconstruction from a set of unposed images in a feed-forward manner. Unlike pixel-aligned methods that tie geometry to per-ray predictions, our formulation learns a global, view-agnostic scene representation that decouples reconstruction from pixel alignment. This addresses two key limitations in pixel-aligned 3D: (1) it recovers both visible and invisible regions with a complete scene representation, and (2) it produces physically plausible geometry with fewer duplicated structures in overlapping regions. To achieve this, we introduce a scene-token mechanism that aggregates information across unposed images and a diffusion-based 3D decoder that reconstructs complete, non-pixel-aligned point clouds. Extensive experiments on both scene-level and object-level datasets demonstrate that NOVA3R outperforms state-of-the-art methods in terms of reconstruction accuracy and completeness.
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 17786
Loading