Keywords: 3D Scene Generation, Object Layout Optimization, Multi-View Consistency
Abstract: We introduce Geo-Refine, a single-image 3D scene generator that couples geometry–appearance preprocessing with a two-stage voxel–mesh localization pipeline to produce physically valid, visually complete multi-object scenes. Unlike prior methods that either overfit to image priors or rely on sequential post-hoc segmentation, Geo-Refine follows a unified, end-to-end formulation. Conditioned on a single RGB image, it first extracts clean object regions through high-precision masking, directional color-spill suppression, and multi-view appearance consistency, then jointly optimizes object placement and fine mesh alignment. The global layout is cast as an energy-guided voxel reasoning problem that enforces projection evidence, ground support, and semantic co-location, while a subsequent mesh-level refinement stage guarantees collision-free, contact-accurate geometry. Experiments on diverse indoor and outdoor benchmarks show consistent gains in CLIP, VQ, and GPT-4 metrics, along with sharper geometry, stable object interactions, and improved multi-view fidelity over state-of-the-art image-to-3D baselines. These results highlight the value of Geo-Refine for reliable single-image 3D scene synthesis and understanding.
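To make the abstract's "energy-guided voxel reasoning" concrete, below is a minimal sketch of how a weighted layout energy over candidate voxel placements could be scored and minimized; the term names, weights, grid conventions, and helper functions are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def layout_energy(occupancy, proj_evidence, ground_z, sem_affinity,
                  w_proj=1.0, w_support=0.5, w_sem=0.25):
    """Hypothetical energy for one candidate object placement on a voxel grid.

    occupancy     : (D, H, W) bool, voxels covered by the candidate placement
    proj_evidence : (D, H, W) float in [0, 1], agreement with image-space masks
    ground_z      : int, index of the supporting ground plane along axis 0
    sem_affinity  : float in [0, 1], semantic co-location score with neighbors
    """
    # Projection term: penalize occupied voxels that lack image evidence.
    e_proj = np.mean(1.0 - proj_evidence[occupancy]) if occupancy.any() else 1.0

    # Support term: penalize placements whose lowest voxels float above ground.
    lowest = occupancy.nonzero()[0].min() if occupancy.any() else ground_z
    e_support = abs(int(lowest) - ground_z)

    # Semantic term: reward plausible co-location (e.g. a monitor on a desk).
    e_sem = 1.0 - sem_affinity

    return w_proj * e_proj + w_support * e_support + w_sem * e_sem

def best_placement(candidates, proj_evidence, ground_z, affinities):
    """Pick the candidate placement with the lowest total layout energy."""
    energies = [layout_energy(occ, proj_evidence, ground_z, aff)
                for occ, aff in zip(candidates, affinities)]
    return int(np.argmin(energies))
```

In the paper's pipeline, the winner of this coarse voxel-level search would then be handed to the mesh-level refinement stage for collision and contact adjustment.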
Primary Area: generative models
Submission Number: 9524