Abstract: Recently, image-to-3D approaches have significantly advanced the generation quality and speed of 3D assets through large reconstruction models, particularly 3D Gaussian reconstruction models. Existing large 3D Gaussian models directly map 2D images to 3D Gaussian parameters; however, regressing 2D images to 3D Gaussian representations is challenging without 3D priors. In this paper, we propose a large Point-to-Gaussian model for image-to-3D generation, which takes as input an initial point cloud produced by a large 3D diffusion model conditioned on the 2D image and generates the Gaussian parameters. The point cloud provides an initial 3D geometry prior for Gaussian generation, thus significantly facilitating image-to-3D generation. Moreover, we present an Attention mechanism, Projection mechanism, and Point feature extractor, dubbed the APP block, for fusing image features with point cloud features. Extensive qualitative and quantitative experiments on the GSO and Objaverse datasets demonstrate the effectiveness of the proposed approach and show that it achieves state-of-the-art performance.
Primary Subject Area: [Generation] Generative Multimedia
Secondary Subject Area: [Generation] Generative Multimedia, [Content] Vision and Language
Relevance To Conference: Generating high-quality 3D assets from images is a pivotal task in multimedia generation, with applications in gaming, film production, VR/AR, etc. Learning-based 3D generation algorithms allow rapid generation of high-quality 3D assets free of tedious manual processes and complex computer graphics tools. In this paper, we propose a Large Point-to-Gaussian Model for Image-to-3D Generation, a novel cross-modality generation framework. Our method can quickly generate 3D assets from a single image, which greatly improves the quality and efficiency of multimedia generation. Therefore, our work is important for multimedia research and highly relevant to the conference.
Supplementary Material: zip
Submission Number: 920