Keywords: 3D reconstruction, computer graphics, self-supervised learning, neural optimization
TL;DR: Methodologies and considerations necessary to build a robust workflow that converts a sequence of images of a physical object into a 3D triangular mesh
Abstract: This is an abstract for a 4-minute Startup submission for a presentation on 3D reconstruction of small physical objects directly from a collection of images, using Neural Radiance Field and Neural Distance Field representations. The authors are a team from PHASMATIC, a 3D graphics-focused private company based in Corfu and Athens, Greece, dedicated to developing high-quality 3D digitization and visualization tools for creative and commercial use. PHASMATIC is currently incubated in the NVIDIA Inception program & Google Cloud for Startups, and is an active member of the Khronos Group & the Metaverse Standards Forum, fostering the development of 3D and XR open standards. The company is developing solutions that enable fast and robust digitization of entire catalogues of physical objects using only a handful of reference photos, for products that lack a digital twin.
The rapid evolution of deep learning, particularly in the domain of 3D geometric learning, has enabled groundbreaking capabilities in 3D content creation. These advancements support the automatic generation, manipulation, and interpretation of complex spatial information. At the core of this process is a training phase that uses input data such as images or other task-specific modalities captured from the physical world. Such algorithms hold the potential to automate the creation of digital replicas of real-world objects, significantly accelerating the traditionally time-consuming task of generating virtual representations and streamlining overall workflows.
This presentation introduces a complete pipeline and outlines the key methodologies and considerations necessary to build a robust workflow that converts a sequence of images (or a video stream) of a physical object into a 3D triangular mesh. The proposed pipeline integrates and extends state-of-the-art techniques that lie at the intersection of computer vision, computer graphics, and neural-based optimization. In a nutshell, the pipeline is organized into three distinct phases: pre-processing, optimization, and post-processing. A brief overview of each follows.
In the pre-processing phase, the pipeline accepts a sequence of images representing the target physical object. From this set of images, both intrinsic and extrinsic camera parameters are estimated; these are typically unknown and are determined by the user's capture device (e.g., a smartphone). Various geometric cues are also extracted from each image, such as approximate depth maps and foreground masks that help isolate the object to be reconstructed.
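As one illustrative sketch of this stage, covering only the foreground-masking step: the abstract does not specify which tools are used, so the example below assumes OpenCV's classical GrabCut with the object roughly centered in the frame (a production pipeline might instead use a learned segmentation model, and a structure-from-motion tool such as COLMAP for the camera parameters). The function name and parameters are illustrative.

```python
# Sketch: rough foreground mask for one input image via GrabCut.
import cv2
import numpy as np

def rough_foreground_mask(image_bgr: np.ndarray, border: float = 0.05) -> np.ndarray:
    """Return a binary mask (1 = object), assuming the object lies inside a
    centered rectangle and the image border is background."""
    h, w = image_bgr.shape[:2]
    rect = (int(w * border), int(h * border),
            int(w * (1 - 2 * border)), int(h * (1 - 2 * border)))
    mask = np.zeros((h, w), np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)  # GrabCut's internal GMM state
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, mask, rect, bgd_model, fgd_model, 5,
                cv2.GC_INIT_WITH_RECT)
    # Keep pixels labelled definite or probable foreground.
    fg = (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)
    return fg.astype(np.uint8)
```

All per-image metadata is then passed to the next stage, the optimization.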
At the core of the optimization module is a self-supervised AI agent that rapidly extracts color and geometric information from the pre-processed images. This process builds on Neural Radiance Fields, a recent breakthrough in novel view synthesis and, subsequently, 3D reconstruction. To prioritize rapid reconstruction, the algorithm leverages efficient GPU-accelerated methods. It approximates the scene's radiance field using a re-parameterization of the volume rendering equation, allowing the surface to be recovered as the zero level set of a signed distance function. As a self-supervised technique, the model overfits to the input image set and then generates a 3D surface by sampling the trained neural network at a predefined resolution.
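The abstract does not name the exact re-parameterization; a widely used choice with these properties is the NeuS-style formulation, in which per-sample opacity is derived from a sigmoid of the signed distance so that rendering weights concentrate at the zero level set. A minimal PyTorch sketch under that assumption (tensor shapes and the sharpness parameter `s` are illustrative):

```python
# Sketch: NeuS-style conversion of SDF samples along rays into rendering weights.
import torch

def sdf_to_alpha(sdf: torch.Tensor, s: float) -> torch.Tensor:
    """sdf: (num_rays, num_samples) signed distances at consecutive points
    along each ray. Returns per-interval opacities in [0, 1]."""
    phi = torch.sigmoid(s * sdf)                       # Phi_s(f(p_i))
    alpha = (phi[:, :-1] - phi[:, 1:]) / (phi[:, :-1] + 1e-6)
    return alpha.clamp(min=0.0)                        # peaks where sdf crosses zero

def render_weights(alpha: torch.Tensor) -> torch.Tensor:
    """Standard alpha compositing: w_i = alpha_i * prod_{j<i} (1 - alpha_j)."""
    ones = torch.ones_like(alpha[:, :1])
    transmittance = torch.cumprod(
        torch.cat([ones, 1.0 - alpha + 1e-6], dim=-1), dim=-1)[:, :-1]
    return alpha * transmittance
```

The resulting weights multiply per-sample colors to render each pixel, and the photometric error against the input images provides the self-supervision signal. This leads to the third and final stage, the post-processing.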
In the final phase, the pipeline processes the signed distance values sampled from the optimized field to generate an initial triangular mesh. An isosurface extraction algorithm (e.g., Marching Cubes) is used to construct the polygonal surface, whose resolution directly influences both quality and file size. Next, the surface is simplified by drastically reducing the initial triangle count, then UV-unwrapped to allow texture baking. Surface colors are encoded into a texture map that enables realistic shading of the mesh. The completed 3D model is then exported in a standard file format (e.g., FBX), making it ready for real-time rendering in engines such as Unity.
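A minimal sketch of the extraction and export steps, assuming the optimized field has been sampled onto a dense regular grid of signed distances. Simplification, UV unwrapping, and texture baking are omitted; `scikit-image` and `trimesh` stand in for whatever libraries the actual pipeline uses, and OBJ stands in for FBX, which trimesh does not write.

```python
# Sketch: zero level set extraction with Marching Cubes, then mesh export.
import numpy as np
import trimesh
from skimage import measure

def sdf_grid_to_mesh(sdf_grid: np.ndarray, voxel_size: float = 1.0,
                     path: str = "asset.obj") -> trimesh.Trimesh:
    """sdf_grid: (X, Y, Z) array of signed distances; the grid resolution
    directly trades off surface quality against file size."""
    verts, faces, normals, _ = measure.marching_cubes(sdf_grid, level=0.0)
    mesh = trimesh.Trimesh(vertices=verts * voxel_size, faces=faces,
                           vertex_normals=normals)
    mesh.export(path)  # simplification and texture baking would precede this
    return mesh
```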
The full workflow can be executed in under 10 minutes on commodity GPU hardware (e.g., an NVIDIA RTX 4090). However, the pipeline has certain limitations, primarily stemming from the quality of the input data and the implementation-specific assumptions integrated into the intermediate steps. Errors introduced early in the pipeline may propagate through subsequent phases, ultimately reducing the fidelity of the final output. To facilitate testing and accessibility, the pipeline is available as a web-based service. Users can upload videos or image collections and receive a reconstructed 3D asset in return. We demonstrate results on various small physical objects captured using standard smartphone cameras and uploaded to the service for reconstruction.
Submission Number: 164