Keywords: 3D neural rendering, diffusion model, texture synthesis, multi-modal generation
TL;DR: We leverage tactile sensing to improve geometric details of generated 3D assets for text-to-3D and image-to-3D tasks.
Abstract: 3D generation methods powered by diffusion image priors have shown visually compelling results. However, they often fail to produce realistic geometric details, resulting in overly smooth surfaces or geometric details inaccurately baked into albedo maps. To address this, we introduce a new method that incorporates touch as an additional modality to improve the geometric details of generated 3D assets. We design a lightweight 3D texture field to synthesize visual and tactile textures, guided by diffusion-based distribution matching losses in both the visual and tactile domains. Our method ensures consistency between visual and tactile textures while preserving photorealism. We further present a multi-part editing pipeline that enables us to synthesize different textures across various regions. To our knowledge, we are the first to leverage high-resolution tactile sensing to enhance geometric details for 3D generation tasks. We evaluate our method in both text-to-3D and image-to-3D settings. Our experiments demonstrate that our method provides customized and realistic fine geometric textures while maintaining accurate alignment between the two modalities of vision and touch.
Supplementary Material: zip
Primary Area: Generative models
Submission Number: 1396