Keywords: Score Distillation Sampling; Text-to-3D Generation; Multi-view Inconsistency
TL;DR: We introduce Object-Consistent Distillation (OCD), which enforces an object-level consistency constraint during SDS-based 3D generation to align multi-view pseudo ground truths and significantly reduce artifacts and inconsistencies.
Abstract: In 3D generation, Score Distillation Sampling (SDS) struggles to ensure that the pseudo ground truths the diffusion model generates for different viewpoints correspond to the same 3D object. To analyze this object inconsistency more directly and precisely, we theoretically model the renderings of a 3D object under continuous viewpoints as a connected subset of the image space. Based on this formulation, we introduce an object consistency constraint and identify two key sources of inconsistency: cross-view image discrepancy variation and cross-view distributional estimation error. In contrast to prior work, we focus on the former and propose Object-Consistent Distillation (OCD), which enforces the object consistency constraint while the multi-view pseudo ground truths are generated. Specifically, we estimate a dynamic object proxy using a sliding window and move the rendering of each viewpoint toward this proxy. We compare OCD with several recent generative baselines, and experiments demonstrate that OCD significantly mitigates irregular structures and unrelated artifacts in the generated objects. Code is provided in the supplemental material.
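The sliding-window proxy mentioned in the abstract could be pictured roughly as follows. This is only an illustrative sketch, not the authors' implementation: it assumes the proxy is the element-wise mean of the most recent renderings and that each rendering is pulled toward it by linear interpolation. The names `SlidingWindowProxy`, `pull_toward_proxy`, `window`, and `step` are hypothetical, and renderings are represented as flat lists of floats for simplicity.

```python
from collections import deque


class SlidingWindowProxy:
    """Hypothetical object proxy: element-wise mean of the last `window` renderings.

    Assumption: the abstract does not specify how the proxy is estimated;
    a plain moving average is used here purely for illustration.
    """

    def __init__(self, window):
        if window < 1:
            raise ValueError("window must be >= 1")
        # deque with maxlen drops the oldest rendering automatically
        self.buffer = deque(maxlen=window)

    def update(self, rendering):
        """Record the rendering of the most recently sampled viewpoint."""
        self.buffer.append(list(rendering))

    def proxy(self):
        """Return the current proxy as the mean over the window."""
        if not self.buffer:
            raise ValueError("no renderings observed yet")
        n = len(self.buffer)
        return [sum(values) / n for values in zip(*self.buffer)]


def pull_toward_proxy(rendering, proxy, step):
    """Move a rendering toward the proxy by a fraction `step` in [0, 1].

    Assumption: linear interpolation stands in for whatever update
    the method actually applies.
    """
    return [(1.0 - step) * r + step * p for r, p in zip(rendering, proxy)]
```

For example, with `window=2`, after observing `[0.0, 0.0]`, `[2.0, 4.0]`, and `[4.0, 8.0]`, the proxy is the mean of the last two renderings, and a new rendering can be moved halfway toward it with `pull_toward_proxy(rendering, tracker.proxy(), 0.5)`.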
Supplementary Material: zip
Primary Area: generative models
Submission Number: 2749