Keywords: Score Distillation, Text-to-3D Generation, Diffusion Model
TL;DR: We propose a progressive latent calibration method for inversion-free score distillation.
Abstract: Recent advances in Score Distillation Sampling (SDS) have significantly accelerated progress in text-to-3D generation by leveraging pre-trained 2D diffusion models to supervise 3D representations. However, SDS often suffers from high variance and produces over-smoothed outputs, limiting the quality of the synthesized 3D assets. Recent methods have introduced DDIM inversion to stabilize optimization, but we find that repeated DDIM inversion introduces discretization errors that accumulate over thousands of iterations, ultimately leading to severe artifacts such as structural distortions and color degradation. To address these limitations, we introduce a novel score distillation framework that eliminates the reliance on DDIM inversion by using multi-step pseudo-ground-truth sampling with progressive latent calibration. Our approach explicitly estimates the information about the original rendering of the 3D representation that is lost during multi-step sampling and reintegrates it, thereby preserving semantic fidelity and reducing variance across training iterations. Extensive experiments show that our method consistently outperforms both inversion-based and standard score distillation approaches in generating high-fidelity 3D assets from text prompts. The anonymous project page is available at https://anonymous-iclr-sd.github.io/.
Primary Area: generative models
Submission Number: 11956