Abstract: The development of generative models that create 3D
content from a text prompt has made considerable strides
thanks to the use of the score distillation sampling (SDS)
method on pre-trained diffusion models for image generation.
However, the SDS method is also the source of several
artifacts, such as the Janus problem, the misalignment between
the text prompt and the generated 3D model, and 3D
model inaccuracies. While existing methods heavily rely on
the qualitative assessment of these artifacts through visual
inspection of a limited set of samples, in this work we propose
more objective quantitative evaluation metrics, which
we cross-validate via human ratings, and present an analysis of
the failure cases of the SDS technique. We demonstrate the
effectiveness of this analysis by designing a novel computationally
efficient baseline model that achieves state-of-the-art
performance on the proposed metrics while addressing
all the above-mentioned artifacts.