Keywords: 3D Generation, Gaussian Splatting, Stable Diffusion, Inference-Time Scaling
TL;DR: We propose TRIM, a post-training framework that accelerates 3D Gaussian diffusion by pruning trajectories and background tokens, enabling efficient inference-time scaling without sacrificing quality.
Abstract: Recent 3D Gaussian diffusion models suffer from time-intensive denoising and post-denoising processing due to the massive number of Gaussian primitives, resulting in slow generation and limited scalability along sampling trajectories.
To improve the efficiency of 3D diffusion models, we propose $\textbf{TRIM}$ ($\textbf{T}$rajectory $\textbf{R}$eduction and $\textbf{I}$nstance $\textbf{M}$ask denoising), a post-training approach that incorporates both temporal and spatial trimming strategies to accelerate inference without compromising output quality, while also supporting inference-time scaling for Gaussian diffusion models.
Instead of scaling denoising trajectories in a costly end-to-end manner, we develop a lightweight selector model that evaluates latent Gaussian primitives derived from multiple sampled noises, enabling early trajectory reduction by retaining only candidates with high quality potential.
Furthermore, we introduce instance mask denoising to prune learnable Gaussian primitives by filtering out redundant background regions, reducing inference computation at each denoising step.
Extensive experiments and analysis demonstrate that TRIM significantly improves both the efficiency and quality of 3D generation.
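The following is a minimal sketch of the two trimming steps described in the abstract: early trajectory reduction via a lightweight selector that scores latent Gaussian candidates from multiple sampled noises, and instance-mask denoising that drops background tokens before subsequent denoising steps. All module names, tensor shapes, and thresholds here are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class CandidateSelector(nn.Module):
    """Hypothetical lightweight scorer over latent Gaussian primitives (assumed MLP head)."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, latents: torch.Tensor) -> torch.Tensor:
        # latents: (num_candidates, num_tokens, feat_dim)
        # Pool over tokens, then predict one quality score per candidate.
        return self.head(latents.mean(dim=1)).squeeze(-1)


def reduce_trajectories(latents: torch.Tensor, selector: CandidateSelector, keep: int) -> torch.Tensor:
    """Keep only the `keep` candidates with the highest predicted quality,
    so the expensive denoising trajectories are run for fewer samples."""
    scores = selector(latents)
    top = scores.topk(keep).indices
    return latents[top]


def mask_background(latents: torch.Tensor, fg_prob: torch.Tensor, thresh: float = 0.5) -> torch.Tensor:
    """Drop tokens whose (assumed, externally predicted) foreground probability
    falls below `thresh`, reducing per-step denoising computation."""
    keep = fg_prob > thresh  # (num_tokens,)
    return latents[:, keep, :]


if __name__ == "__main__":
    torch.manual_seed(0)
    candidates = torch.randn(8, 1024, 64)   # 8 sampled noises -> latent Gaussian tokens
    selector = CandidateSelector(feat_dim=64)
    survivors = reduce_trajectories(candidates, selector, keep=2)
    fg_prob = torch.rand(1024)               # placeholder instance-mask probabilities
    trimmed = mask_background(survivors, fg_prob)
    print(survivors.shape, trimmed.shape)     # e.g. (2, 1024, 64) and (2, ~512, 64)
```

In this sketch the selector is applied once, early in sampling, while the background mask is applied before each denoising step; both choices follow the abstract's description but the placement details are assumptions.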
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 1205