Diff-StyGS: 3D Gaussian Splatting Stylization via Tuning-Free Multi-View Sparse Diffusion

16 Sept 2025 (modified: 14 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Diffusion Model, 3D Gaussian Splatting, Style Transfer
Abstract: Realistic stylization of 3D Gaussian Splatting (3DGS) faces critical challenges due to restricted cross-modal style inputs (text/image) and the difficulty of preserving multi-view consistency without sacrificing efficiency. Existing methods either depend on fine-tuned conditional diffusion models (e.g., InstructPix2Pix) or require style-specific losses and latents. In this paper, we propose Diff-StyGS, a novel framework that enables 3D style transfer from multimodal inputs for pre-trained 3DGS via tuning-free Stable Diffusion (SD). Our approach introduces multi-view stylized attention through dual attention control in SD, comprising (i) Style-Infused Attention (SIA) and (ii) Multi-View Adaptive Sparse Attention via Shared-Query (MASA-SQ). Specifically, SIA decouples content from style by reusing 3DGS-rendered query features while injecting style through stylized keys and values from SD. MASA-SQ reduces cross-view inconsistency and computational overhead through adaptive fusion of style and sparsity-aware multi-view priors. Furthermore, we present the Wavelet Frequency Alignment Loss, which aligns stylized distributions across frequency bands. To further accelerate style optimization, we adopt a 3D sparse-view strategy that selects geometrically representative views via Maximin Distance Design. Extensive experiments demonstrate that Diff-StyGS outperforms state-of-the-art text- and image-based 3DGS style transfer methods in multi-view consistency, stylization quality, and content fidelity.
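The query/key-value decoupling described for SIA can be illustrated with a minimal PyTorch sketch. Everything here (the function name, tensor shapes, and the omission of learned projection matrices) is an illustrative assumption, not the authors' implementation: queries are taken from content features rendered by 3DGS so that structure is preserved, while keys and values come from stylized SD features so that style is injected.

```python
import torch

def style_infused_attention(content_feat, style_feat, num_heads=8):
    """Hypothetical sketch of SIA-style attention.

    content_feat: (B, N, C) tokens from 3DGS-rendered features (queries).
    style_feat:   (B, M, C) stylized tokens from SD (keys and values).
    In a real diffusion U-Net these would pass through the layer's own
    W_q / W_k / W_v projections; they are omitted here for brevity.
    """
    B, N, C = content_feat.shape
    d = C // num_heads
    # Queries preserve content structure.
    q = content_feat.view(B, N, num_heads, d).transpose(1, 2)    # (B, H, N, d)
    # Keys/values carry the style signal.
    k = style_feat.view(B, -1, num_heads, d).transpose(1, 2)     # (B, H, M, d)
    v = style_feat.view(B, -1, num_heads, d).transpose(1, 2)     # (B, H, M, d)
    attn = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
    out = (attn @ v).transpose(1, 2).reshape(B, N, C)            # (B, N, C)
    return out
```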
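The abstract only names the Wavelet Frequency Alignment Loss, so the following is a speculative sketch of one plausible form: a single-level Haar decomposition of the stylized render and the diffusion target, with per-band channel statistics (mean/std) matched across sub-bands. The Haar filters are standard; the statistics-matching loss is my assumption.

```python
import torch
import torch.nn.functional as F

# Single-level 2x2 Haar analysis filters: LL, LH, HL, HH.
_HAAR = torch.tensor([
    [[0.5, 0.5], [0.5, 0.5]],    # LL (low-low)
    [[0.5, 0.5], [-0.5, -0.5]],  # LH
    [[0.5, -0.5], [0.5, -0.5]],  # HL
    [[0.5, -0.5], [-0.5, 0.5]],  # HH
])

def haar_bands(x):
    """Depthwise single-level Haar DWT. x: (B, C, H, W) with even H, W.
    Returns four sub-band tensors of shape (B, C, H/2, W/2)."""
    B, C, H, W = x.shape
    k = _HAAR.to(x).unsqueeze(1).repeat(C, 1, 1, 1)      # (4C, 1, 2, 2)
    y = F.conv2d(x, k, stride=2, groups=C)               # (B, 4C, H/2, W/2)
    return y.view(B, C, 4, H // 2, W // 2).unbind(dim=2)

def wavelet_align_loss(render, target):
    """Hypothetical alignment: match per-band mean/std between the
    stylized 3DGS render and the SD stylization target."""
    loss = 0.0
    for r, t in zip(haar_bands(render), haar_bands(target)):
        loss = loss + F.mse_loss(r.mean(dim=(2, 3)), t.mean(dim=(2, 3)))
        loss = loss + F.mse_loss(r.std(dim=(2, 3)), t.std(dim=(2, 3)))
    return loss
```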
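The sparse-view selection via Maximin Distance Design is, in spirit, greedy farthest-point sampling over camera positions: each new view maximizes its minimum distance to the already-selected set. A sketch under that assumption (camera centers and the seed index are illustrative inputs):

```python
import numpy as np

def maximin_view_selection(cam_centers, k, seed=0):
    """Greedy maximin selection of k geometrically representative views.

    cam_centers: (V, 3) array of camera centers.
    Returns k indices whose pairwise minimum distance is greedily maximized.
    """
    selected = [seed]
    # Distance from every view to the current selected set.
    dist = np.linalg.norm(cam_centers - cam_centers[seed], axis=1)
    for _ in range(k - 1):
        idx = int(dist.argmax())  # view farthest from the selected set
        selected.append(idx)
        new_d = np.linalg.norm(cam_centers - cam_centers[idx], axis=1)
        dist = np.minimum(dist, new_d)  # maintain min-distance to the set
    return selected
```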
Primary Area: generative models
Submission Number: 8086