Keywords: Novel View Synthesis, Dynamic Scene, Gaussian Splatting
Abstract: Dynamic rendering methods often prioritize photometric fidelity while lacking explicit semantic representations, which constrains their ability to perform semantically guided rendering. To this end, we introduce Language-Guided 4D Gaussian Splatting (L4DGS), a lightweight framework for real-time dynamic scene rendering that integrates natural language into semantically structured 4D Gaussian representations. Central to L4DGS is a Sparse Multi-Scale Attention (SMSA) mechanism that enables fine-grained, language-driven control by emphasizing semantically relevant regions across space and time. To enforce semantic fidelity and spatial coherence, we propose a static regularization that aligns language-guided features with rendered outputs and enforces depth consistency. To further ensure temporal consistency, a dynamic regularization penalizes abnormal variations in semantics and depth across consecutive time steps. L4DGS achieves a 16.1% improvement in PSNR, reduces perceptual error by 58.8%, and increases rendering speed by over 50%. Experimental results demonstrate that our approach outperforms prior methods in both visual quality and computational efficiency.
Supplementary Material: pdf
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 22021
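Illustration (not from the submission): the abstract describes the two regularizers only at a high level, so the following is a minimal sketch of a plausible form they could take, with every symbol (the rendered feature map F, the language-guided feature map F_lang, the depth maps D_t, the semantic maps S_t, and the weight \lambda) hypothetical rather than taken from the paper:

\mathcal{L}_{\mathrm{static}} = \lVert F_{\mathrm{lang}} - F_{\mathrm{render}} \rVert_1 + \lambda \, \lVert D_{\mathrm{render}} - D_{\mathrm{ref}} \rVert_1

\mathcal{L}_{\mathrm{dyn}} = \sum_{t} \left( \lVert S_{t+1} - S_t \rVert_1 + \lambda \, \lVert D_{t+1} - D_t \rVert_1 \right)

Here the static term aligns language-guided features with rendered outputs and penalizes depth deviation within a frame, while the dynamic term penalizes abrupt changes in rendered semantics S_t and depth D_t between consecutive time steps, matching the roles the abstract assigns to the two regularizations.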