Keywords: Novel View Synthesis, Dynamic Scene, Gaussian Splatting
Abstract: Dynamic rendering methods often prioritize photometric fidelity while lacking explicit semantic representations, which constrains their ability to perform semantically guided rendering. To this end, we introduce Language-Guided 4D Gaussian Splatting (L4DGS), a lightweight framework for real-time dynamic scene rendering that integrates natural language into semantically structured 4D Gaussian representations. Central to L4DGS is a Sparse Multi-Scale Attention (SMSA) mechanism that enables fine-grained, language-driven control by emphasizing semantically relevant regions across space and time. To enforce semantic fidelity and spatial coherence, we propose a static regularization that aligns language-guided features with rendered outputs and enforces depth consistency. To further ensure temporal consistency, a dynamic regularization penalizes abnormal variations in semantics and depth across consecutive time steps. L4DGS achieves a 16.1% improvement in PSNR, reduces perceptual error by 58.8%, and increases rendering speed by over 50%. Experimental results demonstrate that our approach outperforms prior methods in both visual quality and computational efficiency.
Supplementary Material: pdf
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 22021
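Illustration (not from the submission): the abstract describes the two regularizers only at a high level, so the following is a minimal sketch of a plausible form they could take, with every symbol (the rendered feature map F, the language-guided feature map F_lang, the depth maps D_t, the semantic maps S_t, and the weight \lambda) hypothetical rather than taken from the paper:

\mathcal{L}_{\mathrm{static}} = \lVert F_{\mathrm{lang}} - F_{\mathrm{render}} \rVert_1 + \lambda \, \lVert D_{\mathrm{render}} - D_{\mathrm{ref}} \rVert_1

\mathcal{L}_{\mathrm{dyn}} = \sum_{t} \left( \lVert S_{t+1} - S_t \rVert_1 + \lambda \, \lVert D_{t+1} - D_t \rVert_1 \right)

Here the static term aligns language-guided features with rendered outputs and penalizes depth deviation within a frame, while the dynamic term penalizes abrupt changes in rendered semantics S_t and depth D_t between consecutive time steps, matching the roles the abstract assigns to the two regularizations.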