Keywords: Artificial Intelligence, Computer Vision, Novel View Synthesis, 3D Gaussian Splatting
TL;DR: We enhance sparse-view 3DGS generalization using flat-minima optimization via random position perturbations.
Abstract: Recent advances in neural rendering have established 3D Gaussian Splatting (3DGS) as a highly efficient representation for novel view synthesis, enabling real-time training and rendering with strong fidelity. However, when supervision is limited to a sparse set of input views, 3DGS tends to overfit to the observed images, resulting in poor generalization to unseen viewpoints. We approach this challenge from the perspective of flat minima (FM) optimization, which seeks solutions that remain stable under small parameter perturbations and are thus more robust. Viewing Gaussian parameters as trainable weights, we adapt FM principles to the geometric and dynamic nature of 3DGS by introducing several key techniques. First, we propose a Scale-Adaptive Perturbation (SAP) scheme that scales perturbation magnitude according to each Gaussian’s anisotropy, preserving fine details while promoting robustness. Second, we adopt stochastic perturbation, where each Gaussian is probabilistically perturbed or left unchanged, retaining the benefit of perturbation while preventing oversmoothing of scene details. Third, we schedule the perturbation magnitude to increase gradually during training, avoiding excessive noise before Gaussians capture stable structure. Finally, we incorporate periodic reinitialization of non-positional parameters such as scale, rotation, opacity, and Spherical Harmonics (SH) coefficients, preventing degenerate cases like elongated Gaussians and maintaining well-conditioned primitives throughout optimization. Together, these techniques form a lightweight framework that integrates seamlessly into existing 3DGS pipelines without architectural changes. Experiments on LLFF and Mip-NeRF360 demonstrate that our method consistently improves both quantitative metrics and perceptual quality under sparse-view supervision, producing reconstructions that are sharper, more stable, and better at generalizing to novel viewpoints.
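Illustrative sketch (not the authors' implementation): the first three techniques in the abstract might be combined roughly as follows. The function names, the linear schedule, the per-axis use of each Gaussian's scale (rotation is ignored here), and the default values of `prob` and `max_strength` are all assumptions made for illustration; the abstract does not specify them. Periodic reinitialization is omitted.

```python
import random


def perturbation_strength(step, total_steps, max_strength=0.1):
    """Hypothetical schedule: ramp linearly from 0 up to max_strength."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return max_strength * frac


def perturb_positions(positions, scales, step, total_steps,
                      prob=0.5, max_strength=0.1, rng=None):
    """Return a perturbed copy of the Gaussian centers.

    positions, scales: lists of [x, y, z] per Gaussian.
    Assumption: noise along each axis is proportional to the Gaussian's
    scale on that axis, so small Gaussians move less than large ones.
    """
    rng = rng or random.Random()
    strength = perturbation_strength(step, total_steps, max_strength)
    perturbed = []
    for pos, scale in zip(positions, scales):
        if rng.random() < prob:
            # Stochastic perturbation: this Gaussian is selected and moved.
            pos = [p + rng.gauss(0.0, 1.0) * s * strength
                   for p, s in zip(pos, scale)]
        else:
            # Not selected: the center is left unchanged.
            pos = list(pos)
        perturbed.append(pos)
    return perturbed
```

In a training loop, one would presumably render and compute the loss from the perturbed centers at each step, so that gradients favor solutions that stay accurate under such displacements.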
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 24388