Keywords: Sparse-view, G-3DGS, Conditional Diffusion, Novel View Synthesis
TL;DR: CoDiffSplat introduces a single-step conditional diffusion model, guided by Cross-View Entropy-Aware (CEA) embeddings, that refines Gaussian splatting to recover geometric details and structure in occluded or ambiguous regions, outperforming previous methods.
Abstract: Generalizable 3D Gaussian Splatting (G-3DGS) has emerged as a promising approach for novel view synthesis under sparse-view settings.
However, existing frameworks remain restricted by pixel-aligned Gaussian estimation, which struggles in regions that are partially observed or occluded, often resulting in incomplete geometry and structural collapse.
To overcome these challenges, we propose CoDiffSplat, a new framework that couples semantic-conditioned latent diffusion with 3D Gaussian splatting.
Our design departs from conventional diffusion applied to image feature maps: instead, a lightweight single-step diffusion model directly refines Gaussian parameters, ensuring efficiency while preserving geometric consistency.
In addition, we introduce a Cross-View Entropy-Aware (CEA) module that aggregates multi-view semantics and geometry into robust conditional embeddings, enabling diffusion to resolve ambiguities under occlusion and sparse overlap.
Comprehensive experiments on multiple benchmarks demonstrate that CoDiffSplat consistently improves geometric quality and structural completeness, especially under challenging extrapolation settings.
Our study establishes conditional diffusion as a scalable refinement mechanism for sparse-view 3D reconstruction, advancing the reliability of generalizable Gaussian splatting.
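The abstract describes two components: entropy-aware aggregation of multi-view features into a conditional embedding, and a single-step diffusion that refines Gaussian parameters under that condition. A minimal sketch of how these pieces could fit together is below; all function names, dimensions, and the linear stand-in for the denoiser are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def view_entropy(feats):
    # Per-view, per-Gaussian Shannon entropy of a softmax over channels;
    # a simple proxy for how ambiguous each view's feature is.
    p = np.exp(feats - feats.max(-1, keepdims=True))
    p /= p.sum(-1, keepdims=True)
    return -(p * np.log(p + 1e-8)).sum(-1)        # shape (V, N)

def cea_aggregate(feats):
    # Cross-View Entropy-Aware pooling (illustrative): lower-entropy,
    # i.e. more confident, views receive larger weights when forming
    # the conditional embedding for each Gaussian.
    H = view_entropy(feats)                        # (V, N)
    w = np.exp(-H)
    w /= w.sum(0, keepdims=True)                   # normalize over views
    return (w[..., None] * feats).sum(0)           # (N, C)

def single_step_refine(gauss, cond, eps_net, sigma=0.1):
    # One-step conditional denoising: predict a residual from the
    # (parameters, condition) pair and subtract it, instead of running
    # a full multi-step reverse diffusion chain.
    eps = eps_net(np.concatenate([gauss, cond], axis=-1))
    return gauss - sigma * eps

# Toy dimensions: V views, N Gaussians, C feature channels,
# P Gaussian parameters (e.g. position, scale, rotation, opacity).
V, N, C, P = 3, 64, 16, 14
feats = rng.normal(size=(V, N, C))
gauss = rng.normal(size=(N, P))
W = rng.normal(size=(P + C, P)) * 0.01             # stand-in for a trained denoiser
refined = single_step_refine(gauss, cea_aggregate(feats), lambda x: x @ W)
print(refined.shape)                               # (64, 14)
```

The single forward pass is what makes the refinement cheap relative to iterative diffusion sampling; the entropy weighting is one plausible reading of how the CEA module could downweight views with ambiguous or occluded observations.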
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 7715