Chorus: Multi-Teacher Pretraining for Holistic 3D Gaussian Scene Encoding

Published: 21 May 2026, Last Modified: 21 May 2026 · CVPR 2026 Workshop OpenSUN3D Poster · CC BY 4.0
Keywords: 3D Gaussian Splatting, Pretraining, Model Distillation, Scene Encoder
TL;DR: Chorus distills multiple 2D teachers into a native 3DGS encoder, learning transferable 3D scene features that hint at a shared pretraining recipe for 3DGS and point clouds.
Abstract: While 3D Gaussian Splatting (3DGS) has emerged as a high-fidelity scene representation, encoding rich, general-purpose features directly from its primitives remains under-explored. We address this gap with Chorus, a multi-teacher pretraining framework that learns a holistic feed-forward 3DGS scene encoder by distilling complementary signals from 2D foundation models. Chorus employs a shared 3D encoder and teacher-specific projectors to learn from language-aligned, generalist, and object-aware teachers, encouraging a shared embedding space that captures signals ranging from high-level semantics to fine-grained structure. We evaluate Chorus on a wide range of tasks: open-vocabulary semantic and instance segmentation, linear and decoder probing, data-efficient supervision, and LLM-based Q&A. Beyond 3DGS, we also evaluate Chorus on several benchmarks that support only point clouds, by pretraining a variant that uses only the Gaussians’ centers, colors, and estimated normals. Interestingly, this encoder transfers strongly and outperforms the point-cloud baseline while using 39.9 times fewer training scenes. Finally, we propose a render-and-distill adaptation that facilitates out-of-domain finetuning.
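The shared-encoder, teacher-specific-projector design described in the abstract can be illustrated with a small sketch of a multi-teacher distillation objective. The abstract does not specify the projector architecture, the loss, or how 2D teacher features are associated with Gaussians, so everything below is an assumption: linear projectors, a cosine-distance loss averaged over Gaussians, teacher features already lifted onto each Gaussian, and hypothetical teacher names ("language", "generalist", "object") standing in for the language-aligned, generalist, and object-aware teachers. It is a sketch of the general idea, not the authors' implementation.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Row-wise L2 normalization (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def multi_teacher_distill_loss(student_feats, teacher_feats, projectors, weights=None):
    """Weighted sum of per-teacher cosine-distance losses.

    student_feats: (N, D) features from the shared 3D encoder, one row per Gaussian.
    teacher_feats: dict name -> (N, D_t) teacher features, ASSUMED already lifted
                   onto the same N Gaussians.
    projectors:    dict name -> (D, D_t) linear projector (hypothetical; the real
                   teacher-specific projectors may be MLPs).
    weights:       optional dict name -> loss weight (defaults to 1.0 per teacher).
    Returns (total_loss, per_teacher_losses).
    """
    weights = weights or {}
    per_teacher = {}
    total = 0.0
    for name, target in teacher_feats.items():
        # Project shared features into this teacher's embedding space.
        pred = student_feats @ projectors[name]  # (N, D_t)
        # Cosine similarity per Gaussian; loss is 1 - cos, averaged over Gaussians.
        cos = np.sum(l2_normalize(pred) * l2_normalize(target), axis=-1)
        loss = float(np.mean(1.0 - cos))
        per_teacher[name] = loss
        total += weights.get(name, 1.0) * loss
    return total, per_teacher


# Toy usage with random features (hypothetical dimensions).
rng = np.random.default_rng(0)
N, D = 1024, 64
student = rng.normal(size=(N, D))
teacher_dims = {"language": 32, "generalist": 48, "object": 16}
teachers = {k: rng.normal(size=(N, d)) for k, d in teacher_dims.items()}
projs = {k: rng.normal(size=(D, d)) / np.sqrt(D) for k, d in teacher_dims.items()}
total, parts = multi_teacher_distill_loss(student, teachers, projs)
```

Because every teacher reads from the same student features through its own projector, gradients from all teachers shape one shared embedding space, while each projector absorbs the teacher-specific dimensionality and geometry.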
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 24