SurstSplat: Dynamic Surgical Gaussian Reconstruction with Spatiotemporal Graph Matching

01 Sept 2025 (modified: 11 Feb 2026), Submitted to ICLR 2026, CC BY 4.0
Keywords: Dynamic 3D Reconstruction, Surgical Vision, Multimodal Foundation Models
Abstract: Reconstructing dynamic 3D models from surgical videos is crucial for advanced medical applications, but it is hampered by limited texture, inconsistent lighting, and complex tissue deformation. We present SurstSplat, a framework that enhances dynamic Gaussian reconstruction through spatiotemporal semantic graph matching. By integrating multimodal features from pre-trained 2D foundation models into 3D Gaussian representations, our approach captures tissue deformations and tool interactions. The spatiotemporal graph matching mechanism handles deformable tissue better than standard Gaussian methods while enabling real-time semantic segmentation, language-guided editing, and medical visual question answering. Experiments demonstrate that SurstSplat enhances rendering quality in challenging surgical conditions and allows clinical 3D models to leverage pre-trained 2D multimodal foundation models. Our approach improves both rendering quality and computational efficiency, supporting advanced intraoperative applications and advancing robot-assisted surgery.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 406