MedGMAE: Gaussian Masked Autoencoders for Medical Volumetric Representation Learning

ICLR 2026 Conference Submission5792 Authors

15 Sept 2025 (modified: 23 Dec 2025) · CC BY 4.0
Keywords: 3D Gaussian Representation, Medical Image Analysis, Volumetric Representation Learning
TL;DR: MedGMAE replaces traditional voxel-level reconstruction in medical volumetric pre-training with 3D Gaussian primitive prediction, enabling more effective continuous anatomical representation and faster CT reconstruction convergence.
Abstract: Self-supervised pre-training has emerged as a critical paradigm for learning transferable representations from unlabeled medical volumetric data. Masked autoencoder based methods have garnered significant attention, yet their application to volumetric medical images faces a fundamental limitation: the discrete voxel-level reconstruction objective neglects the continuity of anatomical structure. To address this challenge, we propose MedGMAE, a novel framework that replaces traditional voxel reconstruction with 3D Gaussian primitive reconstruction, offering a new perspective on representation learning. Our approach learns to predict, from sparse visible image patches, a complete set of 3D Gaussian parameters that serve as semantic abstractions of the entire 3D volume. MedGMAE demonstrates dual utility across medical imaging applications. For representation learning, sparse Gaussian prediction produces superior encoder representations that outperform traditional MAE baselines on downstream segmentation, classification, and registration tasks. For volumetric reconstruction, the Gaussian decoder leverages pretrained anatomical priors to accelerate the convergence of 3D CT volume reconstruction. Extensive experiments across multiple medical imaging datasets demonstrate that our approach achieves superior performance, establishing a new paradigm for medical image pre-training. Code will be released soon.
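Since the code has not yet been released, the core idea of the abstract — predicting a set of 3D Gaussian primitives from visible-patch features and evaluating the implied continuous intensity field at voxel coordinates — can be sketched as follows. This is a minimal illustration only: the head architecture, the 8-parameter Gaussian layout (mean, log-scale, opacity, intensity), and all function names are assumptions, not the authors' implementation.

```python
import numpy as np

def predict_gaussians(feats, W):
    """Hypothetical prediction head: map visible-patch embeddings to K Gaussians.
    feats: (P, D) features of the P visible (unmasked) patches.
    W: (D, K*8) linear head weights (an assumption; real decoders are deeper).
    Returns mean (K,3), log-scale (K,3), opacity (K,), intensity (K,)."""
    pooled = feats.mean(axis=0)                 # (D,) global summary of visible patches
    p = (pooled @ W).reshape(-1, 8)             # (K, 8) raw Gaussian parameters
    mu = 1.0 / (1.0 + np.exp(-p[:, :3]))        # means squashed into the unit cube
    log_sigma = p[:, 3:6]                       # per-axis anisotropic log-scales
    alpha = 1.0 / (1.0 + np.exp(-p[:, 6]))      # opacity in (0, 1)
    c = p[:, 7]                                 # raw intensity per Gaussian
    return mu, log_sigma, alpha, c

def render_field(coords, mu, log_sigma, alpha, c):
    """Evaluate the continuous volume at query points as an opacity-weighted
    sum of axis-aligned Gaussians. coords: (V, 3); returns (V,) intensities."""
    sigma = np.exp(log_sigma)                               # (K, 3)
    diff = coords[:, None, :] - mu[None, :, :]              # (V, K, 3)
    expo = -0.5 * np.sum((diff / sigma[None]) ** 2, -1)     # (V, K)
    return (alpha[None, :] * np.exp(expo)) @ c              # (V,)

# Toy usage: 16 visible patches with 32-dim features, 4 predicted Gaussians.
rng = np.random.default_rng(0)
feats = rng.normal(size=(16, 32))
W = rng.normal(size=(32, 4 * 8)) * 0.1
mu, ls, a, c = predict_gaussians(feats, W)
coords = rng.uniform(size=(100, 3))     # 100 query voxels in the unit cube
field = render_field(coords, mu, ls, a, c)
print(field.shape)  # (100,)
```

The continuous formulation is what distinguishes this objective from voxel-wise MAE reconstruction: the field can be queried at arbitrary coordinates, not only at the original voxel grid.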
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 5792