Keywords: synthetic data, representation learning, medical image analysis, image registration, image segmentation
TL;DR: We develop a synthetic data-driven contrastive learning framework to train a general-purpose 3D biomedical vision model that achieves state-of-the-art results in 3D multimodal image registration and few-shot segmentation.
Abstract: Current volumetric biomedical foundation models struggle to generalize because public 3D datasets are small and do not cover the broad diversity of medical procedures, conditions, anatomical regions, and imaging protocols. We address this with a representation learning method that instead anticipates strong domain shifts at training time. We first propose a data engine that synthesizes highly variable training samples designed to enable generalization to new biomedical contexts. To then train a single 3D network for any voxel-level task, we develop a contrastive learning method that pretrains the network to be stable against the nuisance imaging variation simulated by the data engine, a key inductive bias for generalization. The network's features can serve as robust representations of input images for downstream tasks, and its weights provide a strong, _dataset-agnostic_ initialization for finetuning on new datasets. As a result, we set new standards in _both_ multimodal registration and few-shot segmentation, a first for any 3D biomedical vision model, all without (pre-)training on any existing dataset of real images.
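To make the pretraining objective described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch: a toy 3D encoder is trained so that voxel features from two nuisance-augmented views of the same synthetic volume agree under an InfoNCE loss. Every name (`Encoder3D`, `simulate_appearance`, `voxel_infonce`) and all hyperparameters are illustrative assumptions, not the submission's actual code or data engine.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder3D(nn.Module):
    """Toy 3D CNN standing in for the submission's network (assumed)."""
    def __init__(self, feat_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv3d(32, feat_dim, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)  # (B, C, D, H, W) voxel-wise features

def simulate_appearance(vol):
    """Placeholder nuisance simulation (random gamma + additive noise);
    the paper's data engine is presumably far richer (modalities,
    artifacts, intensity models, etc.)."""
    gamma = torch.empty(1).uniform_(0.5, 2.0)
    return vol.clamp(min=0) ** gamma + 0.05 * torch.randn_like(vol)

def voxel_infonce(f1, f2, n_samples=256, tau=0.1):
    """InfoNCE over matched voxel features from two views: features at
    the same spatial location are positives; features at other sampled
    locations act as negatives."""
    B = f1.shape[0]
    f1 = F.normalize(f1.flatten(2), dim=1)  # (B, C, N_voxels)
    f2 = F.normalize(f2.flatten(2), dim=1)
    idx = torch.randint(0, f1.shape[-1], (n_samples,))
    z1, z2 = f1[..., idx], f2[..., idx]     # (B, C, n_samples)
    logits = torch.einsum('bci,bcj->bij', z1, z2) / tau
    target = torch.arange(n_samples).repeat(B)
    return F.cross_entropy(logits.reshape(-1, n_samples), target)

if __name__ == "__main__":
    enc = Encoder3D()
    vol = torch.rand(2, 1, 32, 32, 32)      # stand-in synthetic volume
    loss = voxel_infonce(enc(simulate_appearance(vol)),
                         enc(simulate_appearance(vol)))
    loss.backward()
    print(f"contrastive loss: {loss.item():.3f}")
```

The design point this sketch captures is the inductive bias named in the abstract: because the two views differ only in simulated nuisance appearance, minimizing the loss pushes the encoder toward features that are invariant to that variation while remaining spatially discriminative.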
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 5254