Raptor: Scalable Train-Free Embeddings for 3D Medical Volumes Leveraging Pretrained 2D Foundation Models

Ulzee An; Moonseong Jeong; Simon Austin Lee; Aditya Gorla; Yuzhe Yang; Sriram Sankararaman

Published: 01 May 2025, Last Modified: 18 Jun 2025
Venue: ICML 2025 spotlight poster
License: CC BY 4.0

TL;DR: Raptor (Random Planar Tensor Reduction) leverages a pretrained image model (DINOv2-L) to create and compress visual tokens in 3D, and achieves state-of-the-art performance across 10 medical volume tasks without the need for costly training.

Abstract: Current challenges in developing foundational models for volumetric imaging data, such as magnetic resonance imaging (MRI), stem from the computational complexity of state-of-the-art architectures in high dimensions and from the difficulty of curating sufficiently large datasets of volumes. To address these challenges, we introduce Raptor (Random Planar Tensor Reduction), a train-free method for generating semantically rich embeddings for volumetric data. Raptor leverages a frozen 2D foundation model, pretrained on natural images, to extract visual tokens from individual cross-sections of medical volumes. These tokens are then spatially compressed using random projections, significantly reducing computational complexity while retaining rich semantic information. Extensive experiments on 10 diverse medical volume tasks verify the superior performance of Raptor over state-of-the-art methods, including those pretrained exclusively on medical volumes (+3 SuPreM, +6 MISFM, +10 Merlin, +13 VoCo, and +14 SLIViT), while entirely bypassing the need for costly training. Our results highlight Raptor's effectiveness and versatility as a foundation for advancing deep learning-based methods for medical volumes (code: github.com/sriramlab/raptor).
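
The pipeline described in the abstract (slice the volume, tokenize each cross-section with a frozen 2D model, compress the tokens with a random projection) can be illustrated with a small NumPy sketch. This is not the authors' implementation, which is available at the repository linked above. Everything named here is an assumption made for illustration: `toy_patch_encoder` is a hypothetical stand-in for the frozen 2D foundation model (a fixed random linear map over image patches rather than DINOv2-L), and the mean over slices, the Gaussian projection, and the sizes `PATCH`, `TOKEN_DIM`, and `proj_dim` are arbitrary choices.

```python
import numpy as np

PATCH = 4        # assumed patch size of the stand-in encoder
TOKEN_DIM = 32   # assumed token width of the stand-in encoder

# Fixed weights play the role of a frozen, pretrained 2D model (hypothetical).
ENCODER_WEIGHTS = np.random.default_rng(0).standard_normal((PATCH * PATCH, TOKEN_DIM))


def toy_patch_encoder(image):
    """Map a 2D image to one token per non-overlapping PATCH x PATCH patch."""
    gh, gw = image.shape[0] // PATCH, image.shape[1] // PATCH
    patches = (
        image[: gh * PATCH, : gw * PATCH]
        .reshape(gh, PATCH, gw, PATCH)
        .transpose(0, 2, 1, 3)
        .reshape(gh * gw, PATCH * PATCH)
    )
    return patches @ ENCODER_WEIGHTS  # shape: (gh * gw, TOKEN_DIM)


def embed_volume(volume, proj_dim=8, seed=1):
    """Train-free embedding sketch: tokenize slices, pool, randomly project."""
    rng = np.random.default_rng(seed)
    # One random Gaussian projection shared by all three viewing axes.
    projection = rng.standard_normal((TOKEN_DIM, proj_dim)) / np.sqrt(proj_dim)
    parts = []
    for axis in range(3):
        slices = np.moveaxis(volume, axis, 0)            # iterate cross-sections
        tokens = np.stack([toy_patch_encoder(s) for s in slices])
        pooled = tokens.mean(axis=0)                     # average over slices (assumed)
        parts.append((pooled @ projection).ravel())      # compress token features
    return np.concatenate(parts)


volume = np.random.default_rng(42).standard_normal((16, 16, 16))
embedding = embed_volume(volume)
print(embedding.shape)
```

No parameters are fitted anywhere in the sketch: both the encoder weights and the projection matrix are fixed once and reused, which is what "train-free" means in this setting. In practice the resulting vector would be passed to a lightweight downstream classifier or regressor.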

Lay Summary: Raptor is a new method for analyzing 3D medical scans like MRIs and CTs—without any training. Instead of using complex 3D models, Raptor slices the scans into 2D images and processes them using a powerful, off-the-shelf 2D vision model. It then compresses the information using random projections to create compact, meaningful representations of the original 3D data. This approach is fast, requires no labeled data or training, and outperforms existing methods on a wide range of medical tasks—all while using far less computation.

Link To Code: https://github.com/sriramlab/raptor

Primary Area: Applications->Health / Medicine

Keywords: Embedding, Volumes, Foundation model, Random projection, Compression, MRI, CT

Submission Number: 15029
