VoxelFeat: Voxel-wise foundation model features

Pascual Tejero Cervera; Samuel Joutard; Raphael Prevost; Maximilian Pietsch

Published: 01 May 2025, Last Modified: 30 May 2025, MIDL 2025 - Short Papers, CC BY 4.0

Keywords: Medical Image Segmentation, Feature Upsampling, Interactive Segmentation

TL;DR: 3D, spatially consistent, denoised and upsampled representations of 2D foundation model vision features for medical imaging.

Abstract: Foundation models like SAM2 offer rich semantic features but suffer from fixed resolution, transformer artifacts, and inconsistent representations across views, limiting their direct use in 3D applications such as image segmentation. We extend FeatUp, a multi-view self-supervised upsampling approach, to 3D by introducing explicit 3D position encodings and through-plane augmentations. Our normalizer-free NFNet-based architecture enables consistent, denoised, and resolution-agnostic feature inference in medical CT volumes. The resulting 3D-aware representation supports interactive segmentation via point-wise, local inference at native resolution.
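The abstract does not say how the explicit 3D position encodings are built. As an illustration only, the sketch below assumes a NeRF-style sinusoidal encoding of voxel coordinates normalized to [-1, 1] per axis, with the first voxel mapped to -1 and the last to +1. The function names fourier_encode_3d and normalized_voxel_coord and the frequency schedule (pi * 2**k) are hypothetical and do not come from the paper. The encoding depends only on normalized position, so the same location gets the same code on a coarse grid and on a fine grid. A single clicked point can also be encoded without building a full feature volume. This is one plausible way position encodings could support the resolution-agnostic, point-wise inference described above.

```python
import math
from typing import List, Sequence


def fourier_encode_3d(point: Sequence[float], num_freqs: int = 4) -> List[float]:
    """Hypothetical NeRF-style sinusoidal encoding of one normalized (z, y, x) point.

    For each axis, emits num_freqs sines followed by num_freqs cosines at
    frequencies pi * 2**k, k = 0 .. num_freqs - 1 (an assumed schedule, not the
    paper's). Output length is 3 * 2 * num_freqs.
    """
    if len(point) != 3:
        raise ValueError("expected a (z, y, x) coordinate")
    enc: List[float] = []
    for c in point:
        angles = [c * math.pi * 2 ** k for k in range(num_freqs)]
        enc.extend(math.sin(a) for a in angles)
        enc.extend(math.cos(a) for a in angles)
    return enc


def normalized_voxel_coord(index: Sequence[int], shape: Sequence[int]) -> List[float]:
    """Hypothetical helper: map an integer voxel index to [-1, 1] per axis.

    The first voxel on an axis maps to -1 and the last to +1. An axis of length 1
    maps to 0.
    """
    return [(-1.0 + 2.0 * i / (n - 1)) if n > 1 else 0.0 for i, n in zip(index, shape)]


if __name__ == "__main__":
    # The same location indexed on a coarse (5, 9, 9) grid and a fine (17, 33, 33) grid.
    coarse = fourier_encode_3d(normalized_voxel_coord((1, 2, 6), (5, 9, 9)))
    fine = fourier_encode_3d(normalized_voxel_coord((4, 8, 24), (17, 33, 33)))
    # Both indices normalize to (-0.5, -0.5, 0.5), so the encodings match.
    print(len(coarse), all(math.isclose(a, b) for a, b in zip(coarse, fine)))
```

Running the script prints `24 True`. With the anisotropic voxel spacing typical of CT, one would probably normalize by physical extent in millimetres rather than by voxel count. That choice is likewise an assumption here.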

Submission Number: 113