Keywords: Medical Image Segmentation, Feature Upsampling, Interactive Segmentation
TL;DR: 3D, spatially consistent, denoised and upsampled representations of 2D foundation model vision features for medical imaging.
Abstract: Foundation models like SAM2 offer rich semantic features, but their fixed resolution, transformer artifacts, and inconsistent representations across views limit their direct use in 3D applications such as image segmentation. We extend FeatUp, a multi-view self-supervised upsampling approach, to 3D by introducing explicit 3D position encodings and through-plane augmentations. Our NFNet-based (normalizer-free) architecture enables consistent, denoised, and resolution-agnostic feature inference in medical CT volumes. The resulting 3D-aware representation supports interactive segmentation via point-wise, local inference at native resolution.
Submission Number: 113