Keywords: 3D medical image segmentation, nnUNet, memory-efficient, foundation models, self-supervised learning, multi-class segmentation
TL;DR: We propose a class-scalable architecture that decouples the number of classes from the memory requirement, enabling the segmentation of up to a thousand classes in 3D.
Abstract: Medical image segmentation has transformed clinical routine by providing fast and accurate methods for the automated measurement of biomarkers and lesions. While foundation models promise broad generalization across hundreds of anatomical structures, they often underperform task-specific deep learning methods such as nnUNet. However, these specialized models face scalability challenges when segmenting large numbers of classes in 3D images.
We introduce a class-scalable 3D segmentation method that combines a low-rank basis and projection operator with a chunked cross-entropy and Dice loss (see the illustrative sketch below). This design decouples the number of classes from the peak memory requirement, enabling the segmentation of hundreds of classes in 3D. Integrated into the nnUNet framework, the proposed method supports state-of-the-art training and architectures.
The scalability of our framework was demonstrated on a novel synthetic 3D “Toy Dataset” with up to 1000 classes, on which it attains high Dice scores ($>0.95$). On the TotalSegmentator dataset (117 classes), the proposed method achieves a mean Dice score comparable to the multi-model TotalSegmentator baseline ($0.913$ vs. $0.928$) and outperforms VISTA3D ($0.803$).
These results highlight a practical path toward a unified, scalable foundation model for comprehensive 3D medical image segmentation of thousands of classes.
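The following is a minimal, hedged sketch of the mechanism the abstract describes, assuming a PyTorch setting: the network emits rank-$r$ voxel embeddings, a learned $(C, r)$ projection maps them to $C$ class logits, and the cross-entropy is evaluated over class chunks so the full $(B, C, D, H, W)$ logit volume is never materialized. All names (`chunked_cross_entropy`, `chunk_size`, the `proj` matrix) are illustrative assumptions, not the paper's actual API.

```python
# Illustrative sketch only -- names and shapes are assumptions, not the
# paper's API. The network outputs rank-r voxel embeddings; a (C, r)
# projection maps them to C class logits, evaluated in class chunks so
# the full (B, C, D, H, W) logit volume is never held in memory.
import torch


def chunked_cross_entropy(feats, proj, target, chunk_size=64):
    """feats:  (B, r, D, H, W) low-rank voxel embeddings
    proj:   (C, r) learned projection from the rank-r basis to C classes
    target: (B, D, H, W) integer labels in [0, C)
    Returns the mean softmax cross-entropy, computed with a streaming
    log-sum-exp over class chunks of size `chunk_size`."""
    C = proj.shape[0]
    x = feats.flatten(2)            # (B, r, N), N = D*H*W voxels
    target = target.flatten(1)      # (B, N)

    # Pass 1: running maximum over classes, chunk by chunk (numerical
    # stability for the log-sum-exp).
    max_logit = x.new_full(target.shape, -float("inf"))
    for c0 in range(0, C, chunk_size):
        logits = torch.einsum("cr,brn->bcn", proj[c0:c0 + chunk_size], x)
        max_logit = torch.maximum(max_logit, logits.amax(dim=1))

    # Pass 2: sum of exponentials relative to the running maximum. Each
    # chunk materializes only (B, chunk_size, N) logits, never (B, C, N).
    sum_exp = torch.zeros_like(max_logit)
    for c0 in range(0, C, chunk_size):
        logits = torch.einsum("cr,brn->bcn", proj[c0:c0 + chunk_size], x)
        sum_exp += (logits - max_logit.unsqueeze(1)).exp().sum(dim=1)
    log_z = max_logit + sum_exp.log()   # (B, N) log partition function

    # Logit of the true class, gathered without the full logit tensor.
    true_logit = torch.einsum("bnr,bnr->bn", proj[target], x.transpose(1, 2))
    return (log_z - true_logit).mean()


if __name__ == "__main__":
    # Tiny demo: 1000 classes from a rank-32 basis on an 8^3 patch.
    feats = torch.randn(1, 32, 8, 8, 8)
    proj = torch.randn(1000, 32)
    target = torch.randint(0, 1000, (1, 8, 8, 8))
    print(chunked_cross_entropy(feats, proj, target))
```

Note that this sketch only streams the forward pass: plain autograd would still cache every chunk's logits for the backward pass, so keeping peak memory truly independent of the class count additionally requires per-chunk gradient checkpointing or a custom backward, which the paper's chunked loss presumably provides.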
Primary Subject Area: Segmentation
Secondary Subject Area: Foundation Models
Registration Requirement: Yes
Visa & Travel: No
Read CFP & Author Instructions: Yes
Originality Policy: Yes
Single-blind & Not Under Review Elsewhere: Yes
LLM Policy: Yes
Submission Number: 197