A data-scalable transformer for medical image segmentation: architecture, model efficiency, and benchmark
Abstract: The Transformer, as a new generation of neural architecture, has demonstrated remarkable performance in natural language processing and computer vision. However, existing vision Transformers struggle to learn from limited medical data and fail to generalize across diverse medical image tasks. To tackle these challenges, we present MedFormer, a data-scalable Transformer for generalizable medical image segmentation. Its key designs incorporate a desirable inductive bias, hierarchical modeling with linear-complexity attention, and multi-scale feature fusion that is both spatially and semantically global. MedFormer can learn across tiny- to large-scale data without pre-training. Extensive experiments demonstrate the potential of MedFormer as a general segmentation backbone, outperforming CNNs and vision Transformers on three public datasets covering multiple modalities (e.g., CT and MRI) and diverse medical targets (e.g., healthy organs, diseased tissues, and tumors). We make the models and evaluation pipeline publicly available, offering solid baselines and unbiased comparisons to promote a wide range of downstream clinical applications.
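The abstract does not specify how the linear-complexity attention is realized. As a point of reference only, the sketch below shows one generic way to obtain attention that is linear in the number of tokens (the "efficient attention" factorization, which computes softmax(K)^T V before multiplying by softmax(Q)); it is an assumption for illustration, not MedFormer's actual mechanism, and the module name `LinearAttention` is hypothetical.

```python
import torch
import torch.nn as nn


class LinearAttention(nn.Module):
    """Attention with O(N) cost in token count N (illustrative sketch).

    Instead of the O(N^2) matrix softmax(Q K^T), keys are normalized over
    positions and queries over channels, so the d x d context K^T V can be
    formed first. Not the mechanism described in the MedFormer paper.
    """

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C), where N is the number of voxels/patches
        B, N, C = x.shape
        qkv = self.to_qkv(x).reshape(B, N, 3, self.heads, C // self.heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)   # each: (B, heads, N, d)
        q = q.softmax(dim=-1)                  # normalize queries over channels
        k = k.softmax(dim=-2)                  # normalize keys over positions
        context = k.transpose(-2, -1) @ v      # (B, heads, d, d): O(N d^2)
        out = q @ context                      # (B, heads, N, d): O(N d^2)
        out = out.transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


# Usage: a 3D CT/MRI volume flattened to tokens stays tractable even for
# large N, since cost grows linearly with the number of tokens.
tokens = torch.randn(2, 32 * 32 * 32, 64)      # (batch, N, channels)
print(LinearAttention(dim=64).forward(tokens).shape)  # torch.Size([2, 32768, 64])
```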