3D Swin Transformer for Partial Medical Auto Segmentation

15 Sept 2023 (modified: 22 Dec 2023) · Submitted to FLARE 2023
Keywords: auto segmentation, self-training, swin transformer
TL;DR: Application of convolutional (nnU-Net) and transformer (Swin-X Seg) architectures to medical auto segmentation under partial labels
Abstract: Over the past few years, transformers have been the most accurate segmentation frameworks in computer vision for natural imagery. In contrast, medical imaging approaches, with a few exceptions (for example, SwinUNETR and SMIT), are still dominated by the nnU-Net architecture family. In this paper, we investigate the application of a hierarchical vision transformer to the FLARE-23 challenge. Specifically, we benchmark our results using a relatively lightweight architecture, Swin-X Seg. We use multi-model self-training: nnU-Net predicts pseudo labels on partially labeled cases, and the transformer architecture is then trained on the completed labels and optimized for memory efficiency. Our network achieved average DSC scores of 83.13% and 35.19% on the open validation set (50 cases) for organs and tumors, respectively, while staying under a maximum GPU memory utilization of 4 GB at evaluation time. Our results show that the transformer architecture has the potential to perform on par with or better than conventional convolutional approaches, and we hope our findings encourage further research in this area.
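The self-training scheme described in the abstract amounts to completing each partial annotation with a teacher model's predictions before training the student transformer. The Python/NumPy sketch below shows one minimal way such a merge could look; it is not the authors' code. The function name is hypothetical, and treating label id 0 as "unannotated" is an illustrative simplification (in real partially labeled data, 0 may also denote true background).

```python
import numpy as np

# Minimal sketch of pseudo-label completion for self-training, assuming
# label id 0 marks unannotated voxels (an illustrative simplification).
# Voxels with a ground-truth annotation keep it; the rest are filled from
# a teacher model's (e.g., nnU-Net) prediction.

def complete_partial_labels(partial_gt: np.ndarray,
                            teacher_pred: np.ndarray,
                            unlabeled_id: int = 0) -> np.ndarray:
    """Merge a partially labeled mask with teacher pseudo labels."""
    merged = partial_gt.copy()
    missing = partial_gt == unlabeled_id      # voxels without annotation
    merged[missing] = teacher_pred[missing]   # fill from pseudo labels
    return merged

# Toy 2x3 slice: organ id 1 is annotated, the remaining voxels are filled in.
gt = np.array([[1, 0, 0],
               [0, 1, 0]])
pred = np.array([[1, 2, 2],
                 [3, 1, 2]])
print(complete_partial_labels(gt, pred))
# [[1 2 2]
#  [3 1 2]]
```

The completed masks would then serve as dense training targets for the student network, which is the standard pattern in pseudo-label self-training.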
Supplementary Material: zip
Submission Number: 35