RHViT: A Robust Hierarchical Transformer for 3D Multimodal Brain Tumor Segmentation Using Biased Masked Image Modeling Pre-training

Published: 01 Jan 2023, Last Modified: 13 Nov 2024 · BIBM 2023 · CC BY-SA 4.0
Abstract: Accurate brain tumor segmentation is crucial for diagnosis and treatment planning in medical image analysis. While computer-aided methods have shown promise, several challenges persist. Most existing methods treat all regions uniformly and therefore struggle with smaller tumors. They also lack robustness to data corruption and missing modalities, both common in clinical settings. In this paper, we present a robust hierarchical vision transformer (RHViT) for 3D multimodal brain tumor segmentation, built on an encoder-decoder structure. Our approach combines 3D convolutions and self-attention for efficient and effective training: the 3D convolutions capture local information and generate hierarchical features, improving tumor segmentation accuracy. To enhance robustness, we pre-train the encoder with masked image modeling (MIM), which equips the model to handle data corruption and improves segmentation even in challenging scenarios. Furthermore, we introduce a novel biased masking strategy during MIM that focuses the model's attention on tumor regions, yielding better tumor representations and more effective fusion of multimodal features. Importantly, this biased masking also strengthens the model's resilience to incomplete multimodal data at test time, making it a practical choice for clinical use. Extensive experiments confirm the superiority of our model over existing approaches.
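To make the biased masking idea concrete, the sketch below shows one plausible way such a step could be implemented: patches of the 3D volume are selected for masking with a probability that is up-weighted inside tumor regions. This is an illustrative assumption only; the function name `biased_patch_mask`, the `tumor_bias` parameter, and the use of an annotation-derived binary tumor mask are hypothetical and are not taken from the paper.

```python
import numpy as np

def biased_patch_mask(tumor_mask, patch_size=16, mask_ratio=0.6,
                      tumor_bias=3.0, rng=None):
    """Sample a patch-level mask for MIM pre-training, biased toward tumor regions.

    tumor_mask : binary 3D array (D, H, W), 1 inside the tumor (hypothetical input;
                 the paper's exact masking procedure may differ).
    Returns a boolean array over the patch grid; True = patch is masked.
    """
    rng = np.random.default_rng() if rng is None else rng
    d, h, w = (s // patch_size for s in tumor_mask.shape)

    # Compute the fraction of tumor voxels inside each non-overlapping 3D patch.
    grid = tumor_mask[:d * patch_size, :h * patch_size, :w * patch_size]
    grid = grid.reshape(d, patch_size, h, patch_size, w, patch_size)
    tumor_frac = grid.mean(axis=(1, 3, 5)).ravel()

    # Patches containing tumor voxels get a higher sampling weight.
    weights = 1.0 + tumor_bias * tumor_frac
    probs = weights / weights.sum()

    # Draw the masked patch indices without replacement at the target mask ratio.
    n_masked = int(mask_ratio * probs.size)
    masked_idx = rng.choice(probs.size, size=n_masked, replace=False, p=probs)

    mask = np.zeros(probs.size, dtype=bool)
    mask[masked_idx] = True
    return mask.reshape(d, h, w)
```

Under this reading, a larger `tumor_bias` forces the reconstruction objective to spend more of its masked patches on tumor tissue, which is one way the pre-training could be steered toward tumor-relevant representations as the abstract describes.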