Contrast-agnostic Spinal Cord Segmentation: A Comparative Study of ConvNets and Vision Transformers

Published: 27 Apr 2024, Last Modified: 27 Apr 2024
Venue: MIDL 2024 Short Papers
License: CC BY 4.0
Keywords: Spinal Cord, MRI, Contrasts, Segmentation, Deep Learning, CNNs, Vision Transformers
Abstract: The cross-sectional area (CSA) of the spinal cord (SC), computed from its segmentation, is a clinically relevant biomarker for the diagnosis and monitoring of cord compression and atrophy. One key limitation of existing automatic methods is that their SC segmentations depend on the MRI contrast, resulting in different CSA values across contrasts. Furthermore, these methods rely on CNNs, leaving a gap in the literature regarding the performance of modern deep learning (DL) architectures. In this study, we extend our recent work \cite{Bdard2023TowardsCS} by evaluating the contrast-agnostic SC segmentation capabilities of different classes of DL architectures, namely ConvNeXt, vision transformers (ViTs), and hierarchical ViTs. We compared 7 DL models using the open-source \textit{Spine Generic} Database of healthy participants $(n=243)$, which consists of 6 MRI contrasts per participant. Given a fixed dataset size, our results show that CNNs produce the most robust SC segmentations across contrasts, followed by ConvNeXt and hierarchical ViTs. This suggests that: (i) inductive biases such as learning hierarchical feature representations via pooling (common in CNNs) are crucial for good performance on SC segmentation, and (ii) hierarchical ViTs that incorporate several CNN-based priors can perform similarly to pure CNN-based models.
Submission Number: 151
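
To make the abstract's notion of contrast-dependent CSA concrete, the following is a minimal sketch of how CSA could be derived from a binary segmentation and compared across contrasts. It is an illustration only, not the authors' pipeline: the function names, the (x, y, z) mask layout, the in-plane pixel area, and the use of the standard deviation as a variability summary are all assumptions. It also counts voxels per axial slice directly, which presumes the cord runs perpendicular to the slice plane; dedicated tools typically correct for cord angulation.

```python
import numpy as np

def csa_per_slice(mask, pixel_area_mm2):
    """Return the CSA (mm^2) of each axial slice of a binary mask.

    Assumes (hypothetically) that `mask` is shaped (x, y, z) with z as the
    axial direction and that `pixel_area_mm2` is the in-plane pixel area.
    """
    mask = np.asarray(mask) > 0
    # Count segmented voxels in each axial slice, then scale to mm^2.
    return mask.sum(axis=(0, 1)) * pixel_area_mm2

def csa_variability(masks_by_contrast, pixel_area_mm2):
    """Slice-averaged CSA per contrast and its std across contrasts.

    The standard deviation is one plausible variability summary, chosen
    here for illustration.
    """
    mean_csa = {
        name: float(csa_per_slice(m, pixel_area_mm2).mean())
        for name, m in masks_by_contrast.items()
    }
    return mean_csa, float(np.std(list(mean_csa.values())))

# Toy example: two synthetic "contrasts" for one participant.
toy_a = np.zeros((10, 10, 4), dtype=np.uint8)
toy_a[3:7, 3:7, :] = 1   # 4 x 4 = 16 voxels per slice
toy_b = np.zeros((10, 10, 4), dtype=np.uint8)
toy_b[3:7, 3:8, :] = 1   # 4 x 5 = 20 voxels per slice

mean_csa, csa_std = csa_variability(
    {"contrast_a": toy_a, "contrast_b": toy_b}, pixel_area_mm2=1.0
)
```

A contrast-agnostic model would, under this summary, drive the across-contrast standard deviation toward zero for a given participant, since the same cord should yield the same CSA regardless of the contrast it was segmented on.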