Vision’s Potential Unlocked: How Pretraining and Strategic Fine-tuning Improve Stroke Relapse Prediction

30 Nov 2025 (modified: 15 Dec 2025) · MIDL 2026 Conference Submission · CC BY 4.0
Keywords: Self-Supervised Image Pretraining, Modality Contribution, Multimodal Fusion, Stroke Relapse Prediction
TL;DR: Pretraining ViTs on 3D CTA images unlocks the full potential of multimodal data for stroke relapse detection.
Abstract: Unbalanced modality usage, especially modality collapse, remains a major limitation in multimodal learning, often preventing models from exploiting the full potential of multimodal datasets. Pretraining multimodal neural networks has been shown to enhance overall performance. Nevertheless, its effect on modality contribution remains largely unexplored. Moreover, freezing parts of the network during fine-tuning is crucial to mitigate catastrophic forgetting. Yet the impact of freezing strategies on modality contribution has also received little attention. In this work, we explore how self-supervised image pretraining can mitigate modality contribution imbalance and enhance cross-modal integration for stroke relapse detection---a clinically critical task we recently addressed. To this end, two multimodal neural networks were pretrained in a self-supervised manner and subsequently fine-tuned under two distinct freezing strategies. Their performance was compared against both the baseline model from our previous work and models trained entirely from scratch in this work. Our results demonstrate that pretraining enables a more comprehensive exploitation of the multimodal image-tabular dataset, outperforming both the prior baseline and all non-pretrained models. Furthermore, pretraining notably increased the vision modality's contribution, and the choice of freezing strategy was also found to significantly affect modality utilization. The overall best-performing model, based on a Vision Transformer, successfully overcame unimodal collapse through self-supervised pretraining. These findings indicate that pretraining combined with strategic fine-tuning allows full use of multimodal medical datasets, supporting more balanced and effective models for tasks such as stroke relapse detection. The code for the pretraining step is publicly available at https://github.com/ChristianGappGit/SSL_Pretraining.
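
Illustrative sketch (not the authors' implementation; see the linked repository for the actual pretraining code): the abstract describes two stages, self-supervised pretraining of a ViT-based vision encoder on 3D CTA volumes followed by fine-tuning of an image-tabular fusion model under a freezing strategy. The minimal PyTorch example below assumes a SimCLR-style contrastive objective for the self-supervised stage and a late-fusion head for the tabular data; all module names, shapes, and hyperparameters are illustrative assumptions, and the frozen-encoder setup shown is just one of the possible freezing strategies mentioned in the abstract.

```python
# Minimal sketch, NOT the authors' code (their pretraining code is at
# https://github.com/ChristianGappGit/SSL_Pretraining). All names, shapes,
# losses, and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ViT3DEncoder(nn.Module):
    """Tiny ViT-style encoder for 3D volumes: patch embedding + Transformer."""

    def __init__(self, img_size=64, patch=16, dim=256, depth=4, heads=4):
        super().__init__()
        self.patch_embed = nn.Conv3d(1, dim, kernel_size=patch, stride=patch)
        n_patches = (img_size // patch) ** 3
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, x):                        # x: (B, 1, D, H, W)
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2) + self.pos
        return self.encoder(tokens).mean(dim=1)  # (B, dim) pooled embedding


def nt_xent(z1, z2, tau=0.1):
    """SimCLR-style contrastive loss between two augmented views (assumed objective)."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.t() / tau
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool, device=z.device), float("-inf"))
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)


class ImageTabularFusion(nn.Module):
    """Late-fusion head combining the vision embedding with tabular features."""

    def __init__(self, encoder, n_tabular, dim=256):
        super().__init__()
        self.encoder = encoder
        self.tabular = nn.Sequential(nn.Linear(n_tabular, 64), nn.ReLU())
        self.head = nn.Linear(dim + 64, 2)        # relapse vs. no relapse

    def forward(self, volume, tab):
        return self.head(torch.cat([self.encoder(volume), self.tabular(tab)], dim=1))


# --- Stage 1: self-supervised pretraining (random tensors stand in for CTA views) ---
encoder = ViT3DEncoder()
opt = torch.optim.AdamW(encoder.parameters(), lr=1e-4)
view1 = torch.randn(4, 1, 64, 64, 64)             # two "augmented views" of the
view2 = view1 + 0.05 * torch.randn_like(view1)    # same volumes (placeholder data)
opt.zero_grad()
nt_xent(encoder(view1), encoder(view2)).backward()
opt.step()

# --- Stage 2: supervised fine-tuning under one possible freezing strategy ---
model = ImageTabularFusion(encoder, n_tabular=10)
for p in model.encoder.parameters():              # freeze the pretrained vision
    p.requires_grad = False                       # encoder during fine-tuning
ft_opt = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=1e-4)
logits = model(torch.randn(4, 1, 64, 64, 64), torch.randn(4, 10))
ft_opt.zero_grad()
F.cross_entropy(logits, torch.randint(0, 2, (4,))).backward()
ft_opt.step()
```

Unfreezing `model.encoder` (or only its last Transformer blocks) gives the alternative fine-tuning strategies whose effect on modality contribution the abstract compares.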
Primary Subject Area: Application: Neuroimaging
Secondary Subject Area: Interpretability and Explainable AI
Registration Requirement: Yes
Reproducibility: https://github.com/ChristianGappGit/SSL_Pretraining
Visa & Travel: Yes
Read CFP & Author Instructions: Yes
Originality Policy: Yes
Single-blind & Not Under Review Elsewhere: Yes
LLM Policy: Yes
Submission Number: 180