What Is the Best Way to Fine-Tune Self-supervised Medical Imaging Models?

Published: 01 Jan 2024, Last Modified: 20 Aug 2025 · MIUA (1) 2024 · CC BY-SA 4.0
Abstract: In recent years, self-supervised learning (SSL) has enabled significant breakthroughs via training large foundation models. These self-supervised pre-trained models are typically utilized for downstream tasks via end-to-end fine-tuning. However, it remains unclear whether end-to-end fine-tuning is truly optimal for effectively leveraging the pre-trained knowledge, especially since the diverse categories of SSL capture distinct features and may therefore require different fine-tuning approaches. To bridge this research gap, we present the first comprehensive study to discover optimal fine-tuning strategies for self-supervised learning in medical imaging. First, we develop strong contrastive and restorative SSL baselines that outperform state-of-the-art methods on four diverse downstream tasks. Next, we conduct an extensive fine-tuning analysis across multiple pre-training and fine-tuning datasets, as well as various fine-tuning dataset sizes. Contrary to the conventional wisdom of fine-tuning only the last few layers of a pre-trained network, we show that fine-tuning intermediate layers is much more effective. Specifically, fine-tuning the second quarter (25–50%) of the network is optimal for contrastive SSL, whereas fine-tuning the third quarter (50–75%) is optimal for restorative SSL. Moreover, compared to the de facto standard of end-to-end fine-tuning, our best fine-tuning strategy, which fine-tunes a shallower network consisting of the first three quarters (0–75%) of the pre-trained network, yields improvements of as much as 5.48%. Additionally, using these insights, we propose a simple yet effective method to leverage the complementary strengths of multiple SSL models, resulting in enhancements of up to 3.57%. Given the rapid progress in SSL, we hope these fine-tuning techniques will significantly improve the utility of self-supervised medical imaging models.
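A minimal PyTorch sketch of the quarter-wise fine-tuning idea described in the abstract, assuming a torchvision ResNet-50 backbone; the block grouping, the `freeze_outside_quarter` helper, and the learning rate are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: fine-tune only one depth quarter of an SSL pre-trained backbone.
# Assumes a torchvision ResNet-50; the paper's actual architecture and
# hyperparameters may differ.
import torch
import torch.nn as nn
from torchvision.models import resnet50

def freeze_outside_quarter(blocks, quarter):
    """Freeze all blocks except those in the given quarter (0-3) of the depth."""
    n = len(blocks)
    lo, hi = quarter * n // 4, (quarter + 1) * n // 4
    for i, block in enumerate(blocks):
        trainable = lo <= i < hi
        for p in block.parameters():
            p.requires_grad = trainable

model = resnet50()  # in practice, load SSL pre-trained weights here

# Treat the backbone as an ordered list of blocks, shallow to deep.
blocks = [model.conv1, model.bn1, model.layer1,
          model.layer2, model.layer3, model.layer4]

# Contrastive SSL: fine-tune the second quarter (25-50%) only.
# A restorative SSL model would instead use quarter=2 (50-75%).
freeze_outside_quarter(blocks, quarter=1)

# The task head (model.fc) stays trainable, as it is not in `blocks`.
# The abstract's best strategy would additionally truncate the network
# to its first three quarters (0-75%) before attaching the head.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```

Only the parameters left with `requires_grad=True` are passed to the optimizer, so gradient updates are confined to the selected quarter plus the task head.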