Empirical Analysis of Scaling Vision Foundation Models for Chest X-rays

Published: 27 Mar 2025, Last Modified: 01 May 2025 · MIDL 2025 Poster · CC BY 4.0
Keywords: Vision Foundation Models, Chest X-ray, Multimodality, Self-Supervised Learning
Abstract: Recent advances in multimodal transformers have shown remarkable success in computer vision and natural language tasks, yet their adaptation to the clinical domain remains challenging. We introduce Scan42, a vision transformer adapted for chest X-ray analysis through a systematic investigation of architectural choices and training modifications to DINOv2. Our empirical results show that using registers in ViT training, centering the teacher model's softmax outputs, and optimizing the number of heads all lead to better performance. The small version, Scan42(S) (22M parameters), achieves 83.28% mean AUROC on the CheXpert test set, surpassing the 80.46% baseline achieved with vanilla DINOv2 settings. Contrary to common assumptions, our larger model, Scan42(B) (87M parameters), shows similar performance at 84% mean AUROC on CheXpert, suggesting that training optimizations matter more than model size. Furthermore, compared to the current state of the art, RAD-DINO, our Scan42(B) achieves an average AUROC of 87.93% (vs. 87.32% for RAD-DINO) on pathology image classification across three widely used CXR datasets (CheXpert, RSNA, and NIH), while requiring 46% less pretraining compute (in FLOPs). Beyond classification, Scan42 also delivers competitive, and occasionally superior, performance in semantic segmentation and radiology report generation, underscoring its versatility. By open-sourcing our model checkpoints, we aim to promote reproducibility, reduce resource barriers, and advance scalable solutions for medical imaging research.
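The abstract names two concrete training modifications: register tokens in the ViT and centering of the teacher's softmax outputs. The following minimal PyTorch sketch is not the authors' released code; it only illustrates what these two mechanisms generally look like in a DINO/DINOv2-style setup. The class names and all hyperparameter values (`dim`, `n_registers`, `out_dim`, `momentum`, `teacher_temp`) are illustrative assumptions.

```python
# Minimal sketch (assumed, not the Scan42 implementation) of register tokens
# and DINO-style teacher softmax centering.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RegisterTokens(nn.Module):
    """Learnable register tokens concatenated to the ViT token sequence.

    Registers act as extra scratch tokens during attention and are
    typically discarded at the output.
    """

    def __init__(self, dim: int = 384, n_registers: int = 4):
        super().__init__()
        self.registers = nn.Parameter(torch.zeros(1, n_registers, dim))
        nn.init.trunc_normal_(self.registers, std=0.02)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, 1 + n_patches, dim), i.e. CLS + patch tokens
        regs = self.registers.expand(tokens.shape[0], -1, -1)
        return torch.cat([tokens, regs], dim=1)


class TeacherCentering(nn.Module):
    """DINO-style centering of teacher logits before the softmax.

    Subtracting a running mean of the teacher's logits discourages
    collapse onto a single output dimension.
    """

    def __init__(self, out_dim: int = 65536, momentum: float = 0.9,
                 teacher_temp: float = 0.04):
        super().__init__()
        self.register_buffer("center", torch.zeros(1, out_dim))
        self.momentum = momentum
        self.teacher_temp = teacher_temp

    @torch.no_grad()
    def forward(self, teacher_logits: torch.Tensor) -> torch.Tensor:
        probs = F.softmax(
            (teacher_logits - self.center) / self.teacher_temp, dim=-1
        )
        # Update the center with an exponential moving average of the
        # batch mean (a distributed run would all-reduce this first).
        batch_center = teacher_logits.mean(dim=0, keepdim=True)
        self.center = (self.center * self.momentum
                       + batch_center * (1 - self.momentum))
        return probs
```

In a self-distillation loop, the centered teacher probabilities would serve as soft targets for the student's cross-entropy loss; the exact head sizes, temperatures, and schedules used by Scan42 are not given in the abstract.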
Primary Subject Area: Foundation Models
Secondary Subject Area: Unsupervised Learning and Representation Learning
Paper Type: Methodological Development
Registration Requirement: Yes
Visa & Travel: Yes
Submission Number: 155