Empirical Analysis of Scaling Vision Foundation Models for Chest X-rays

Published: 27 Mar 2025, Last Modified: 09 May 2025, MIDL 2025 Poster, CC BY 4.0
Keywords: Vision Foundation Models, Chest X-ray, self-supervised learning
Abstract: Recent advancements in multimodal transformers have shown remarkable success in computer vision and natural language tasks, yet their adaptation to the clinical world remains challenging. We introduce CXformer, a vision transformer adapted for chest X-ray analysis through a systematic investigation of architectural choices and training modifications to DINOv2. Our empirical results show that using registers in ViT training, centering the teacher model's softmax outputs, and optimizing the number of heads lead to better performance. The small version, CXformer(S) (22M parameters), achieves 83.28% mean AUROC on the CheXpert test set, surpassing the 80.46% baseline obtained with vanilla DINOv2 settings. Contrary to common assumptions, our larger model, CXformer(B), with 87M parameters shows similar performance at 84% mean AUROC on CheXpert, suggesting that training optimizations matter more than model size. Furthermore, compared to the current state-of-the-art RAD-DINO, our CXformer(B), with 46% less pretraining compute (in FLOPs), achieves an average AUROC of 87.93% (vs. 87.32% for RAD-DINO) on pathology image classification evaluated across three widely used CXR datasets: CheXpert, RSNA Pneumonia, and NIH CXR8. Beyond classification, CXformer also delivers competitive, and occasionally superior, performance in semantic segmentation and radiology report generation, underscoring its versatility. The CXformer base and small models can be found at https://huggingface.co/m42-health
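The abstract credits, among other DINOv2 training modifications, centering of the teacher model's softmax outputs. The snippet below is a minimal sketch of that DINO/DINOv2-style centering step, not the authors' released code; the tensor shapes, momentum, and temperature values are illustrative assumptions.

```python
# Minimal sketch (assumed values, not the authors' implementation) of
# DINO/DINOv2-style teacher softmax centering: an EMA "center" of teacher
# logits is subtracted before the sharpened softmax to discourage collapse.
import torch
import torch.nn.functional as F


class TeacherCentering:
    """Keeps an EMA center of teacher logits and subtracts it before softmax."""

    def __init__(self, out_dim: int, momentum: float = 0.9, temperature: float = 0.04):
        self.center = torch.zeros(1, out_dim)   # running center of teacher logits
        self.momentum = momentum                # EMA momentum for the center
        self.temperature = temperature          # teacher softmax temperature

    @torch.no_grad()
    def __call__(self, teacher_logits: torch.Tensor) -> torch.Tensor:
        # Center the logits, then sharpen with a low-temperature softmax.
        probs = F.softmax((teacher_logits - self.center) / self.temperature, dim=-1)
        # Update the center as an EMA of the current batch mean.
        batch_center = teacher_logits.mean(dim=0, keepdim=True)
        self.center = self.center * self.momentum + batch_center * (1 - self.momentum)
        return probs


# Usage: produce centered teacher targets for a batch of 8 crops
# with a hypothetical 65536-dimensional projection head.
centering = TeacherCentering(out_dim=65536)
teacher_targets = centering(torch.randn(8, 65536))
```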
Primary Subject Area: Foundation Models
Secondary Subject Area: Unsupervised Learning and Representation Learning
Paper Type: Methodological Development
Registration Requirement: Yes
Reproducibility: https://github.com/m42-health/CXformer
Visa & Travel: Yes
Latex Code: zip
Copyright Form: pdf
Submission Number: 155