Keywords: Vision-Language Model, Breast Mammography, Multi-Modal Dataset
TL;DR: Vision Language Model for Breast Mammography
Abstract: Deep learning methods have shown promising results in predicting BI-RADS scores from mammography images. However, the interpretation of these images can vary, leading to discrepancies even among radiologists. Given the inherent complexity of mammography images, training classification models solely on image labels often yields subpar performance. To address this challenge, we curated 2,313 mammogram images and their corresponding captions from two mammography atlases. Our approach employs a multi-modal model that uses a pretrained PubMedBERT as the language component. By training this model on image-text pairs with contrastive learning, we enable the vision encoder to absorb the rich information embedded in the captions, improving its understanding of mammography findings. We then fine-tune the vision encoder on two datasets for BI-RADS prediction, achieving better performance than models trained without pretraining, particularly when labeled samples are scarce. The gain in 3-class average F1 score depends on the number of training samples, ranging from +1% with 40K training samples to +14% with 1K samples. Furthermore, our experiments show that 2K image-text pairs from mammography atlases can be more informative than 2K labeled samples, even for label prediction: the average margin is +1.1% when more than 10K training samples are available, underscoring the value of textual information for modeling medical image data. In summary, our work provides a vision-language model for mammography and highlights the textual information available in mammography atlases. The training code, pretrained model weights, and data extraction scripts are publicly available at: https://github.com/igulluk/MAM-CLIP
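The contrastive image-text pretraining the abstract describes follows the general CLIP recipe: embed each mammogram with the vision encoder, embed its atlas caption with PubMedBERT, and push matched pairs together while pushing mismatched pairs apart. As a rough sketch (not the authors' implementation), the symmetric contrastive objective over a batch of paired embeddings can be written as follows; the function name, temperature value, and NumPy stand-ins for the actual encoder outputs are all illustrative assumptions:

```python
import numpy as np

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    img_emb, txt_emb: (N, D) arrays; row i of each comes from the same
    image-caption pair (in practice, the vision encoder and PubMedBERT).
    """
    # L2-normalize so the dot product is cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    # (N, N) similarity matrix; matched pairs sit on the diagonal
    logits = img @ txt.T / temperature
    n = logits.shape[0]

    def cross_entropy_diag(l):
        # softmax cross-entropy with the diagonal as the target class
        l = l - l.max(axis=1, keepdims=True)          # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))
```

With perfectly aligned embeddings the loss approaches zero, while shuffling the caption rows makes it large, which is the signal that trains the vision encoder to internalize the caption content before BI-RADS fine-tuning.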
Supplementary Material: zip
Submission Number: 14