Abstract: Mammography is the most important diagnostic tool for early detection of breast cancer, which accounts for a significant proportion of cancer-related mortality in women. Computer-aided diagnosis systems, especially those powered by deep neural networks, offer promising advances in mammographic analysis. This paper evaluates four state-of-the-art Vision-Language Models (VLMs), namely CLIP, BiomedCLIP, PubMedCLIP, and ALIGN, on two essential tasks: breast density assessment and BI-RADS assessment. Using the VinDr-Mammo and EMBED datasets, our experiments investigate both zero-shot and fine-tuning approaches, as well as the impact of dataset distribution, data efficiency, and under-sampling techniques. Our findings indicate that fine-tuning pre-trained models on mammography-specific data significantly improves performance, with the degree of improvement varying across tasks and models.