Exploring Synergies Between Convolutional Neural Networks and Transformers for Breast Cancer Segmentation
Abstract: Breast cancer requires precise and early detection for effective treatment. Lesion segmentation is a pivotal stage in screening for the patient’s condition and is often integrated into CAD systems designed to assist clinicians. However, such systems still struggle with high false positive rates. Conventional segmentation methods rely mainly on convolutional neural networks (CNNs), which are proficient at capturing local patterns but weak at modeling long-range correlations. Recent advances in Vision Transformers (ViTs) have shown improved efficacy and accuracy, as they integrate attention mechanisms that emphasize both finer details and global context. This work proposes a new mammography image segmentation model that combines the benefits of CNNs and Transformers through an encoder module with two branches. Their information is then fused with an attention mechanism that helps identify relevant spatial and channel features. The proposed model achieves state-of-the-art results on both the CBIS-DDSM and INbreast benchmark datasets, demonstrating the advantage of merging these techniques into a single model.
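The abstract describes fusing a CNN branch and a Transformer branch with attention over spatial and channel features. The following is a minimal NumPy sketch of that general idea only; the function name `fuse_features` and the specific squeeze-and-gate operations are illustrative assumptions, not the paper's actual fusion module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_features(cnn_feat, vit_feat):
    """Fuse two (C, H, W) feature maps with channel and spatial gating.

    Hypothetical sketch of attention-based branch fusion; the paper's
    exact mechanism may differ.
    """
    x = cnn_feat + vit_feat  # combine the two encoder branches
    # Channel attention: average over spatial dims, gate each channel
    channel_gate = sigmoid(x.mean(axis=(1, 2)))   # shape (C,)
    x = x * channel_gate[:, None, None]
    # Spatial attention: average over channels, gate each location
    spatial_gate = sigmoid(x.mean(axis=0))        # shape (H, W)
    return x * spatial_gate[None, :, :]

# Toy feature maps standing in for the CNN and ViT encoder outputs
cnn_feat = np.random.rand(8, 16, 16)
vit_feat = np.random.rand(8, 16, 16)
fused = fuse_features(cnn_feat, vit_feat)
print(fused.shape)  # (8, 16, 16)
```

The gating keeps the fused map the same shape as each branch, so it can feed a standard segmentation decoder unchanged.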
External IDs: dblp:conf/miccai/SantiagoN25